This project provides an overview and implementation of the work described in "HELMify: A Hybrid Rule- and LLM-Based Generator of Peptide Monomer HELM Names".

MSDLLCpapers/helmify

HELMify

This repository lets a user recreate the Large Language Model (LLM) based naming method, provided they have all the prerequisites listed below. A user who already has an implementation of the zone-based naming method should be able to connect it to this repository, which enables the zone-based and hybrid naming methods in addition to the LLM-based one.

Methodology Overview

| Method | Description | Prerequisites |
|---|---|---|
| LLM-Based | LLM-assisted full monomer name generation | Full monomer database, Azure OpenAI configuration variables, OpenEye license server |
| Zone-Based | Monomer name suggestion provided by zone-based structural decomposition | Zone-based naming URL |
| Hybrid | Monomer name generated through a combination of the zone-based and LLM methods, using the zone-based namer first and then the LLM to help generate unknown substituents | Substituent dictionary, Azure OpenAI configuration variables, OpenEye license server |

Get Started

Environment Setup

# Create conda environment
conda env create -f environment.yaml
conda activate helmify

Required Environment Variables

Create a .env file in the /helmify directory and fill in the variables with your own values. Refer to the table above for the minimum requirements of each naming method. These values are read in config.py.

# Required for the Azure OpenAI implementation

OPENAI_PROVIDER="azure"
OPENAI_API_KEY="your_azure_openai_api_key"
OPENAI_API_ROOT="https://your_resource.openai.azure.com"
OPENAI_API_VERSION="api_version"
OPENAI_MODEL="model_version"

# Database configuration required for nearest neighbor search

# A CSV file with columns: 'complete_smiles', 'name', and 'symbol'. An example can be found in /sample_database
MONOMER_DATABASE="path_to_monomer_database.csv"

# A CSV file with columns: 'smiles', 'name', 'symbol'. An example can be found in /sample_database
SUBSTITUENT_DICTIONARY="path_to_substituent_dictionary.csv"

# OpenEye License Server - Required for SMILES to IUPAC name conversion and additional cheminformatics functionality.
OE_LICENSE_SERVER="path_to_openeye_license_server"

# Zone based naming implementation

# A service endpoint that provides zone-based naming. The method is described in the paper "HELMify: A Hybrid Rule- and LLM-Based Generator of Peptide Monomer HELM Names". The endpoint returns a string that can be parsed into several outputs, as described in zone_module.py
ZONE_URL="https://zone_based_method_endpoint"
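
Before starting the API, it can help to confirm that the variables and CSV columns listed above are in place for the method you plan to run. The following is a minimal sketch, not part of the repository: the names `REQUIRED_VARS`, `REQUIRED_COLUMNS`, `missing_variables`, and `missing_columns` are hypothetical, and the grouping of variables per method is an assumption read off the Methodology Overview table.

```python
import csv
import io
import os

# Assumption: the five OPENAI_* variables together make up the
# "Azure OpenAI configuration variables" named in the table.
AZURE_VARS = [
    "OPENAI_PROVIDER",
    "OPENAI_API_KEY",
    "OPENAI_API_ROOT",
    "OPENAI_API_VERSION",
    "OPENAI_MODEL",
]

# Assumption: per-method requirements copied from the table. The table does
# not list ZONE_URL for the hybrid method, so it is not listed here either.
REQUIRED_VARS = {
    "llm": AZURE_VARS + ["MONOMER_DATABASE", "OE_LICENSE_SERVER"],
    "zone": ["ZONE_URL"],
    "hybrid": AZURE_VARS + ["SUBSTITUENT_DICTIONARY", "OE_LICENSE_SERVER"],
}

# Column names as stated in the .env comments above.
REQUIRED_COLUMNS = {
    "MONOMER_DATABASE": ["complete_smiles", "name", "symbol"],
    "SUBSTITUENT_DICTIONARY": ["smiles", "name", "symbol"],
}


def missing_variables(method, env):
    """Return the variables required for `method` that are unset or empty in `env`."""
    return [name for name in REQUIRED_VARS[method] if not env.get(name)]


def missing_columns(variable, csv_file):
    """Return the required columns absent from the header row of an open CSV file."""
    header = next(csv.reader(csv_file), [])
    present = {column.strip() for column in header}
    return [column for column in REQUIRED_COLUMNS[variable] if column not in present]


if __name__ == "__main__":
    # Report what the current process environment is missing for each method.
    for method in REQUIRED_VARS:
        print(method, "missing:", missing_variables(method, os.environ))

    # Check an in-memory sample header; a real file would be opened with open(path, newline="").
    sample = io.StringIO("smiles,name\nC,methyl\n")
    print("missing columns:", missing_columns("SUBSTITUENT_DICTIONARY", sample))
```

Note that this script reads the process environment, so it only reflects the .env file if those variables have been loaded into the shell or process beforehand.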

Run API

Run the API with the uvicorn HTTP server. Execute the following command in a terminal from the /helmify directory:

uvicorn main:api --env-file .env

Access the API through a web browser by copying the address printed on the last line of the terminal output after running the uvicorn command (you can try http://127.0.0.1:8000/helm-api-root). Alternatively, use any API testing tool (e.g. Python requests, curl, or Postman).
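
As a quick programmatic check that the server is reachable, the root address mentioned above can be requested from Python. This is a minimal sketch using only the standard library; the helper names `root_url` and `fetch_root` are hypothetical, the host and port assume uvicorn's defaults, and the shape of the response body is not documented here, so the script simply prints whatever comes back.

```python
import urllib.request


def root_url(host="127.0.0.1", port=8000, path="/helm-api-root"):
    """Build the API root address; the defaults match the address given above."""
    return f"http://{host}:{port}{path}"


def fetch_root(url, timeout=10.0):
    """Send a GET request and return (status_code, body_text)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
        return response.status, body


if __name__ == "__main__":
    url = root_url()
    try:
        status, body = fetch_root(url)
    except OSError as error:
        # URLError and HTTPError are OSError subclasses; the server may not be running.
        print(f"Could not reach {url}: {error}")
    else:
        print(status)
        print(body[:500])  # Print only the beginning of the response.
```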

📓 Example Usage

See the HELMify Demo Notebook for a comprehensive tutorial on how to call the API programmatically.
