## Requirements

- OS: Linux (tested on Ubuntu 20.04+)
- GPU: NVIDIA GPU with CUDA 12.1+ support (recommended: H100 or A100 for training)
- Conda: Anaconda or Miniconda
## Installation

```bash
# Clone the repository
git clone git@github.com:Kingsford-Group/codonrl.git
cd codonrl

# Create the conda environment
conda env create -f environment.yml

# Activate the environment
conda activate codonrl
```
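Before downloading checkpoints or launching training, it is worth confirming that the new environment can see the GPU. A minimal sanity check, assuming the `codonrl` environment includes PyTorch (suggested by the CUDA requirement and the `--use_amp` flag below, but not confirmed in this section):

```python
# Sanity check: confirm the conda environment can see CUDA devices.
# Assumes PyTorch is installed in the codonrl environment; if the project
# uses a different framework, substitute its equivalent device query.
import torch

print("CUDA available:", torch.cuda.is_available())
print("Device count:  ", torch.cuda.device_count())
if torch.cuda.is_available():
    print("Device 0:      ", torch.cuda.get_device_name(0))
```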
## Pre-trained Checkpoints

We provide 55 model checkpoints trained on different protein sequences from the UniProt dataset. Download them all using the provided script:

```bash
# Download all checkpoints
bash download_checkpoint.sh
```

## Inference

After downloading the checkpoints, run inference with the pre-trained models:
```bash
# Simple: test 5 alpha values (1.0, 1.5, 2.0, 2.5, 3.0)
bash run_decoding_multiobjective.sh

# Advanced: multi-objective optimization (e.g., U content minimization)
bash run_decoding_multiobjective_multialpha.sh
```

Before running, edit the paths in the scripts:
```bash
OUT_DIR=./benchmark_multialpha
CSV_PATH=./datasets/gemorna_with_all_metrics.csv
CKPT_ROOT=./checkpoints
```

Important: update the path in `visualizeandbenchmark_multialpha.py` and `visualizeandbenchmark.py`:

```python
sys.path.append('/path/to/codonrl')  # Change to the actual CodonRL path
```

### Script: run_decoding_multiobjective.sh
Tests 5 different alpha values with balanced evaluation weights.

```bash
bash run_decoding_multiobjective.sh
```

Monitor progress:

```bash
tail -f benchmark_multialpha/logs/*.log
```

Outputs (for each alpha, e.g., alpha=2.5):

- `*.csv` - Detailed metrics (CAI, MFE, CSC, GC, U)
- `*_rna.fasta` / `*_dna.fasta` - Generated mRNA sequences
- `*_viz.png` - Parity plots
- `summary.txt` - Quick statistics
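To compare runs across alpha values, the per-run metric CSVs can be pooled and summarized. A minimal sketch, assuming the column names match the metric names listed above (`CAI`, `MFE`, `GC`); check the real CSV headers before relying on it:

```python
# Sketch: pool the per-alpha metric CSVs and print summary statistics.
# Column names ("CAI", "MFE", "GC") are assumptions based on the metric
# list above, not confirmed by this README.
import glob
import pandas as pd

frames = []
for path in glob.glob("benchmark_multialpha/**/*.csv", recursive=True):
    df = pd.read_csv(path)
    df["source"] = path  # remember which run each row came from
    frames.append(df)

metrics = pd.concat(frames, ignore_index=True)
print(metrics[["CAI", "MFE", "GC"]].describe())
```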
### Script: run_decoding_multiobjective_multialpha.sh

Fine-grained control over multiple optimization objectives. The current experiment targets U content minimization.

```bash
bash run_decoding_multiobjective_multialpha.sh
```

Key parameters:

- `--alpha_cai`: CAI weight
- `--alpha_csc`: Codon stability coefficient (CSC) weight
- `--alpha_gc`: GC content weight
- `--alpha_u`: U content weight (negative = minimize)
- `--target_gc` / `--target_u`: Target content values
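To make the weighting concrete, here is a hypothetical sketch of how a weighted score could combine these metrics. This is not CodonRL's actual objective; in particular, how deviations from `--target_gc`/`--target_u` are penalized is an assumption made purely for illustration:

```python
# HYPOTHETICAL combined score, for intuition only: positive weights reward
# a metric, negative weights penalize it, and target terms pull content
# toward a set point. The real CodonRL objective may differ in form.
def combined_score(cai, csc, gc, u,
                   alpha_cai=1.0, alpha_csc=0.3, alpha_gc=0.2, alpha_u=-0.3,
                   target_gc=0.55, target_u=0.20):
    score = alpha_cai * cai + alpha_csc * csc
    score -= alpha_gc * (gc - target_gc) ** 2  # quadratic pull toward target GC
    score += alpha_u * (u - target_u)          # negative alpha_u => minimize excess U
    return score

# Example: a sequence with high CAI but slightly elevated U content.
print(combined_score(cai=0.85, csc=0.40, gc=0.58, u=0.25))
```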
Custom experiments:

```bash
# CAI optimization
python visualizeandbenchmark_multialpha.py \
    --alpha_cai 2.5 --run_name "high_cai" ...

# GC content targeting (55%)
python visualizeandbenchmark_multialpha.py \
    --alpha_gc 0.5 --target_gc 0.55 --run_name "gc_target" ...

# Multi-objective balance
python visualizeandbenchmark_multialpha.py \
    --alpha_cai 1.0 --alpha_csc 0.3 --alpha_gc 0.2 --alpha_u -0.3 \
    --run_name "balanced" ...
```

## Training

Want to train on your own data? Follow this guide.
```bash
# Batch training from JSON file
python CodonRL_main.py --jf datasets/proteins.json
```

### Training Parameters

| Parameter | Default | Description |
|---|---|---|
| `--jf` / `--json_input_file` | Required | Training dataset (JSON format) |
| `--codon_table` | `human` | Codon table: `human` or `ecolik12` |
| `--lambda_val` | `4.0` | CAI-MFE tradeoff (0 = MFE only, 10 = CAI priority) |
| `--protein_max_len` | `700` | Maximum protein length to process |
| `--batch_size` | `64` | Training batch size |
| `-e` / `--num_episodes` | `250` | Total training episodes |
| `--learning_rate` | `5e-5` | Learning rate |
| `--buffer_size` | `10000` | Experience replay buffer size |
| `--target_update_freq` | `50` | Target network update frequency (steps) |
| `--max_workers` | `None` | Max parallel workers (auto-detected if `None`) |
| `--mfe_workers` | `4` | Thread-pool size per process for MFE calculations |
| `--milestone_mfe_method` | `linearfold` | MFE method during training: `linearfold` or `vienna` |
| `--final_mfe_method` | `vienna` | MFE method for final evaluation |
| `--output_dir` | `results` | Directory for checkpoints and logs |
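The `--jf` dataset is a JSON file of protein sequences. Its exact schema is not reproduced in this section, so the field names below (`id`, `protein_sequence`) are assumptions; a minimal sketch of preparing such a file:

```python
# Sketch: write a training JSON for --jf. The record fields "id" and
# "protein_sequence" are ASSUMED names, not confirmed by this README;
# inspect datasets/proteins.json (or the loader in CodonRL_main.py)
# for the real schema. Sequences here are short placeholders.
import json

records = [
    {"id": "example_1", "protein_sequence": "MKTAYIAKQR"},
    {"id": "example_2", "protein_sequence": "MVLSPADKTN"},
]

with open("datasets/my_proteins.json", "w") as fh:
    json.dump(records, fh, indent=2)
```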
### Parallel Device Configuration

Single GPU (50 workers):

```bash
export DEVICES=$(python3 -c "print(','.join(['cuda:0']*50))")
```

Multi-GPU (e.g., 4 GPUs with 12-13 workers each):

```bash
export DEVICES=$(python3 -c "
devices = []
for i in range(4):
    devices.extend([f'cuda:{i}']*13)
print(','.join(devices[:50]))
")
```

Manual configuration:

```bash
export DEVICES="cuda:0,cuda:0,cuda:1,cuda:1,cuda:2,cuda:2,cuda:3,cuda:3"
```
### Optional Flags

```bash
--use_amp                    # Enable automatic mixed precision (recommended for modern GPUs)
--prepopulate_buffer         # Pre-fill the replay buffer before training starts
--wandb_log                  # Enable Weights & Biases logging
--wandb_project <name>       # W&B project name
--wandb_run_name_prefix <p>  # W&B run name prefix for experiment tracking
```

### Full Training Example

```bash
DEVICES=$(python3 -c "print(','.join(['cuda:0']*50))")
nohup python CodonRL_main.py \
--jf ./datasets/uniprot_le_500/uniprot_with_guidance_l0.json \
--codon_table human \
--lambda_val 4 \
--protein_max_len 501 \
--batch_size 64 \
-e 500 \
--buffer_size 100000 \
--learning_rate 2e-5 \
--target_update_freq 150 \
--parallel_devices $DEVICES \
--max_workers 55 \
--mfe_workers 4 \
--milestone_mfe_method linearfold \
--final_mfe_method linearfold \
--use_amp \
--prepopulate_buffer \
--wandb_log \
--wandb_project CodonRL \
--wandb_run_name_prefix run \
  --output_dir results
```

## Citation

```bibtex
@article{du2026codonrl,
  title={CodonRL: Multi-Objective Codon Sequence Optimization Using Demonstration-Guided Reinforcement Learning},
  author={Du, Shiyi and Kaynar, G{\"u}n and Li, Jiayi and You, Zhaoyi and Tang, Shijie and Kingsford, Carl},
  journal={bioRxiv},
  pages={2026--02},
  year={2026},
  publisher={Cold Spring Harbor Laboratory}
}
```
## License

This project is licensed under the CodonRL Software License Agreement.