Suri: Multi-constraint instruction following for long-form text generation (EMNLP’24)
Genshin Impact character instruction models tuned with LoRA on LLMs
Advanced LLM fine-tuning techniques: SFT (LoRA, QLoRA, DoRA, P-/Prefix-Tuning), GRPO, DPO, ORPO, KTO & PPO; composable correctness/format rewards + LLM-as-a-Judge evals (DeepEval, Evidently AI) across math, multi-hop, medical & general QA on Llama 3, Mistral, Phi-4, Gemma & Qwen3. Built on TRL, PEFT & Unsloth.
Lightweight preference optimization for LLMs using LoRA and ORPO
Creating a GPT-2-Based Chatbot with Human Preferences
Preference optimization framework for text classification (DPO/ORPO/KTO), with SFT, encoder, and XGBoost baselines plus unified run pipeline and reproducible outputs.
A technical guide, developed at the request of the OnlyFans founder, demonstrating advanced fine-tuning methodologies to turn Qwen2-72b into a Jessica Rabbit personality emulation using QLoRA and ORPO.
Span-cited English investor memos from Japanese annual securities reports (有価証券報告書), produced by a 14B nekomata-qfin fine-tune on a single AMD Instinct MI300X.
End-to-end LLM preference learning pipeline: training, evaluation, and comparison of DPO, ORPO, KTO, and RLHF with 4-bit quantization, LoRA, and memory-efficient training on a single 8GB GPU.
Korean 3B LLM (pure Transformer) pretrained from scratch on 8× NVIDIA B200 GPUs with SFT + ORPO alignment
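Several of the repositories above train with ORPO (odds-ratio preference optimization), which augments the standard SFT loss on the chosen response with an odds-ratio penalty that pushes the model to prefer chosen over rejected completions. A minimal pure-Python sketch of that objective, assuming per-sequence average token probabilities `p_chosen` and `p_rejected` (the function names and the `lam` weight are illustrative, not taken from any listed repo):

```python
import math

def odds(p):
    # odds of generating a sequence with probability p
    return p / (1.0 - p)

def orpo_or_loss(p_chosen, p_rejected):
    # odds-ratio term of the ORPO objective:
    # -log sigmoid(log(odds(p_chosen) / odds(p_rejected)))
    log_or = math.log(odds(p_chosen)) - math.log(odds(p_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-log_or)))

def orpo_loss(nll_chosen, p_chosen, p_rejected, lam=0.1):
    # full objective: SFT negative log-likelihood on the chosen
    # response plus a lambda-weighted odds-ratio penalty
    return nll_chosen + lam * orpo_or_loss(p_chosen, p_rejected)
```

When the model assigns equal probability to both responses the penalty is -log(0.5); it shrinks as the chosen response becomes more likely than the rejected one. In practice the frameworks these repos build on (e.g. TRL with PEFT adapters) compute the same quantity from token-level log-probabilities rather than sequence probabilities.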