Suri: Multi-constraint instruction following for long-form text generation (EMNLP’24)
Genshin Impact character instruction models tuned with LoRA on LLMs
Advanced LLM fine-tuning techniques: SFT (LoRA, QLoRA, DoRA, P-/Prefix-Tuning), GRPO, DPO, ORPO, KTO & PPO; composable correctness/format rewards + LLM-as-a-Judge evals (DeepEval, Evidently AI) across math, multi-hop, medical & general QA on Llama 3, Mistral, Phi-4, Gemma & Qwen3. Built on TRL, PEFT & Unsloth.
Lightweight preference optimization for LLMs using LoRA and ORPO
Creating a GPT-2-Based Chatbot with Human Preferences
Preference optimization framework for text classification (DPO/ORPO/KTO), with SFT, encoder, and XGBoost baselines plus unified run pipeline and reproducible outputs.
A technical guide, developed at the request of the OnlyFans founder, demonstrating advanced fine-tuning methodologies to turn Qwen2-72b into a Jessica Rabbit personality emulation using QLoRA and ORPO.
Span-cited English investor memos from Japanese annual securities reports (有価証券報告書), produced by a 14B nekomata-qfin fine-tune on a single AMD Instinct MI300X.
End-to-end LLM preference learning pipeline: training, evaluation, and comparison of DPO, ORPO, KTO, and RLHF with 4-bit quantization, LoRA, and memory-efficient training on a single 8GB GPU.
Korean 3B LLM (pure Transformer) pretrained from scratch on 8× NVIDIA B200 GPUs with SFT + ORPO alignment
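Several of the repositories above train with ORPO (odds-ratio preference optimization), which augments the standard SFT loss on the chosen response with an odds-ratio penalty that pushes the model to prefer chosen over rejected completions. A minimal pure-Python sketch of that objective, assuming per-sequence average token probabilities `p_chosen` and `p_rejected` (the function names and the `lam` weight are illustrative, not taken from any listed repo):

```python
import math

def odds(p):
    # odds of generating a sequence with probability p
    return p / (1.0 - p)

def orpo_or_loss(p_chosen, p_rejected):
    # odds-ratio term of the ORPO objective:
    # -log sigmoid(log(odds(p_chosen) / odds(p_rejected)))
    log_or = math.log(odds(p_chosen)) - math.log(odds(p_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-log_or)))

def orpo_loss(nll_chosen, p_chosen, p_rejected, lam=0.1):
    # full objective: SFT negative log-likelihood on the chosen
    # response plus a lambda-weighted odds-ratio penalty
    return nll_chosen + lam * orpo_or_loss(p_chosen, p_rejected)
```

When the model assigns equal probability to both responses the penalty is -log(0.5); it shrinks as the chosen response becomes more likely than the rejected one. In practice the frameworks these repos build on (e.g. TRL with PEFT adapters) compute the same quantity from token-level log-probabilities rather than sequence probabilities.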