Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning
Updated Apr 20, 2026 · Python
Run more RL experiments. Wait less for GPUs.
[CVPR 2026] Official Code for "ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning"
Claw-R1: Empowering OpenClaw with Advanced Agentic RL.
[ACL 2026 Findings] Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization
DART-GUI: Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation
Curated, opinionated index of post-R1 LLM × Reinforcement Learning work. Many deep-dive blog posts cross-linked to many papers — GRPO, DAPO, DPO, PPO, RLHF, GSPO, CISPO, VAPO, Reward Modeling, MoE RL stability, Verifier-Free RL, Training-Free RL, Agentic RL, DeepSeek-R1 reproduction.
Proximity-based Multi-turn Optimization (ProxMO) - Official Implementation
SGLang model provider for Strands Agents, enabling on-policy agentic RL training.
[ACL2026] AlphaQuanter: An End-to-End Tool-Orchestrated Agentic Reinforcement Learning Framework for Stock Trading.
Standardizing environment infrastructure with Strands Agents — step, observe, reward.
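The step/observe/reward pattern named above can be sketched as a minimal environment loop. This is an illustrative sketch only: the class and method names (`StepResult`, `CountdownEnv`, `reset`, `step`) are hypothetical and do not reflect the actual Strands Agents API.

```python
# Hypothetical sketch of a step/observe/reward environment interface;
# names are illustrative, not the Strands Agents API.
from dataclasses import dataclass


@dataclass
class StepResult:
    observation: str  # what the agent sees next
    reward: float     # scalar feedback for the last action
    done: bool        # whether the episode has ended


@dataclass
class CountdownEnv:
    """Toy environment: the agent must guess a hidden number."""
    target: int = 7
    max_turns: int = 3
    turns: int = 0

    def reset(self) -> str:
        # Start a new episode and return the first observation.
        self.turns = 0
        return "Guess an integer between 1 and 10."

    def step(self, action: str) -> StepResult:
        # Consume the agent's action, return observation/reward/done.
        self.turns += 1
        try:
            guess = int(action.strip())
        except ValueError:
            return StepResult("Please reply with an integer.", -1.0, False)
        if guess == self.target:
            return StepResult("Correct!", 1.0, True)
        done = self.turns >= self.max_turns
        hint = "higher" if guess < self.target else "lower"
        return StepResult(f"Wrong, try {hint}.", 0.0, done)


env = CountdownEnv()
obs = env.reset()
result = env.step("7")
```

An RL trainer would wrap this loop, logging each (observation, action, reward) transition as a trajectory for policy optimization.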
This is the official repository for our paper "Doctor-R1: Mastering Clinical Inquiry with Experiential Agentic Reinforcement Learning," published at ICLR 2026.
Official implementation for paper "Demystifying Reinforcement Learning for Long-Horizon Tool-Using Agents: A Comprehensive Recipe"
Official Code of Paper: MolAct: An Agentic RL Framework for Molecular Editing and Property Optimization
Train and customize OpenClaw agents using reinforcement learning with simple language feedback and fully asynchronous optimization.