Build, Evaluate, and Deploy GUI Agents — online RL training, standardized benchmarks, and real-device deployment in one framework.
Updated Apr 16, 2026 - Python
A curated collection of the world’s most advanced benchmark datasets for evaluating Large Language Model (LLM) Agents.
🧠 Discover and evaluate advanced benchmark datasets for Large Language Model agents to enhance performance assessment in real-world tasks.