Standards for building agents, better
Updated Feb 22, 2026 - TypeScript
Agentic testing for agentic codebases
Ship agents you can audit.
The pre-flight check for AI agents
Qualitative benchmark suite for evaluating AI coding agents and orchestration paradigms on realistic, complex development tasks
GitHub template for agent-testable SaaS apps. Next.js 16 + shadcn/ui + Neon Postgres + agent-browser e2e testing via accessibility tree.
Agent testing automation 🤖 by simulating users 👥 and agents 🤝 with a judge ⚖️ (langwatch-scenario)
A Multi-Agent System for Cross-Checking Phishing URLs.
Real performance testing for CI/CD pipelines, staging environments, and load balancer validation
AI Agent Evaluation and Monitoring Guide
Behavior test framework for AI agents. Define tests in YAML. Run against transcripts. Get scored reports.
Holdout scenario evaluation harness for AI agents. Doer/Judge/Adversary/Observer roles, probabilistic satisfaction scoring, append-only JSONL audit trails with integrity hashes. Created Dec 2025.
Regression and evaluation toolkit for prompt and agent output quality
Demonstration of testing and evaluation patterns for AI agents using Azure AI evaluation tools with custom evaluators
Open-source agent simulation and runtime control platform for Claude Code
PHP testing framework for LLM agents — multi-turn dialogs, cassette replay, tool calling, LLM-as-judge assertions
Eval-driven Customer Support FTE using OpenAI Agents SDK. Multi-agent routing, guardrails, and systematic quality evaluation.
🧮 Solve mathematical problems and write proofs in natural language using this easy-to-use reasoning harness. Enhance your problem-solving skills effortlessly.