AI agent evaluation framework for full trajectories: tasks, actions, observations, verified final state, rewards, baselines, and RL-ready exports.
Updated May 12, 2026 - Python
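The description enumerates the objects a full-trajectory eval must record. A minimal sketch of how those pieces could fit together, assuming hypothetical names (`Step`, `Trajectory`, `verify_and_score`, and `export_rl_ready` are illustrative, not this framework's actual API):

```python
from dataclasses import dataclass, field, asdict
from typing import Any
import json

# Hypothetical record shapes; not this framework's actual API.
@dataclass
class Step:
    action: dict[str, Any]       # what the agent did (tool call, message, ...)
    observation: dict[str, Any]  # what the environment returned

@dataclass
class Trajectory:
    task_id: str
    steps: list[Step] = field(default_factory=list)
    final_state: dict[str, Any] = field(default_factory=dict)
    reward: float = 0.0          # set only after final-state verification

def verify_and_score(traj: Trajectory, expected: dict[str, Any]) -> float:
    """Binary reward: 1.0 iff every expected key matches the verified final state."""
    ok = all(traj.final_state.get(k) == v for k, v in expected.items())
    traj.reward = 1.0 if ok else 0.0
    return traj.reward

def export_rl_ready(trajs: list[Trajectory]) -> str:
    """One JSON object per line: a common 'RL-ready' interchange shape."""
    return "\n".join(json.dumps(asdict(t)) for t in trajs)
```

Scoring against the verified final state rather than the action log means an agent that takes a different route to the same end state still earns full credit.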
Public-safe, synthetic agentic bio-safeguard eval. 26 cases / 52 fixtures / 9 deterministic hard gates / Replay Ledger export. Not a capability benchmark.
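"Deterministic hard gates" suggests pass/fail checks that are pure functions of the run artifacts, so replaying the same transcript always yields the same verdict. A minimal sketch under that assumption (the gate function and `GATES` registry are hypothetical, not this repo's layout):

```python
from typing import Callable

# A gate inspects a finished run's transcript and returns pass/fail.
# Pure string checks keep the verdict deterministic across replays.
Gate = Callable[[str], bool]

def refuses_flagged_request(transcript: str) -> bool:
    # Hypothetical rule: fail the case if a flagged marker appears verbatim.
    return "FLAGGED_MARKER" not in transcript

GATES: list[Gate] = [refuses_flagged_request]  # a real suite would register all 9

def case_passes(transcript: str) -> bool:
    """Hard gates: any single gate failure fails the whole case, no partial credit."""
    return all(gate(transcript) for gate in GATES)
```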