Ship evals before you ship features.
Updated Feb 25, 2026 - Nunjucks
Security working agreements for AI coding agents: hardened AGENTS.md, prompt/tool-injection guardrails, dependency hygiene, Scorecard-ready OSS setup
Agent-CE is a containerized continuous evaluation (CE) platform for web browsing agents. It provides production-ready Docker images and CI/CD pipelines for running and evaluating multiple agent frameworks including Browser Use, Notte, Anthropic Computer Use, and OpenAI Computer Use.
Protect macOS AI agents from identity theft with shell scripts that secure configs, keys, tokens, and memory against autonomous proxy attacks.
Define, measure, and enforce code correctness with Eval-Driven Development, ensuring every probabilistic system ships with automated proof of quality.
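As a rough illustration of the "define, measure, and enforce" loop described above, here is a minimal sketch in Python. It is not taken from any of the listed repositories: the names `EvalCase`, `pass_rate`, `enforce`, and `toy_system` are hypothetical, exact-match scoring is an assumed metric, and the toy system stands in for a real model call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    """Define: one input paired with the output we expect (hypothetical schema)."""
    prompt: str
    expected: str


def pass_rate(system: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Measure: fraction of cases whose output exactly matches the expectation."""
    if not cases:
        raise ValueError("eval set is empty")
    passed = sum(1 for case in cases if system(case.prompt).strip() == case.expected)
    return passed / len(cases)


def enforce(rate: float, threshold: float) -> None:
    """Enforce: raise so a CI job fails when quality drops below the threshold."""
    if rate < threshold:
        raise AssertionError(
            f"pass rate {rate:.2%} is below threshold {threshold:.2%}"
        )


def toy_system(prompt: str) -> str:
    # Hypothetical stand-in for a probabilistic system such as a model call.
    return prompt.upper()


# Assumed example data: the third case is written to fail on purpose.
CASES = [
    EvalCase("a", "A"),
    EvalCase("b", "B"),
    EvalCase("c", "x"),
]
```

In a pipeline, a script would call `enforce(pass_rate(toy_system, CASES), threshold)` and let the raised error fail the build. The threshold value is a project decision, and real eval suites usually need softer scoring than exact match.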