The audit your coding agent runs on itself.
Syntactic linters check whether your CLAUDE.md parses. This checks whether it makes your agent behave — whether your config actually forces the practices that produce good code: stop runaway loops, read the source before assuming, never declare "done" without a green build, never hardcode a secret.
It runs as a skill inside Claude Code (/agent-discipline:audit-setup), points the model at your own config, and hands back a 0–100 score, per-criterion evidence, and the exact diff to raise it.
Score: 64/100 — 🟠 Workable
AD-1 Anti-loop rule — 0/9 — no explicit ceiling on repeated attempts found.
AD-4 No "done" without proof — 4/8 — present (line 22) but no test/build gate named.
EN-1 Destructive guardrail — 3/5 — deny on rm -rf & force-push; missing cloud-delete.
→ Apply the diffs below to reach 🟢 87/100.
There are good linters for agent configs already — agnix, cclint, ctxlint, AgentLint. They're deterministic binaries, and they're excellent at deterministic things: stale model IDs, broken paths, malformed frontmatter, exposed secrets. Run them. This does not replace them.
But a deterministic binary has no model, so it can only check syntax. It cannot answer the question that actually predicts whether your agent ships good work:
Does this config enforce a real engineering discipline — or is it a wish list?
Judging that is a semantic call. "Does an explicit anti-loop rule exist, and is it an actionable protocol or an empty slogan?" needs a language model to read and reason. Running as a skill inside the coding agent gives you that model for free — the agent audits its own operating manual. That's the whole idea, and it's why a syntactic linter can't follow here without becoming an LLM app.
The standard is the product. The skill is ~50 lines that point a model at STANDARD.md. The value is the opinionated, battle-tested rubric: twelve criteria across failure discipline, verification, security, delegation, and self-improvement — each with a presence test, a "what good looks like", and red flags. See STANDARD.md.
| Category | Weight | Checks (excerpt) |
|---|---|---|
| Failure discipline | 28 | anti-loop ceiling · verify-first · reuse-before-new · scope discipline |
| Verification discipline | 22 | no "done" without proof · build-before-ship · consumer awareness · reversibility |
| Security baseline | 20 | no hardcoded secrets · env docs · dependency vetting · secret hygiene in output |
| Delegation & autonomy | 9 | declared autonomy tiers · critical-area gating |
| Improvement loop | 6 | error → durable prevention |
| Enforcement (mechanical) | 15 | guardrails imposed by hooks / permission deny-lists, not just prose |
The last row is the dimension no syntactic linter measures and no prose can fake: a rule the agent can't ignore (deny: Bash(rm -rf *)) outranks one it merely should follow. It's scored by inspecting settings.json rather than by judging prose, so it's near-deterministic.
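A minimal sketch of why this row lends itself to a mechanical check, assuming settings.json keeps its rules as strings under permissions.deny. The function name, the table of required guardrails, and the sample settings are hypothetical illustrations; the skill itself is a prompt that points the model at STANDARD.md, not this code.

```python
import json

# Hypothetical guardrails to look for; the real rubric lives in STANDARD.md.
REQUIRED_DENIES = {
    "recursive delete": "rm -rf",
    "force push": "push --force",
}


def missing_guardrails(settings_text: str) -> list[str]:
    """Return the names of guardrails that no deny rule covers."""
    settings = json.loads(settings_text)
    # Assumed layout: {"permissions": {"deny": ["Bash(rm -rf *)", ...]}}
    deny_rules = settings.get("permissions", {}).get("deny", [])
    return [
        name
        for name, needle in REQUIRED_DENIES.items()
        if not any(needle in rule for rule in deny_rules)
    ]


sample = '{"permissions": {"deny": ["Bash(rm -rf *)"]}}'
print(missing_guardrails(sample))  # ['force push']
```

The same input always yields the same list, which is the property that makes the enforcement score repeatable.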
Scores are built to be stable run-to-run: the presence half of every behavioral criterion and all of enforcement are near-deterministic; only the quality half is model judgment. A reported score shouldn't swing more than a few points across reruns.
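The split between the two halves can be pictured as follows. This is a sketch under assumptions: it treats each behavioral criterion as half presence (yes or no) and half quality (a model judgment between 0 and 1), which matches the 4/8 in the sample report but is not confirmed as the exact formula; STANDARD.md is the authority. The class and field names are hypothetical. The category weights are copied from the table.

```python
from dataclasses import dataclass

# Category weights copied from the table; they should total 100.
CATEGORY_WEIGHTS = {
    "Failure discipline": 28,
    "Verification discipline": 22,
    "Security baseline": 20,
    "Delegation & autonomy": 9,
    "Improvement loop": 6,
    "Enforcement (mechanical)": 15,
}


@dataclass
class CriterionResult:
    max_points: float
    present: bool   # near-deterministic: is the rule there at all?
    quality: float  # model judgment in [0, 1]; ignored when not present

    def points(self) -> float:
        if not self.present:
            return 0.0
        half = self.max_points / 2
        # Presence earns the first half; quality scales the second half.
        return half + half * self.quality


assert sum(CATEGORY_WEIGHTS.values()) == 100

# AD-4 from the sample report: present, but no gate named -> half credit.
ad4 = CriterionResult(max_points=8, present=True, quality=0.0)
print(ad4.points())  # 4.0
```

Under this model only the `quality` term can move between reruns, which is consistent with a swing of a few points rather than a few bands.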
As a Claude Code plugin (recommended — runs as a slash command):
/plugin marketplace add SpinaBuilds/agent-discipline
/plugin install agent-discipline
/agent-discipline:audit-setup
As a standalone skill (no marketplace):
cp -r skills/audit-setup ~/.claude/skills/agent-discipline-audit
cp STANDARD.md ~/.claude/skills/agent-discipline-audit/ # the skill needs the rubric next to it
# then in Claude Code: /audit-setup
For other agents (Cursor, Codex, Copilot, Gemini CLI…): the standard is agent-agnostic. Paste STANDARD.md into the agent and ask it to audit your AGENTS.md / .cursorrules against it. (Claude Code does not load AGENTS.md natively, so the reference ships as both CLAUDE.md and AGENTS.md variants.)
templates/CLAUDE.reference.md — a config that scores in the 🟢 band. Start here, or diff your own against it.
examples/before-after/ — a weak config → audit report → diff → high-scoring config, end to end.
It does not re-implement syntactic checks — stale model IDs, broken paths, secret detection. Those are solved; use agnix et al. for them. Mixing the two would make this a worse linter and a worse auditor. This owns strategy; they own syntax.
The standard is open, and a clean-room config can reach the top band by following it. But consistently keeping a real codebase at 🟢 — the judgment calls, the enforcement tuned to your stack, the orchestration around it — is craft, not a checklist. That's what I do as a consultant — Antonio Spina / SpinaBuilds. Get in touch.
MIT.