agent-discipline

The audit your coding agent runs on itself.

Syntactic linters check whether your CLAUDE.md parses. This checks whether it makes your agent behave — whether your config actually forces the practices that produce good code: stop runaway loops, read the source before assuming, never declare "done" without a green build, never hardcode a secret.

It runs as a skill inside Claude Code (/agent-discipline:audit-setup), points the model at your own config, and hands back a 0–100 score, per-criterion evidence, and the exact diff to raise it.

Score: 64/100 — 🟠 Workable
AD-1 Anti-loop rule — 0/9 — no explicit ceiling on repeated attempts found.
AD-4 No "done" without proof — 4/8 — present (line 22) but no test/build gate named.
EN-1 Destructive guardrail — 3/5 — deny on rm -rf & force-push; missing cloud-delete.
→ Apply the diffs below to reach 🟢 87/100.

Why this exists

There are good linters for agent configs already — agnix, cclint, ctxlint, AgentLint. They're deterministic binaries, and they're excellent at deterministic things: stale model IDs, broken paths, malformed frontmatter, exposed secrets. Run them. This does not replace them.

But a deterministic binary has no model, so it can only check syntax. It cannot answer the question that actually predicts whether your agent ships good work:

Does this config enforce a real engineering discipline — or is it a wish list?

Judging that is a semantic call. "Does an explicit anti-loop rule exist, and is it an actionable protocol or an empty slogan?" needs a language model to read and reason. Running as a skill inside the coding agent gives you that model for free — the agent audits its own operating manual. That's the whole idea, and it's why a syntactic linter can't follow here without becoming an LLM app.

The standard is the product. The skill is ~50 lines that point a model at STANDARD.md. The value is the opinionated, battle-tested rubric: twelve criteria across failure discipline, verification, security, delegation, and self-improvement — each with a presence test, a "what good looks like", and red flags. See STANDARD.md.

What it scores

Category	Weight	Checks (excerpt)
Failure discipline	28	anti-loop ceiling · verify-first · reuse-before-new · scope discipline
Verification discipline	22	no "done" without proof · build-before-ship · consumer awareness · reversibility
Security baseline	20	no hardcoded secrets · env docs · dependency vetting · secret hygiene in output
Delegation & autonomy	9	declared autonomy tiers · critical-area gating
Improvement loop	6	error → durable prevention
Enforcement (mechanical)	15	guardrails imposed by hooks / permission deny-lists, not just prose

The last row is the dimension no syntactic linter measures and no prose can fake: a rule the agent can't ignore (deny: Bash(rm -rf *)) outranks one it merely should follow. It's scored by inspecting settings.json rather than by model judgment, so this part of the score is effectively deterministic.
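
A minimal sketch of why the enforcement row can be checked mechanically, assuming the permissions.deny list shape of a Claude Code settings.json. The function name deny_coverage, the REQUIRED_GUARDS table, and its substring patterns are hypothetical illustrations, not the rubric defined in STANDARD.md.

```python
import json

# Hypothetical guard names mapped to substrings expected in a deny rule.
# Illustrative only; the real criteria live in STANDARD.md.
REQUIRED_GUARDS = {
    "recursive-delete": ["rm -rf"],
    "force-push": ["push --force", "push -f"],
}


def deny_coverage(settings: dict) -> dict[str, bool]:
    """Report, per guard, whether any deny rule mentions one of its patterns."""
    deny_rules = settings.get("permissions", {}).get("deny", [])
    return {
        guard: any(p in rule for rule in deny_rules for p in patterns)
        for guard, patterns in REQUIRED_GUARDS.items()
    }


# Example settings, parsed from a JSON string as they would be from disk.
example = json.loads('{"permissions": {"deny": ["Bash(rm -rf *)"]}}')
coverage = deny_coverage(example)
print(coverage)  # {'recursive-delete': True, 'force-push': False}
```

Because the check is plain string matching on a parsed file, the same settings.json always yields the same result, which is the property the enforcement row relies on.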

Scores are built to be stable run-to-run: the presence half of every behavioral criterion and all of enforcement are near-deterministic; only the quality half is model judgment. A reported score shouldn't swing more than a few points across reruns.
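
The arithmetic behind that stability claim can be sketched as follows, assuming each behavioral criterion's points split evenly between a presence half and a quality half. The helper criterion_score, the two weights, and the quality fractions are hypothetical inputs chosen for illustration, not the skill's actual scoring code.

```python
def criterion_score(weight: float, present: bool, quality: float) -> float:
    """Score one criterion: half for presence, half scaled by judged quality.

    quality is a fraction in [0, 1] and is the only model-judged input.
    A rule that is absent scores zero regardless of quality.
    """
    if not present:
        return 0.0
    return weight / 2 + (weight / 2) * quality


# Two hypothetical reruns on the same config: presence is identical,
# only the quality judgment moves slightly between runs.
weights = [9, 8]
run_a = sum(criterion_score(w, True, q) for w, q in zip(weights, [0.50, 0.00]))
run_b = sum(criterion_score(w, True, q) for w, q in zip(weights, [0.60, 0.10]))

print(run_a, run_b, round(run_b - run_a, 2))
```

Under this assumption, a rule that is present but judged to have no quality earns exactly half its weight, and a 0.1 shift in quality judgment on both criteria moves the total by less than one point.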

Install

As a Claude Code plugin (recommended — runs as a slash command):

/plugin marketplace add SpinaBuilds/agent-discipline
/plugin install agent-discipline
/agent-discipline:audit-setup

As a standalone skill (no marketplace):

cp -r skills/audit-setup ~/.claude/skills/agent-discipline-audit
cp STANDARD.md ~/.claude/skills/agent-discipline-audit/   # the skill needs the rubric next to it
# then in Claude Code:  /audit-setup

For other agents (Cursor, Codex, Copilot, Gemini CLI…): the standard is agent-agnostic. Paste STANDARD.md into the agent and ask it to audit your AGENTS.md / .cursorrules against it. (Claude Code does not load AGENTS.md natively, so the reference ships as both CLAUDE.md and AGENTS.md variants.)

Use it as a target, not just a test

templates/CLAUDE.reference.md — a config that scores in the 🟢 band. Start here, or diff your own against it.
examples/before-after/ — a weak config → audit report → diff → high-scoring config, end to end.

What it deliberately does not do

It does not re-implement syntactic checks — stale model IDs, broken paths, secret detection. Those are solved; use agnix et al. for them. Mixing the two would make this a worse linter and a worse auditor. This owns strategy; they own syntax.

Going further

The standard is open, and a clean-room config can reach the top band by following it. But consistently keeping a real codebase at 🟢 — the judgment calls, the enforcement tuned to your stack, the orchestration around it — is craft, not a checklist. That's what I do as a consultant — Antonio Spina / SpinaBuilds. Get in touch.

License

MIT.

