
SpinaBuilds/agent-discipline

agent-discipline

License: MIT · Claude Code plugin · PRs welcome

The audit your coding agent runs on itself.

Syntactic linters check whether your CLAUDE.md parses. This checks whether it makes your agent behave — whether your config actually forces the practices that produce good code: stop runaway loops, read the source before assuming, never declare "done" without a green build, never hardcode a secret.

It runs as a skill inside Claude Code (/agent-discipline:audit-setup), points the model at your own config, and hands back a 0–100 score, per-criterion evidence, and the exact diff to raise it.

Score: 64/100 — 🟠 Workable
  AD-1 Anti-loop rule — 0/9 — no explicit ceiling on repeated attempts found.
  AD-4 No "done" without proof — 4/8 — present (line 22) but no test/build gate named.
  EN-1 Destructive guardrail — 3/5 — deny on rm -rf & force-push; missing cloud-delete.
  → Apply the diffs below to reach 🟢 87/100.
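To make the report shape concrete, here is a minimal Python sketch that parses criterion lines like the sample report into (earned, max) pairs and computes how many points each criterion still leaves on the table. The line format is assumed from this one sample; `CriterionResult`, `parse_report`, and the regex are hypothetical names, not part of the plugin.

```python
import re
from dataclasses import dataclass

EM_DASH = "\u2014"

# Assumed line shape, taken from the sample report:
# "<ID> <name> <dash> <earned>/<max> <dash> <evidence>"
CRITERION_RE = re.compile(
    rf"^(?P<cid>[A-Z]{{2}}-\d+)\s+(?P<name>.+?)\s+{EM_DASH}\s+"
    rf"(?P<earned>\d+)/(?P<max>\d+)\s+{EM_DASH}\s+(?P<evidence>.+)$"
)


@dataclass
class CriterionResult:
    cid: str
    name: str
    earned: int
    max_points: int
    evidence: str

    @property
    def headroom(self) -> int:
        # Points still available on this criterion.
        return self.max_points - self.earned


def parse_report(text: str) -> list[CriterionResult]:
    results = []
    for line in text.splitlines():
        m = CRITERION_RE.match(line.strip())
        if m:  # summary lines such as "Score: 64/100 ..." do not match
            results.append(
                CriterionResult(
                    m["cid"], m["name"], int(m["earned"]), int(m["max"]), m["evidence"]
                )
            )
    return results


# The sample report, rebuilt with explicit em-dash characters.
SAMPLE = "\n".join([
    f"Score: 64/100 {EM_DASH} Workable",
    f"AD-1 Anti-loop rule {EM_DASH} 0/9 {EM_DASH} no explicit ceiling on repeated attempts found.",
    f'AD-4 No "done" without proof {EM_DASH} 4/8 {EM_DASH} present (line 22) but no test/build gate named.',
    f"EN-1 Destructive guardrail {EM_DASH} 3/5 {EM_DASH} deny on rm -rf & force-push; missing cloud-delete.",
])
```

For the three excerpted criteria this gives headroom of 9, 4 and 2 points, so AD-1 is the largest single gain among them.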


Why this exists

There are good linters for agent configs already — agnix, cclint, ctxlint, AgentLint. They're deterministic binaries, and they're excellent at deterministic things: stale model IDs, broken paths, malformed frontmatter, exposed secrets. Run them. This does not replace them.

But a deterministic binary has no model, so it can only check syntax. It cannot answer the question that actually predicts whether your agent ships good work:

Does this config enforce a real engineering discipline — or is it a wish list?

Judging that is a semantic call. "Does an explicit anti-loop rule exist, and is it an actionable protocol or an empty slogan?" needs a language model to read and reason. Running as a skill inside the coding agent gives you that model for free — the agent audits its own operating manual. That's the whole idea, and it's why a syntactic linter can't follow here without becoming an LLM app.

The standard is the product. The skill is ~50 lines that point a model at STANDARD.md. The value is the opinionated, battle-tested rubric: twelve criteria across failure discipline, verification, security, delegation, and self-improvement — each with a presence test, a "what good looks like", and red flags. See STANDARD.md.

What it scores

Category                 | Weight | Checks (excerpt)
Failure discipline       | 28     | anti-loop ceiling · verify-first · reuse-before-new · scope discipline
Verification discipline  | 22     | no "done" without proof · build-before-ship · consumer awareness · reversibility
Security baseline        | 20     | no hardcoded secrets · env docs · dependency vetting · secret hygiene in output
Delegation & autonomy    | 9      | declared autonomy tiers · critical-area gating
Improvement loop         | 6      | error → durable prevention
Enforcement (mechanical) | 15     | guardrails imposed by hooks / permission deny-lists, not just prose
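The six weights add up to exactly 100, so the overall score can be read as points earned out of 100. The sketch below assumes a plain sum of per-category points, which is consistent with the table but not confirmed by it; `overall_score` is a hypothetical helper, and the real aggregation is whatever STANDARD.md specifies.

```python
# Category weights (max points per category) copied from the table above.
CATEGORY_WEIGHTS = {
    "Failure discipline": 28,
    "Verification discipline": 22,
    "Security baseline": 20,
    "Delegation & autonomy": 9,
    "Improvement loop": 6,
    "Enforcement (mechanical)": 15,
}
assert sum(CATEGORY_WEIGHTS.values()) == 100  # the weights form the 0-100 scale


def overall_score(earned: dict[str, int]) -> int:
    """Hypothetical aggregation: the overall score is the plain sum of category points."""
    unknown = set(earned) - set(CATEGORY_WEIGHTS)
    if unknown:
        raise KeyError(f"unknown categories: {sorted(unknown)}")
    total = 0
    for category, cap in CATEGORY_WEIGHTS.items():
        points = earned.get(category, 0)  # a missing category scores zero
        if not 0 <= points <= cap:
            raise ValueError(f"{category}: {points} is outside 0..{cap}")
        total += points
    return total
```

Capping each category at its weight is what keeps a strong showing in one area from hiding a gap in another.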

The last row is the dimension no syntactic linter measures and no prose can fake: a rule the agent can't ignore (deny: Bash(rm -rf *)) outranks one it merely should follow. It's scored by inspecting settings.json, so it's fully deterministic.
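Because enforcement lives in settings.json, its check can be plain code. A minimal sketch, assuming the Claude Code `permissions.deny` list and top-level `hooks` key; the two destructive patterns and `enforcement_report` are illustrative choices, not the rubric's actual list.

```python
import json

# Illustrative destructive commands to look for (hypothetical; the real list is in STANDARD.md).
DESTRUCTIVE = {
    "rm -rf": "rm -rf",
    "force-push": "push --force",
}


def enforcement_report(settings_json: str) -> dict[str, bool]:
    """Report which guardrails are mechanically enforced by a settings.json document."""
    settings = json.loads(settings_json)
    deny = settings.get("permissions", {}).get("deny", [])
    # Plain substring match against each deny rule, e.g. "Bash(rm -rf *)".
    report = {
        label: any(needle in rule for rule in deny)
        for label, needle in DESTRUCTIVE.items()
    }
    # Hooks are a second mechanical lever; here we only record whether any exist.
    report["hooks configured"] = bool(settings.get("hooks"))
    return report


example = '{"permissions": {"deny": ["Bash(rm -rf *)"]}}'
```

Substring matching keeps the sketch short but misses variants such as `git push -f`; a production check would normalize commands first.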

Scores are built to be stable run-to-run: the presence half of every behavioral criterion is near-deterministic and enforcement is fully deterministic; only the quality half is model judgment. A reported score shouldn't swing more than a few points across reruns.
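The split between a deterministic presence half and a model-judged quality half can be illustrated with the anti-loop criterion. This hypothetical presence test only asks whether the config states a numeric ceiling on attempts; deciding whether that rule is an actionable protocol or a slogan is left to the model. The phrasings the regex accepts are assumptions, not the rubric's wording.

```python
import re
from typing import Optional

# Hypothetical presence test: an explicit number of attempts after which the agent must stop.
CEILING_RE = re.compile(
    r"\b(?:after|max(?:imum)?|at most|no more than)\s+(\d+)\s+"
    r"(?:failed\s+)?(?:attempts?|tries|retries)\b",
    re.IGNORECASE,
)


def anti_loop_ceiling(config_text: str) -> Optional[int]:
    """Return the first explicit attempt ceiling found, or None if there is none."""
    m = CEILING_RE.search(config_text)
    return int(m.group(1)) if m else None
```

Because this half is a regex, the same config always yields the same answer; run-to-run drift can only come from the quality half.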

Install

As a Claude Code plugin (recommended — runs as a slash command):

/plugin marketplace add SpinaBuilds/agent-discipline
/plugin install agent-discipline
/agent-discipline:audit-setup

As a standalone skill (no marketplace):

cp -r skills/audit-setup ~/.claude/skills/agent-discipline-audit
cp STANDARD.md ~/.claude/skills/agent-discipline-audit/   # the skill needs the rubric next to it
# then in Claude Code:  /audit-setup
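A quick way to confirm the standalone copy is complete is to check that both files landed in the skill directory. This sketch assumes the skill's entry point is named SKILL.md (the Claude Code skill convention) and uses the target path from the commands above; `missing_skill_files` is a hypothetical helper.

```python
from pathlib import Path

# Assumption: the copied skill keeps its entry point as SKILL.md,
# and the rubric must sit beside it as STANDARD.md.
REQUIRED_FILES = ("SKILL.md", "STANDARD.md")


def missing_skill_files(skill_dir: Path) -> list[str]:
    """List required files that are absent from the standalone skill directory."""
    return [name for name in REQUIRED_FILES if not (skill_dir / name).is_file()]
```

`missing_skill_files(Path.home() / ".claude/skills/agent-discipline-audit")` should return an empty list once both `cp` commands have succeeded.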

For other agents (Cursor, Codex, Copilot, Gemini CLI…): the standard is agent-agnostic. Paste STANDARD.md into the agent and ask it to audit your AGENTS.md or .cursorrules against it. (Claude Code does not load AGENTS.md natively, so the reference config ships in both CLAUDE.md and AGENTS.md variants.)
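For agents without the plugin, the request can be assembled programmatically. A minimal sketch; the prompt wording and `build_audit_prompt` are hypothetical, and the only inputs taken from the text above are STANDARD.md and the config file name (AGENTS.md, .cursorrules, and so on).

```python
def build_audit_prompt(standard: str, config_name: str, config: str) -> str:
    """Wrap the rubric and an agent config into one audit request (hypothetical wording)."""
    return (
        f"Audit the config in {config_name} against the standard below.\n"
        "Score every criterion, cite evidence, and propose diffs.\n\n"
        f"--- STANDARD.md ---\n{standard}\n\n"
        f"--- {config_name} ---\n{config}\n"
    )
```

Read both files with `pathlib.Path.read_text()` and paste the returned string into the agent; putting the rubric first means the agent has the criteria in hand before it reads the config.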

Use it as a target, not just a test

What it deliberately does not do

It does not re-implement syntactic checks — stale model IDs, broken paths, secret detection. Those are solved; use agnix et al. for them. Mixing the two would make this a worse linter and a worse auditor. This owns strategy; they own syntax.

Going further

The standard is open, and a clean-room config can reach the top band by following it. But consistently keeping a real codebase at 🟢 — the judgment calls, the enforcement tuned to your stack, the orchestration around it — is craft, not a checklist. That's what I do as a consultant — Antonio Spina / SpinaBuilds. Get in touch.

License

MIT.

About

An opinionated, scored standard for the operating discipline of a coding agent's config — the audit your agent runs on itself.
