Concentrate agent autofix-boundary and report-reading guidance by pengfei-threemoonslab · Pull Request #55 · ThreeMoonsLab/agents-shipgate

pengfei-threemoonslab · 2026-05-08T22:35:31Z

Summary

Add four agent-facing docs that concentrate guidance previously scattered across AGENTS.md, agent-contract-current.md, autofix-policy.md, agent-recipes.md, and target-repo-agent-snippets.md
Cross-link sweep across seven existing files so any agent-facing entry point reaches the new pages in two hops or fewer

Type

Why

A new agent or CI integrator coming to Agents Shipgate had to read five files in the right order to learn (a) what an agent may safely do mechanically, (b) what conclusions an agent must defer to a human reviewer, and (c) how to walk report.json correctly. The guidance was complete but spread thin. This PR concentrates it into purpose-built pages and wires every agent-facing entry point through to them.

What's in each new doc

docs/agent-autofix-boundary.md — behavioral counterpart to the mechanical autofix-policy.md. Opens with the load-bearing distinction (autofix-policy.md answers "will apply-patches run this?"; this page answers "what may an agent assert in a PR comment?"). Includes a check-ID mapping table covering the seven categories (approval, confirmation, idempotency, broad-scope, prohibited-action, runtime trace evidence, plus an override-refusal script).
docs/report-reading-for-agents.md — reader's primer that walks report.json in order: release_decision.decision first, then supporting fields, then findings, then per-finding autofix flags, then packet human_in_the_loop. Anti-patterns include a concrete summary.status → release_decision.decision code rewrite. Full schema-version table.
docs/agents/use-with-codex.md — mirror of use-with-claude-code.md for OpenAI Codex. On-ramp is the canonical AGENTS.md snippet from target-repo-agent-snippets.md; no slash command or skill bundle to install.
docs/agents/use-with-cursor.md — mirror for Cursor. On-ramp is the auto-attach .cursor/rules/agents-shipgate.mdc rule; explains the globs: + alwaysApply: false mechanism.
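The reading order described for report-reading-for-agents.md can be sketched in a few lines of Python. This is an illustrative sketch, not the project's code. The field names `release_decision.decision`, `release_decision.evidence_coverage`, `summary.status`, and `findings` appear in this PR; the `"blocked"` decision value, the per-finding `id` and `autofix` keys, and the `EXAMPLE-FIXABLE` ID are hypothetical placeholders. The final step of the primer (the packet's human_in_the_loop data) is omitted because its shape is not described here.

```python
from typing import Any


def read_report(report: dict[str, Any]) -> dict[str, Any]:
    """Walk a report.json-shaped dict in the order the primer recommends."""
    release = report.get("release_decision", {})

    # 1. The decision comes first; it is never inferred from summary.status.
    decision = release.get("decision")

    # 2. Supporting fields (evidence_coverage is named in this PR).
    coverage = release.get("evidence_coverage", {})

    # 3. Findings, then 4. per-finding autofix flags.
    #    The "id" and "autofix" keys are hypothetical placeholders.
    findings = report.get("findings", [])
    autofixable = [f.get("id") for f in findings if f.get("autofix")]

    return {
        "decision": decision,
        "human_review_recommended": bool(coverage.get("human_review_recommended")),
        "finding_count": len(findings),
        "autofixable_ids": autofixable,
    }


sample = {
    "summary": {"status": "pass"},  # deliberately ignored by read_report
    "release_decision": {
        "decision": "blocked",  # hypothetical value
        "evidence_coverage": {"level": "low", "human_review_recommended": True},
    },
    "findings": [
        {"id": "SHIP-POLICY-APPROVAL-MISSING", "autofix": False},
        {"id": "EXAMPLE-FIXABLE", "autofix": True},  # hypothetical check ID
    ],
}
result = read_report(sample)
print(result)
```

The sample sets `summary.status` to a value that disagrees with the decision on purpose: the sketch reports the decision and never consults the summary, which is the rewrite the anti-pattern section asks for.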

Cross-link sweep

docs/INDEX.md — "For agents" lists all four new docs plus the previously-missing use-with-claude-code.md and agent-contract-current.md.
AGENTS.md — Task 2 points at the report primer; "What you can't do" leads with a CLI-vs-agent boundary clarification; the Claude Code paragraph at line 415 is now an "Editor / agent integrations" subsection covering all three editors.
docs/agent-contract-current.md — new "See also" section beneath "Authoritative references" (kept separate so schemas stay in the authoritative list and reader's-guides land in see-also).
docs/autofix-policy.md — "See also" leads with the boundary doc as the behavioral counterpart.
docs/agent-recipes.md — "Reference" list expanded.
prompts/README.md and docs/agents/use-with-claude-code.md — Codex/Cursor punts replaced with real links; Aider stays paste-only.

Vocabulary discipline

The canonical six-item phrase ("approval, confirmation, idempotency, broad-scope, or prohibited-action policy decisions") that already lived in target-repo-agent-snippets.md is now used verbatim in the four new docs, with runtime trace evidence added as the seventh category per the brief — flipping a trace patches the evidence record, not the runtime gate.

Verification

CI is authoritative for python -m ruff check ., python -m compileall -q src tests, and python -m pytest.

Additional local checks run:

Link integrity: extracted every relative link from the four new docs (grep -oE '\]\([^)]+\)') and confirmed every target file/anchor exists. All ten check-ID anchors (#ship-policy-approval-missing etc.) match ### SHIP-... headers in docs/checks.md.
Vocabulary parity: the canonical six-item phrase appears verbatim in the four new docs plus the existing target-repo-agent-snippets.md; "runtime trace evidence" is called out as the seventh.
Discoverability: release_decision.decision is now mentioned in 13 files across docs/, AGENTS.md, and prompts/README.md. Sampled three entry points (AGENTS.md Task 2, docs/INDEX.md "For agents", prompts/README.md) — each reaches both new top-level docs in one hop.
No regressions: git diff on the seven modified existing files is additive only (new lines, new bullets, new "See also" entries) plus the targeted "Editor / agent integrations" subsection split in AGENTS.md.
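The link-integrity check above was run by hand with grep. A rough Python equivalent is sketched below, assuming a regular expression that captures the `](target)` part of a Markdown link. The function name `broken_relative_links` and the temporary files are hypothetical, and anchor validation against headers is left out for brevity.

```python
import re
import tempfile
from pathlib import Path

# Captures the target inside "](...)" of a Markdown link.
LINK_RE = re.compile(r"\]\(([^)]+)\)")


def broken_relative_links(doc: Path) -> list[str]:
    """Return relative link targets in `doc` whose file does not exist."""
    broken = []
    for target in LINK_RE.findall(doc.read_text(encoding="utf-8")):
        if re.match(r"[a-zA-Z][a-zA-Z0-9+.-]*:", target):
            continue  # absolute URL (https:, mailto:, ...): out of scope
        path_part = target.split("#", 1)[0]
        if not path_part:
            continue  # same-page anchor; anchor checking is omitted here
        if not (doc.parent / path_part).exists():
            broken.append(target)
    return broken


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "checks.md").write_text("### SHIP-EXAMPLE\n", encoding="utf-8")
    page = root / "page.md"
    page.write_text(
        "[ok](checks.md#ship-example) [bad](missing.md) [web](https://example.com)",
        encoding="utf-8",
    )
    missing = broken_relative_links(page)
    print(missing)
```

Only the file part of each target is checked; confirming that an anchor such as `#ship-example` matches a header would need a second pass over the target file.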

Release-readiness notes

No user-code import added to default scan paths
No network access added to default scan paths
New or changed check IDs are documented in docs/checks.md (no new check IDs; existing IDs are referenced)
Report/schema changes are additive or documented in STABILITY.md (no schema changes)

🤖 Generated with Claude Code

Boundary and report-reading guidance was scattered across AGENTS.md, agent-contract-current.md, autofix-policy.md, agent-recipes.md, and target-repo-agent-snippets.md, so a new agent integrator had to hop between files to learn what is mechanically safe vs. what requires human review and how to walk report.json. Concentrate that guidance into purpose-built pages and cross-link every agent-facing entry point so the right reading order is hard to miss.

Add four docs:

- agent-autofix-boundary.md (behavioral counterpart to the mechanical autofix-policy.md, with a check-ID mapping table and override-refusal script)
- report-reading-for-agents.md (reader's primer that walks report.json starting from release_decision.decision, with concrete summary.status -> release_decision.decision rewrite)
- agents/use-with-codex.md and agents/use-with-cursor.md (mirrors of use-with-claude-code.md for editors that lack a slash-command/skill bundle; on-ramps are the existing AGENTS.md and .cursor/rules snippets in target-repo-agent-snippets.md)

Cross-link sweep:

- docs/INDEX.md "For agents" lists all four new docs plus the previously-missing use-with-claude-code.md and agent-contract-current.md
- AGENTS.md Task 2 points at the report primer
- AGENTS.md "What you can't do" leads with a CLI-vs-agent boundary clarification
- AGENTS.md gains an "Editor / agent integrations" subsection covering all three editors
- agent-contract-current.md gets a See-also section
- autofix-policy.md See-also leads with the boundary doc as the behavioral counterpart
- agent-recipes.md References gains both new docs
- prompts/README.md and use-with-claude-code.md replace their Codex/Cursor punts with real links

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three corrections to the new agent docs in PR #55:

- agent-autofix-boundary.md: SHIP-EVIDENCE-APPROVAL-TRACE-MISSING was miscategorized under "Approval policy." It fires when an approval-required tool is missing local HITL trace evidence; the boundary is about trace artifacts, not about the policy declaration. Move it to the runtime trace evidence row and broaden that row's handoff text to cover "trace evidence missing OR trace shows policy-controlled call without approval/confirmation."
- agent-autofix-boundary.md: prohibited-action row promised check IDs but listed only the manifest field. Add SHIP-SCOPE-PROHIBITED-TOOL-PRESENT as the canonical check ID an agent walking findings[] will actually see, with handoff text matching the check description.
- use-with-cursor.md: Cursor's auto-attach rules require a matching file in chat context (alwaysApply: false). The verify step said "in a fresh chat" which would not trigger the rule. Make the second step explicitly require the matching file be open or @-referenced, and add a sentence explaining alwaysApply: false is the intended behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The Codex integration guide claimed Codex has no slash-command or skill mechanism. That is stale: Codex Skills ship in the CLI, IDE extension, and app, with bundles installable at .codex/skills/ (project-scoped) or ~/.codex/skills/ (user-scoped), invoked via /skill-name or implicitly when Codex decides.

Reframe the integration story: this repo does not currently ship a Codex skill bundle (the parallel to skills/agents-shipgate/ for Claude Code has not been authored), so the AGENTS.md snippet is the minimal on-ramp that works today, not an inherent Codex limitation.

Add a "What's next" section documenting the SKILL.md path (.codex/skills/agents-shipgate/SKILL.md) and the building blocks (prompts/ recipes, advisory CI workflow, agent-autofix-boundary.md) for assembling one locally before this repo ships an official one. Update AGENTS.md "Editor / agent integrations" entry to match.

Source: https://developers.openai.com/codex/skills

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…rageDecision fields

Two factual errors in PR #55:

- Codex skill paths/invocation. The doc said .codex/skills/ and ~/.codex/skills/ with /skill-name invocation. Per current OpenAI docs, Codex scans .agents/skills/ in every directory from the working directory up to the repo root, plus $HOME/.agents/skills/ for user-scoped skills, and invocation is /skills or $<skill-name> (or implicit when Codex matches the task). Following the wrong paths would put SKILL.md in a directory Codex never scans. Fix all four occurrences (use-with-codex.md intro, surface table, What's next install location, AGENTS.md Editor integrations entry).
- EvidenceCoverageDecision schema. report-reading-for-agents.md documented release_decision.evidence_coverage.{level, human_review_recommended, warnings} but the v0.10 schema has no warnings field. The actual fields are level, human_review_recommended, low_confidence_tool_count, and source_warning_count (docs/report-schema.v0.10.json:275-302). Replace warnings with the two real count fields.

Source: https://developers.openai.com/codex/skills

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
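The corrected skill lookup (every directory from the working directory up to the repo root, plus the user's home) can be sketched as a simple upward walk. This is only an illustration of the locations named in the commit message. The function name `skill_search_dirs` and the example paths are hypothetical, and the sketch makes no claim about the order or precedence Codex itself applies.

```python
from pathlib import PurePosixPath


def skill_search_dirs(
    cwd: PurePosixPath, repo_root: PurePosixPath, home: PurePosixPath
) -> list[PurePosixPath]:
    """List .agents/skills/ candidates from cwd up to repo_root, then home."""
    dirs = []
    current = cwd
    while True:
        dirs.append(current / ".agents" / "skills")
        # Stop at the repo root, or at the filesystem root as a safety net.
        if current == repo_root or current == current.parent:
            break
        current = current.parent
    dirs.append(home / ".agents" / "skills")  # user-scoped skills
    return dirs


candidates = skill_search_dirs(
    cwd=PurePosixPath("/repo/services/api"),
    repo_root=PurePosixPath("/repo"),
    home=PurePosixPath("/home/dev"),
)
for d in candidates:
    print(d)
```

A SKILL.md placed under `.codex/skills/` would not appear in any of these candidates, which is the failure mode the correction describes.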

Picks up: contract CLI command (#52), v0.10.0 release tag (#53), init --agent-instructions (#54), HITL evidence provenance (#50), agent autofix-boundary docs (#55), packet schema v0.3.

Conflict resolution: kept v0.11 report-schema references on top of main's v0.10.0 release / packet-schema v0.3 / contract-command additions. AGENTS.md and SKILL.md adopt main's centralized "contract lives in agent-contract-current.md" pattern; the v0.11 provenance line lives there now. test_public_surface_contract.py adopts main's derive-from-model approach for the current schema constants and just adds v0.10 to the legacy-pattern list.

Also fixes a SARIF regression flagged in review: ``_location()`` chose the structured branch whenever ``source.path`` was set, so a finding with ``path="foo.py"`` and legacy ``location="foo.py:10"`` emitted no ``region``. Hybrid / plugin findings now fall back to ``_split_location(source.location or source.ref)`` when ``start_line`` is absent. Adds a regression test.

After merge: 805 passed (+3 skipped), ruff clean, ``agents-shipgate contract --json`` reports ``report_schema_version: "0.11"``.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
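The SARIF fallback described above can be illustrated with a small sketch. It is not the repository's `_location()` or `_split_location()`: the function names, signatures, and parsing rule below are assumptions made for illustration. Only the SARIF property names (`artifactLocation.uri`, `region.startLine`) come from the SARIF format itself.

```python
from typing import Any, Optional


def split_location(location: str) -> tuple[str, Optional[int]]:
    """Split a legacy "path:line" string; line is None when absent."""
    path, sep, tail = location.rpartition(":")
    if sep and tail.isdigit():
        return path, int(tail)
    return location, None


def sarif_physical_location(
    path: Optional[str],
    start_line: Optional[int],
    legacy: Optional[str],
) -> dict[str, Any]:
    """Build a SARIF physicalLocation, falling back to the legacy string."""
    legacy_path, legacy_line = split_location(legacy) if legacy else (None, None)
    uri = path or legacy_path
    # Prefer the structured start_line; fall back only when it is absent.
    line = start_line if start_line is not None else legacy_line
    physical: dict[str, Any] = {"artifactLocation": {"uri": uri}}
    if line is not None:
        physical["region"] = {"startLine": line}
    return physical


# The hybrid case from the commit message: path set, line only in legacy form.
hybrid = sarif_physical_location(path="foo.py", start_line=None, legacy="foo.py:10")
print(hybrid)
```

With the fallback in place the hybrid finding keeps its `region`; without it, a finding that sets only `path` would lose the line number.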

CI on PR #56 caught the stale llms-full.txt: the merge of origin/main brought in three commits (#54 init --agent-instructions, #55 autofix-boundary docs, #51 v0.11 schema with source provenance) that touched AGENTS.md and the report-schema literals, so the committed llms-full.txt no longer matched what `scripts/build-llms-full.py` produces from the post-merge sources. Re-ran the build script and committed the regenerated file.

Now:

- 859 tests pass + 3 skipped (was 748 on this branch; +111 from main's new tests for source provenance, agent-instructions renderers, managed blocks, etc.)
- ruff clean
- llms-full.txt is byte-identical to `render(REPO_ROOT)`, so test_llms_full_is_up_to_date passes

No code changes; purely the rebuilt artifact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>


* Add trigger catalog, zero-install detector, and adoption benchmark

Shipping the first two weeks of the agent-adoption strategy in one PR. The unifying goal is to make Agents Shipgate self-discoverable to coding agents (Claude Code, Codex, Cursor) that haven't already adopted it: give them machine-readable trigger rules, a zero-install path to verify relevance, and a public benchmark that measures whether adoption-improving edits are working.

Trigger surface:

- docs/triggers.json: machine-readable mirror of the AGENTS.md trigger table. Every rule carries an `agents_md_row` field that the contract test pins against the AGENTS.md prose, so the two cannot drift. Stable for 0.x.
- src/agents_shipgate/triggers.py: canonical loader and evaluator with predicate vocabulary documented in triggers.json itself. `python -m agents_shipgate.triggers shipgate.yaml prompts/refund.md` returns a run/skip verdict plus matched rules.
- prompts/decide-shipgate-relevance.md: relevance-decision prompt that walks an agent through fetching triggers.json and applying it to a PR diff before any other Shipgate prompt fires.
- AGENTS.md, llms.txt, .well-known/agents-shipgate.json, pyproject.toml, README: cross-link the new surface so every entry surface points at every other.

Long-form reference:

- llms-full.txt: concatenated AGENTS.md + recipes + contract + checks + concepts + autofix-policy in one document for AI search engines and coding agents that prefer one fetch.
- scripts/build-llms-full.py: deterministic generator; the contract test fails if a source file changes without regenerating.

Zero-install path:

- tools/shipgate-detect.py: stdlib-only Python detector that replicates the structural verdict of `agents-shipgate detect --json` without requiring a local install. Pinned to the canonical CLI by tests/test_zero_install_detector.py across all 8 sample fixtures (same is_agent_project, same fired frameworks, same suggested sources).
- docs/zero-install.md: three zero-install paths (single-file detector, uvx, GitHub Action) with a decision matrix.
- docs/quickstart.md now leads with the zero-install detector before the install section.

Benchmark scaffolding:

- benchmark/: frozen archetypes, four prompts (none mention Shipgate by name), five setup variants, tester-facing runbook, results CSV schema, and an upstream-PR tracker. The headline metric is the delta between `00-no-hints` and `10-agents-md` on the discovery rubric. Manual W2 baseline run is the next step.

Drift guards:

- tests/test_public_surface_contract.py extends the existing drift suite with checks for triggers.json/AGENTS.md row parity, llms-full.txt freshness via the build script's render(), and a parametrized "every prompt is mirrored to skills/" assertion.
- tests/test_zero_install_detector.py adds 24 parity tests pinning the zero-install script to the canonical CLI on every sample.

Verification: 733 pytest passes (709 W1 baseline + 24 new); ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix trigger evaluator precedence and decorator-rule reachability

Addresses three review findings on the trigger catalog landed in the parent commit:

P1 (decorator rules unreachable from the prompt): prompts/decide-shipgate-relevance.md piped only `git diff --name-only` into the evaluator, so `diff_contains` rules (TRIGGER-FUNCTION-TOOL-DECORATOR, TRIGGER-FRAMEWORK-VERSION-BUMP, TRIGGER-SHIPGATE-CI-WORKFLOW Action match) silently never fired; agents following the prompt would skip a PR that only adds `@function_tool`. Add `--git-diff [REVSPEC]` to `python -m agents_shipgate.triggers`, which shells out to `git diff --name-only [REVSPEC]` AND `git diff [REVSPEC]` to populate paths and diff body in one call. Update the prompt's Option B to use it.

P2 (run_shipgate silently overrode skip_shipgate): A README-only diff that incidentally mentioned `@tool` (or quoted the Action URL) returned `run_shipgate: true` because the evaluator treated `has_run` as winning over `has_skip`, making the docs-only negative rule effectively dead. Reorder the precedence: stop_conditions → force_run → skip → run → dry_run. `skip_shipgate` now beats `run_shipgate`. To preserve the "manifest present means always run" semantic, promote TRIGGER-EXISTING-MANIFEST-PRESENT to a new action `force_run` that overrides skip; an opted-in repo's docs-only PR still scans because the cost is low and tool-adjacent prose can matter.

P3 (dry_run rules silently dead): When only TRIGGER-FRAMEWORK-VERSION-BUMP fired, the evaluator reported "No rules matched"; the prompt translated that to "do not propose Shipgate", making the dry_run rule non-actionable despite being in the catalog. Add a `dry_run_recommended` field to the evaluator output. When only dry_run rules match, `run_shipgate` stays false but the field is true and the rationale names the matched rules. The prompt now routes this state to "propose a non-mutating scan; do not propose init --write".

triggers.json gains an `actions` block describing each action's semantics and an `action_precedence` array documenting the high-to-low order. Both are reference material for an agent reading the catalog directly.

Tests:

- New: skip beats run on docs-only with @tool in prose
- New: force_run beats skip when manifest present
- New: dry_run sets dry_run_recommended; rule appears in matched_rules
- New: pin TRIGGER-EXISTING-MANIFEST-PRESENT.action == "force_run"
- Updated: _VALID_TRIGGER_ACTIONS includes "force_run"

Verification: 737 pytest passes (was 733; +4 new); ruff clean.
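The reordered precedence (stop_conditions → force_run → skip → run → dry_run) can be sketched as a first-match lookup over an ordered list. This is a simplified illustration, not the evaluator in src/agents_shipgate/triggers.py. The short action labels follow the precedence chain as written in the commit message, the `winning_action` key is a hypothetical addition for readability, and because the effect of `stop_conditions` is not described here the sketch simply treats it as "do not run".

```python
PRECEDENCE = ["stop_conditions", "force_run", "skip", "run", "dry_run"]


def decide(matched_actions: list[str]) -> dict[str, object]:
    """Resolve the actions of all matched rules into one verdict."""
    # The highest-precedence action present among the matches wins.
    winner = next((a for a in PRECEDENCE if a in matched_actions), None)
    return {
        "run_shipgate": winner in ("force_run", "run"),
        # Assumption: the flag is set only when dry_run is the winning action.
        "dry_run_recommended": winner == "dry_run",
        "winning_action": winner,  # hypothetical field, for illustration only
    }


docs_only_with_tool_mention = decide(["run", "skip"])
opted_in_repo_docs_pr = decide(["skip", "force_run", "run"])
version_bump_only = decide(["dry_run"])
nothing_matched = decide([])

print(docs_only_with_tool_mention)
print(opted_in_repo_docs_pr)
print(version_bump_only)
print(nothing_matched)
```

The four calls mirror the cases in the test list: skip beats run, force_run beats skip, a dry_run-only match sets `dry_run_recommended` while `run_shipgate` stays false, and no match leaves both false.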
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Cover tests in docs-only skip; fix bare --git-diff; update prompt verification

Three more review findings from PR #56:

P2a (tests-only diff with @tool slips through): AGENTS.md says "Pure read-only doc/test changes with no manifest impact" should skip, but TRIGGER-DOCS-ONLY-NEGATIVE only matched `**/*.md`. A tests-only diff that incidentally mentions `@function_tool` (in a fixture or assertion) returned `run_shipgate: true` because the broad decorator rule fired and no skip rule counterbalanced it. Extended the `every_file_matches` predicate to accept either a string (existing form, back-compat) or a list (any-of within the predicate). Updated the rule's pattern list to include `tests/**`, `test/**`, `**/tests/**`, `**/test/**`, `**/test_*.py`, `**/*_test.py`, `**/conftest.py`. Mixed docs+tests PRs now skip; code+tests mixes still trigger normally.

P2b (bare --git-diff misses staged and untracked): The prompt advertised bare `--git-diff` for "uncommitted changes" but the implementation ran plain `git diff`, which only shows unstaged changes. A staged `@function_tool` addition silently returned no matched rules. Bare flag now runs `git diff HEAD` for both paths and content (covering BOTH staged and unstaged tracked changes), then appends untracked file *paths* via `git ls-files --others --exclude-standard`. Untracked file *content* is not captured (reading arbitrary unstaged files into memory is risky); the prompt documents the limitation explicitly.

P3 (prompt verification contradicts the dry_run path): prompts/decide-shipgate-relevance.md's verification checklist named the output keys as `run_shipgate, matched_rules, rationale` (no `dry_run_recommended`) and asserted "no Shipgate command appears" whenever `run_shipgate: false`. That contradicts the dry_run path added in the previous commit, which explicitly proposes a non-mutating scan when `dry_run_recommended: true`. Updated the verification checklist to:

- List all four canonical output keys including `dry_run_recommended`
- Allow exactly one Shipgate command (a non-mutating scan) when dry_run_recommended is true and run_shipgate is false
- Forbid Shipgate commands only when both are false

Added two NOT-to-do bullets: never propose `init --write` on a dry_run-only match; bare --git-diff doesn't surface untracked file content. Mirrored to the skill copy.

Tests added (10 cases):

- Parametrized: 6 tests-only path patterns with @function_tool in diff all return run_shipgate=false
- Code+test mix with @function_tool returns run_shipgate=true (negative case for the every_file_matches expansion)
- _eval_predicate accepts every_file_matches as both string and list
- _git_diff_context with no revspec captures staged-only changes
- _git_diff_context with no revspec surfaces untracked file paths (and confirms untracked content is NOT in diff_text)

Verification: 747 pytest passes (was 737; +10 new); ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Clarify zero-install detector is a structural subset, not drop-in

P3 review: docs/zero-install.md and llms.txt implied the zero-install script has the same JSON shape as `agents-shipgate detect --json`. It doesn't: the canonical CLI emits `diagnostics[]` and `next_actions[]` arrays (the diagnostic engine), which are intentionally out of scope for the stdlib-only zero-install path. The script emits a structural subset of `DetectResult` plus `script_version`.

- llms.txt: "same JSON shape" → "same structural verdict … emits the canonical `DetectResult` fields plus `script_version`, but NOT the CLI's `diagnostics` or `next_actions` arrays."
- docs/zero-install.md: rephrased "Output mirrors `agents-shipgate detect --json` (plus a `script_version` field)" to "structural subset … not a drop-in replacement." Closing line now reads "structural verdict parity" and explicitly notes "field-by-field byte parity is not pinned and not promised."
- tools/shipgate-detect.py docstring: lists `diagnostics[]` and `next_actions[]` in the "Intentional simplifications" section alongside the existing items (no git fast path, descriptive evidence strings, ±0.5 score variance).

Test: pin the absence of `diagnostics` and `next_actions` keys in the script output so a future change that adds them is forced to update the wording surfaces in the same PR.

Verification: 748 pytest passes (was 747; +1 new); ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Regenerate llms-full.txt after merging main (v0.11 schema bump)

CI on PR #56 caught the stale llms-full.txt: the merge of origin/main brought in three commits (#54 init --agent-instructions, #55 autofix-boundary docs, #51 v0.11 schema with source provenance) that touched AGENTS.md and the report-schema literals, so the committed llms-full.txt no longer matched what `scripts/build-llms-full.py` produces from the post-merge sources. Re-ran the build script and committed the regenerated file.

Now:

- 859 tests pass + 3 skipped (was 748 on this branch; +111 from main's new tests for source provenance, agent-instructions renderers, managed blocks, etc.)
- ruff clean
- llms-full.txt is byte-identical to `render(REPO_ROOT)`, so test_llms_full_is_up_to_date passes
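The `every_file_matches` expansion from the squashed commit message above (a predicate that accepts either one pattern or a list of patterns, with any-of semantics per file) can be sketched with the standard library. This is a guess at the shape, not the code in `_eval_predicate`: the glob handling via `fnmatch`, the helper `_matches`, and the choice to return false for an empty path list are all assumptions.

```python
from fnmatch import fnmatchcase
from typing import Union


def _matches(path: str, pattern: str) -> bool:
    # fnmatch's "*" also crosses "/", so "tests/**" matches nested files.
    # A leading "**/" is additionally allowed to match zero directories.
    if fnmatchcase(path, pattern):
        return True
    return pattern.startswith("**/") and fnmatchcase(path, pattern[3:])


def every_file_matches(paths: list[str], patterns: Union[str, list[str]]) -> bool:
    """True when every path matches at least one pattern (string or list)."""
    if isinstance(patterns, str):
        patterns = [patterns]  # back-compat: a bare string is a one-item list
    # Assumption: an empty diff does not count as "every file matches".
    return bool(paths) and all(
        any(_matches(p, pat) for pat in patterns) for p in paths
    )


docs_and_tests = ["**/*.md", "tests/**", "**/test_*.py"]
mixed_docs_tests = every_file_matches(["README.md", "tests/test_tools.py"], docs_and_tests)
code_plus_docs = every_file_matches(["README.md", "src/agent.py"], docs_and_tests)
legacy_string_form = every_file_matches(["docs/a.md"], "**/*.md")

print(mixed_docs_tests, code_plus_docs, legacy_string_form)
```

A mixed docs+tests change satisfies the predicate, so a skip rule built on it can fire, while a single code file in the diff is enough to make it false and let the run rules decide.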

pengfei-threemoonslab and others added 4 commits May 8, 2026 15:34

pengfei-threemoonslab merged commit e00ddc5 into main May 8, 2026
1 check passed

pengfei-threemoonslab deleted the claude/sweet-mcclintock-c3ba52 branch May 8, 2026 23:31
