Add init --agent-instructions for managed agent-facing snippets#54

Merged
pengfei-threemoonslab merged 3 commits into main from
claude/kind-dhawan-f23d83
May 8, 2026

Conversation

@pengfei-threemoonslab
Contributor

Summary

  • Turns the snippet copy in docs/target-repo-agent-snippets.md into a first-class CLI output. agents-shipgate init --agent-instructions=all (or an agents-md,claude-md,cursor,pr-template subset; or none) renders the snippets to stdout; combined with --write, the CLI plants them in the target repo via managed <!-- agents-shipgate:start v=1 --> markers.
  • Idempotent and byte-preserving: re-running with no changes is a no-op, host newline style (LF/CRLF) is preserved, surrounding bytes are byte-equal across runs. Exit 2 when a target is in a state we will not overwrite (hand-edited cursor file, ambiguous markers, newer block version, directory-form PR template) — including a structured next_action JSON line on stderr under AGENTS_SHIPGATE_AGENT_MODE=1.
  • Composes orthogonally with --write and --ci (third action, exit code = max(manifest_exit, agent_instructions_exit)); JSON output gains a top-level agent_instructions key. Strict CI and baselines remain opt-in human decisions — a snapshot test asserts ci_mode: strict does not appear in any rendered target outside the shared CI-pointer paragraph.
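The idempotent, newline-preserving managed-block write described above can be pictured with a minimal sketch. This is not the code in this PR: the end marker, the function names, and the append-at-end placement are assumptions made for illustration, and the real renderer also handles block versions and refusal states that are omitted here.

```python
START = "<!-- agents-shipgate:start v=1 -->"
END = "<!-- agents-shipgate:end -->"  # assumed end marker; the PR only shows the start marker


def detect_newline(text: str) -> str:
    """Return the host file's newline style (CRLF if any CRLF is present, else LF)."""
    return "\r\n" if "\r\n" in text else "\n"


def upsert_block(host: str, body: str) -> str:
    """Insert or refresh the managed block, leaving all surrounding bytes untouched."""
    nl = detect_newline(host)
    block = nl.join([START, *body.splitlines(), END])
    start = host.find(START)
    if start == -1:
        # No block yet: append one, separated from existing content by a blank line.
        if not host:
            return block + nl
        sep = "" if host.endswith(nl) else nl
        return host + sep + nl + block + nl
    end = host.find(END, start)
    if end == -1:
        # Hypothetical refusal: a start marker with no end marker is ambiguous.
        raise ValueError("start marker without end marker")
    # Replace only the bytes between the markers; hand edits inside are overwritten.
    return host[:start] + block + host[end + len(END):]
```

Running `upsert_block` twice with the same body returns the same string, which is the property the byte-equal re-run check in the test plan relies on.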

Why

Adopters previously had to read docs/target-repo-agent-snippets.md and paste copy into their repos by hand. This makes the existing copy executable: one CLI flag plants the right guidance into AGENTS.md, CLAUDE.md, the Cursor rule, and the PR template — so any agent or reviewer working in the target repo discovers Shipgate without leaving the repo.

Notable design points

  • Selector requires an explicit value. --agent-instructions alone would fail Typer's argument parsing; =all / =<csv> / =none is unambiguous and explicit in scripts.
  • Managed-block markers, silent overwrite of hand edits inside the block. Block carries a footer pointing back at the refresh command; v=N token is the renderer-format version (not the package version) so the cursor PRIOR_RENDER_SHA256 list stays small.
  • Renderer content lifted from the existing doc (no redesign). Cursor globs include OpenAPI/Swagger/MCP/tools/Python so the rule activates while editing real tool surfaces. PR template uses the conditional "If this PR changes…" wording so docs-only PRs aren't false positives.
  • PR template path resolution probes both .github/pull_request_template.md and .github/PULL_REQUEST_TEMPLATE.md; refuses ambiguous-both-without-marker; explicitly out-of-scope-skips when the directory form .github/PULL_REQUEST_TEMPLATE/ exists (status enum reserves skipped_directory_template so future support is non-breaking).
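The PR template resolution rules in the last bullet can be sketched roughly as follows. The function name, the `"ok"` status, the default to the lowercase path, and the marker check are assumptions for illustration; only `skipped_directory_template` and `skipped_ambiguous` are status names that appear in this PR. The `samefile` collapse reflects the follow-up review fix for case-insensitive filesystems.

```python
from __future__ import annotations

from pathlib import Path

MARKER = "<!-- agents-shipgate:start"  # prefix of the managed start marker


def resolve_pr_template(root: Path) -> tuple[str, Path | None]:
    """Pick the PR template file to manage, or report why none was picked."""
    github = root / ".github"
    # The directory form is explicitly out of scope.
    if (github / "PULL_REQUEST_TEMPLATE").is_dir():
        return "skipped_directory_template", None
    lower = github / "pull_request_template.md"
    upper = github / "PULL_REQUEST_TEMPLATE.md"
    found = [p for p in (lower, upper) if p.is_file()]
    # On case-insensitive filesystems both casings name one file on disk.
    if len(found) == 2 and found[0].samefile(found[1]):
        found = found[:1]
    if len(found) == 2:
        # Two distinct files: only proceed if exactly one already carries the marker.
        marked = [p for p in found if MARKER in p.read_text(encoding="utf-8")]
        if len(marked) == 1:
            return "ok", marked[0]
        return "skipped_ambiguous", None
    # Zero or one candidate: use it, defaulting to the lowercase path (assumed default).
    return "ok", (found[0] if found else lower)
```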

Test plan

  • python -m pytest — 602 passed, 3 skipped (case-sensitivity-only PR-template tests skip on macOS APFS; Linux CI exercises them).
  • python -m ruff check . — clean.
  • End-to-end smoke against samples/simple_langchain_agent: init --write --agent-instructions=all --json creates all 4 files; re-run is byte-equal (git status --porcelain empty).
  • Triple combo init --write --ci --agent-instructions=all succeeds; JSON has manifest_status, workflow, agent_instructions keys.
  • Hand-edited cursor MDC under AGENTS_SHIPGATE_AGENT_MODE=1 exits 2 with {"error": "config_already_exists", "next_action": ...} on stderr; on-disk file untouched.
  • Reviewer: confirm the renderer copy in src/agents_shipgate/cli/discovery/agent_instructions/renderers/ matches your intent (especially the ## CI mini-section that adds the advisory pointer to AGENTS.md / CLAUDE.md / cursor — the doc didn't carry that paragraph inline).
  • Reviewer: confirm exit-2-on-skip is the right default, vs. exit 0 (workflow precedent). Manifest precedent is exit 2; this PR follows the manifest model.

🤖 Generated with Claude Code

pengfei-threemoonslab and others added 3 commits May 8, 2026 15:23
Turns the snippet copy in docs/target-repo-agent-snippets.md into a CLI
output: agents-shipgate init --agent-instructions=all|<csv>|none renders
or writes AGENTS.md, CLAUDE.md, .cursor/rules/agents-shipgate.mdc, and
.github/pull_request_template.md via managed `<!-- agents-shipgate:start
-->` markers. Idempotent (safe to rerun, byte-equal on no-op), advisory
only — no strict CI or baseline scaffolding. Composes with --write and
--ci as a third orthogonal action; per-target status enum + structured
agent-mode error on skip.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address review on #54:

- Refuse symlinks at managed-block and cursor target paths (new
  `skipped_symlink` status). `apply.py` no longer calls `.resolve()` on
  the joined target path, so a symlinked AGENTS.md cannot route writes
  outside the workspace.
- Collapse PR template casings via `Path.samefile()` so case-insensitive
  filesystems (macOS APFS, Windows NTFS) no longer report
  `skipped_ambiguous` when only one file actually exists on disk.
- Make `init --write --agent-instructions=...` idempotent at the process
  level: when the flag is set and shipgate.yaml already exists, the
  manifest skip is treated as informational (exit 0, no agent-mode error
  emit). Plain `init --write` still exits 2 for backwards compatibility.
- Pin `COLUMNS=200` for the help-text test so Rich does not truncate
  `--agent-instructions` to `--agent-in…` on narrow CI terminals.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address review on #54 (head 05f5696):

- Parent-directory symlink escape: walk every existing component between
  the workspace root and the target file. ``.github -> /tmp/outside``
  no longer routes ``pull_request_template.md`` writes outside the
  workspace; same protection covers ``.cursor`` and any deeper
  intermediate. New helper ``_first_symlink_in_chain`` is used by both
  managed-block and cursor target paths.
- Help-text test was flaking on GitHub CI (Rich truncates option names
  on narrow terminals even with ``COLUMNS=200``). Replace the rendered-
  string assertion with Click param introspection: confirm
  ``agent_instructions`` is a registered option on ``init`` and that its
  help text mentions ``advisory``. No terminal rendering involved.
- ``--agent-instructions=none`` parses to an empty target list (no
  instruction action runs), so the manifest-skip idempotency
  accommodation should not apply. Restore exit 2 when shipgate.yaml
  already exists in this case — matches plain ``init --write``.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
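The parent-directory symlink walk described in the commit above might look something like this sketch. The helper name mirrors `_first_symlink_in_chain` from the commit message, but the signature and body are assumptions, not the code that was merged.

```python
from __future__ import annotations

from pathlib import Path


def first_symlink_in_chain(root: Path, target: Path) -> Path | None:
    """Return the first symlinked component between root and target, if any.

    Walks every path component below the workspace root without calling
    resolve(), so a link such as .github -> /tmp/outside is detected
    before any write is attempted.
    """
    current = root
    for part in target.relative_to(root).parts:
        current = current / part
        if current.is_symlink():  # also true for dangling symlinks
            return current
        if not current.exists():
            break  # nothing deeper exists on disk yet, so nothing can be a link
    return None
```

A caller would treat a non-`None` result as the `skipped_symlink` status rather than writing through the link.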
@pengfei-threemoonslab pengfei-threemoonslab merged commit d184e98 into main May 8, 2026
1 check passed
@pengfei-threemoonslab pengfei-threemoonslab deleted the claude/kind-dhawan-f23d83 branch May 8, 2026 23:29
pengfei-threemoonslab added a commit that referenced this pull request May 8, 2026
Picks up: contract CLI command (#52), v0.10.0 release tag (#53),
init --agent-instructions (#54), HITL evidence provenance (#50),
agent autofix-boundary docs (#55), packet schema v0.3.

Conflict resolution: kept v0.11 report-schema references on top of
main's v0.10.0 release / packet-schema v0.3 / contract-command
additions. AGENTS.md and SKILL.md adopt main's centralized
"contract lives in agent-contract-current.md" pattern; the v0.11
provenance line lives there now. test_public_surface_contract.py
adopts main's derive-from-model approach for the current schema
constants and just adds v0.10 to the legacy-pattern list.

Also fixes a SARIF regression flagged in review: ``_location()``
chose the structured branch whenever ``source.path`` was set, so a
finding with ``path="foo.py"`` and legacy
``location="foo.py:10"`` emitted no ``region``. Hybrid / plugin
findings now fall back to ``_split_location(source.location or
source.ref)`` when ``start_line`` is absent. Adds a regression
test.

After merge: 805 passed (+3 skipped), ruff clean,
``agents-shipgate contract --json`` reports
``report_schema_version: "0.11"``.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
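The SARIF fallback described in the commit above can be illustrated with a small sketch. The function names and signatures here are hypothetical stand-ins for `_location()` and `_split_location()`; only the behavior (derive a line from a legacy `path:line` string when `start_line` is absent) comes from the commit message. `artifactLocation`, `uri`, `region`, and `startLine` are standard SARIF property names.

```python
from __future__ import annotations


def split_location(location: str) -> tuple[str, int | None]:
    """Split a legacy 'path:line' string; return (path, None) when no line is present."""
    path, sep, tail = location.rpartition(":")
    if sep and tail.isdigit():
        return path, int(tail)
    return location, None


def physical_location(path: str, start_line: int | None, legacy: str | None) -> dict:
    """Build a SARIF physicalLocation, falling back to the legacy string for the line."""
    if start_line is None and legacy:
        _, start_line = split_location(legacy)
    loc: dict = {"artifactLocation": {"uri": path}}
    if start_line is not None:
        loc["region"] = {"startLine": start_line}
    return loc
```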
pengfei-threemoonslab added a commit that referenced this pull request May 9, 2026
CI on PR #56 caught the stale llms-full.txt: the merge of origin/main
brought in three commits (#54 init --agent-instructions, #55 autofix-
boundary docs, #51 v0.11 schema with source provenance) that touched
AGENTS.md and the report-schema literals, so the committed llms-full.txt
no longer matched what `scripts/build-llms-full.py` produces from the
post-merge sources.

Re-ran the build script and committed the regenerated file. Now:

- 859 tests pass + 3 skipped (was 748 on this branch; +111 from main's
  new tests for source provenance, agent-instructions renderers,
  managed blocks, etc.)
- ruff clean
- llms-full.txt is byte-identical to `render(REPO_ROOT)`, so
  test_llms_full_is_up_to_date passes

No code changes — purely the rebuilt artifact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
pengfei-threemoonslab added a commit that referenced this pull request May 9, 2026
* Add trigger catalog, zero-install detector, and adoption benchmark

Shipping the first two weeks of the agent-adoption strategy in one
PR. The unifying goal is to make Agents Shipgate self-discoverable to
coding agents (Claude Code, Codex, Cursor) that haven't already
adopted it: give them machine-readable trigger rules, a zero-install
path to verify relevance, and a public benchmark that measures whether
adoption-improving edits are working.

Trigger surface:

- docs/triggers.json — machine-readable mirror of the AGENTS.md
  trigger table. Every rule carries an `agents_md_row` field that the
  contract test pins against the AGENTS.md prose, so the two cannot
  drift. Stable for 0.x.
- src/agents_shipgate/triggers.py — canonical loader and evaluator
  with predicate vocabulary documented in triggers.json itself.
  `python -m agents_shipgate.triggers shipgate.yaml prompts/refund.md`
  returns a run/skip verdict plus matched rules.
- prompts/decide-shipgate-relevance.md — relevance-decision prompt
  that walks an agent through fetching triggers.json and applying it
  to a PR diff before any other Shipgate prompt fires.
- AGENTS.md, llms.txt, .well-known/agents-shipgate.json, pyproject.toml,
  README — cross-link the new surface so every entry surface points
  at every other.

Long-form reference:

- llms-full.txt — concatenated AGENTS.md + recipes + contract +
  checks + concepts + autofix-policy in one document for AI search
  engines and coding agents that prefer one fetch.
- scripts/build-llms-full.py — deterministic generator; the
  contract test fails if a source file changes without regenerating.

Zero-install path:

- tools/shipgate-detect.py — stdlib-only Python detector that
  replicates the structural verdict of `agents-shipgate detect --json`
  without requiring a local install. Pinned to the canonical CLI by
  tests/test_zero_install_detector.py across all 8 sample fixtures
  (same is_agent_project, same fired frameworks, same suggested
  sources).
- docs/zero-install.md — three zero-install paths (single-file
  detector, uvx, GitHub Action) with a decision matrix.
- docs/quickstart.md now leads with the zero-install detector before
  the install section.

Benchmark scaffolding:

- benchmark/ — frozen archetypes, four prompts (none mention
  Shipgate by name), five setup variants, tester-facing runbook,
  results CSV schema, and an upstream-PR tracker. The headline
  metric is the delta between `00-no-hints` and `10-agents-md` on
  the discovery rubric. Manual W2 baseline run is the next step.

Drift guards:

- tests/test_public_surface_contract.py extends the existing
  drift suite with checks for triggers.json/AGENTS.md row parity,
  llms-full.txt freshness via the build script's render(), and a
  parametrized "every prompt is mirrored to skills/" assertion.
- tests/test_zero_install_detector.py adds 24 parity tests
  pinning the zero-install script to the canonical CLI on every
  sample.

Verification: 733 pytest passes (709 W1 baseline + 24 new); ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix trigger evaluator precedence and decorator-rule reachability

Addresses three review findings on the trigger catalog landed in the
parent commit:

P1 (decorator rules unreachable from the prompt):
  prompts/decide-shipgate-relevance.md piped only `git diff --name-only`
  into the evaluator, so `diff_contains` rules (TRIGGER-FUNCTION-TOOL-
  DECORATOR, TRIGGER-FRAMEWORK-VERSION-BUMP, TRIGGER-SHIPGATE-CI-WORKFLOW
  Action match) silently never fired — agents following the prompt
  would skip a PR that only adds `@function_tool`.

  Add `--git-diff [REVSPEC]` to `python -m agents_shipgate.triggers`,
  which shells out to `git diff --name-only [REVSPEC]` AND `git diff
  [REVSPEC]` to populate paths and diff body in one call. Update the
  prompt's Option B to use it.

P2 (run_shipgate silently overrode skip_shipgate):
  A README-only diff that incidentally mentioned `@tool` (or quoted
  the Action URL) returned `run_shipgate: true` because the evaluator
  treated `has_run` as winning over `has_skip`, making the docs-only
  negative rule effectively dead.

  Reorder the precedence: stop_conditions → force_run → skip → run →
  dry_run. `skip_shipgate` now beats `run_shipgate`. To preserve the
  "manifest present means always run" semantic, promote
  TRIGGER-EXISTING-MANIFEST-PRESENT to a new action `force_run` that
  overrides skip — an opted-in repo's docs-only PR still scans because
  the cost is low and tool-adjacent prose can matter.

P3 (dry_run rules silently dead):
  When only TRIGGER-FRAMEWORK-VERSION-BUMP fired, the evaluator
  reported "No rules matched" — the prompt translated that to "do not
  propose Shipgate", making the dry_run rule non-actionable despite
  being in the catalog.

  Add a `dry_run_recommended` field to the evaluator output. When only
  dry_run rules match, `run_shipgate` stays false but the field is
  true and the rationale names the matched rules. The prompt now
  routes this state to "propose a non-mutating scan; do not propose
  init --write".

triggers.json gains an `actions` block describing each action's
semantics and an `action_precedence` array documenting the
high-to-low order. Both are reference material for an agent reading
the catalog directly.
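The reordered precedence can be sketched as a small function. This is an illustration only: the input shape (a list of `(rule_id, action)` pairs), the rationale wording for matched rules, and the omission of `stop_conditions` are assumptions; the action names and the four output keys come from the commit messages.

```python
# High-to-low precedence, omitting stop_conditions (its behavior is not
# described in enough detail here to sketch).
PRECEDENCE = ["force_run", "skip_shipgate", "run_shipgate", "dry_run"]


def decide(matched: list[tuple[str, str]]) -> dict:
    """Reduce matched (rule_id, action) pairs to a single verdict."""
    actions = {action for _, action in matched}
    winner = next((a for a in PRECEDENCE if a in actions), None)
    if winner is None:
        rationale = "No rules matched"
    else:
        names = [rule_id for rule_id, action in matched if action == winner]
        rationale = f"{winner} via {', '.join(names)}"  # wording is illustrative
    return {
        "run_shipgate": winner in ("force_run", "run_shipgate"),
        # True only when a dry_run rule is the highest-precedence match.
        "dry_run_recommended": winner == "dry_run",
        "matched_rules": [rule_id for rule_id, _ in matched],
        "rationale": rationale,
    }
```

With this ordering, a docs-only diff that mentions `@tool` resolves to skip, while the same diff in a repo with a manifest resolves to run because `force_run` outranks `skip_shipgate`.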

Tests:
- New: skip beats run on docs-only with @tool in prose
- New: force_run beats skip when manifest present
- New: dry_run sets dry_run_recommended; rule appears in matched_rules
- New: pin TRIGGER-EXISTING-MANIFEST-PRESENT.action == "force_run"
- Updated: _VALID_TRIGGER_ACTIONS includes "force_run"

Verification: 737 pytest passes (was 733; +4 new); ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Cover tests in docs-only skip; fix bare --git-diff; update prompt verification

Three more review findings from PR #56:

P2a (tests-only diff with @tool slips through):
  AGENTS.md says "Pure read-only doc/test changes with no manifest
  impact" should skip, but TRIGGER-DOCS-ONLY-NEGATIVE only matched
  `**/*.md`. A tests-only diff that incidentally mentions
  `@function_tool` (in a fixture or assertion) returned
  `run_shipgate: true` because the broad decorator rule fired and no
  skip rule counterbalanced it.

  Extended the `every_file_matches` predicate to accept either a
  string (existing form, back-compat) or a list (any-of within the
  predicate). Updated the rule's pattern list to include `tests/**`,
  `test/**`, `**/tests/**`, `**/test/**`, `**/test_*.py`,
  `**/*_test.py`, `**/conftest.py`. Mixed docs+tests PRs now skip;
  code+tests mixes still trigger normally.
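A minimal sketch of a predicate that accepts either a string or a list of patterns is shown below. It assumes `fnmatch`-style matching, where `*` also matches `/`, so a pattern like `**/test_*.py` needs at least one directory component; the project's actual glob semantics and its handling of an empty path list may differ.

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def every_file_matches(paths: list[str], patterns: str | list[str]) -> bool:
    """True when every changed path matches at least one pattern."""
    if isinstance(patterns, str):
        patterns = [patterns]  # back-compat: the original single-string form
    # An empty diff is treated as "no match" here (an assumption).
    return bool(paths) and all(
        any(fnmatchcase(path, pattern) for pattern in patterns) for path in paths
    )
```

For example, `every_file_matches(["docs/a.md", "tests/test_x.py"], ["**/*.md", "tests/**"])` is true, while adding `src/tool.py` to the path list makes it false, so code+tests mixes still fall through to the run rules.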

P2b (bare --git-diff misses staged and untracked):
  The prompt advertised bare `--git-diff` for "uncommitted changes"
  but the implementation ran plain `git diff`, which only shows
  unstaged changes. A staged `@function_tool` addition silently
  returned no matched rules.

  Bare flag now runs `git diff HEAD` for both paths and content
  (covering BOTH staged and unstaged tracked changes), then appends
  untracked file *paths* via `git ls-files --others --exclude-standard`.
  Untracked file *content* is not captured (reading arbitrary
  unstaged files into memory is risky); the prompt documents the
  limitation explicitly.
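The bare `--git-diff` behavior can be sketched as below. The function mirrors the `_git_diff_context` helper named in the tests, but its signature, the injectable `run` parameter, and the return shape are assumptions for illustration; the three git invocations are the ones described above.

```python
from __future__ import annotations

import subprocess
from typing import Callable


def _run_git(args: list[str]) -> str:
    """Run a git subcommand in the current directory and return its stdout."""
    result = subprocess.run(["git", *args], capture_output=True, text=True, check=True)
    return result.stdout


def git_diff_context(
    revspec: str | None = None,
    run: Callable[[list[str]], str] = _run_git,
) -> tuple[list[str], str]:
    """Collect changed paths and diff text for the trigger evaluator."""
    # Bare flag: diff against HEAD so staged and unstaged tracked changes both appear.
    base = [revspec] if revspec else ["HEAD"]
    paths = run(["diff", "--name-only", *base]).splitlines()
    diff_text = run(["diff", *base])
    if revspec is None:
        # Untracked files contribute paths only; their content is deliberately not read.
        paths += run(["ls-files", "--others", "--exclude-standard"]).splitlines()
    return paths, diff_text
```

Passing `run` explicitly keeps the sketch testable without a repository; real callers would rely on the default.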

P3 (prompt verification contradicts the dry_run path):
  prompts/decide-shipgate-relevance.md's verification checklist named
  the output keys as `run_shipgate, matched_rules, rationale` (no
  `dry_run_recommended`) and asserted "no Shipgate command appears"
  whenever `run_shipgate: false`. That contradicts the dry_run path
  added in the previous commit, which explicitly proposes a
  non-mutating scan when `dry_run_recommended: true`.

  Updated the verification checklist to:
  - List all four canonical output keys including `dry_run_recommended`
  - Allow exactly one Shipgate command (a non-mutating scan) when
    dry_run_recommended is true and run_shipgate is false
  - Forbid Shipgate commands only when both are false
  Added two NOT-to-do bullets: never propose `init --write` on a
  dry_run-only match; bare --git-diff doesn't surface untracked file
  content. Mirrored to the skill copy.

Tests added (10 cases):
- Parametrized: 6 tests-only path patterns with @function_tool in
  diff all return run_shipgate=false
- Code+test mix with @function_tool returns run_shipgate=true
  (negative case for the every_file_matches expansion)
- _eval_predicate accepts every_file_matches as both string and list
- _git_diff_context with no revspec captures staged-only changes
- _git_diff_context with no revspec surfaces untracked file paths
  (and confirms untracked content is NOT in diff_text)

Verification: 747 pytest passes (was 737; +10 new); ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Clarify zero-install detector is a structural subset, not drop-in

P3 review: docs/zero-install.md and llms.txt implied the zero-install
script has the same JSON shape as `agents-shipgate detect --json`. It
doesn't — the canonical CLI emits `diagnostics[]` and `next_actions[]`
arrays (the diagnostic engine), which are intentionally out of scope
for the stdlib-only zero-install path. The script emits a structural
subset of `DetectResult` plus `script_version`.

- llms.txt: "same JSON shape" → "same structural verdict … emits the
  canonical `DetectResult` fields plus `script_version`, but NOT the
  CLI's `diagnostics` or `next_actions` arrays."
- docs/zero-install.md: rephrased "Output mirrors `agents-shipgate
  detect --json` (plus a `script_version` field)" to "structural
  subset … not a drop-in replacement." Closing line now reads
  "structural verdict parity" and explicitly notes "field-by-field
  byte parity is not pinned and not promised."
- tools/shipgate-detect.py docstring: lists `diagnostics[]` and
  `next_actions[]` in the "Intentional simplifications" section
  alongside the existing items (no git fast path, descriptive
  evidence strings, ±0.5 score variance).

Test: pin the absence of `diagnostics` and `next_actions` keys in the
script output so a future change that adds them is forced to update
the wording surfaces in the same PR.

Verification: 748 pytest passes (was 747; +1 new); ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Regenerate llms-full.txt after merging main (v0.11 schema bump)

CI on PR #56 caught the stale llms-full.txt: the merge of origin/main
brought in three commits (#54 init --agent-instructions, #55 autofix-
boundary docs, #51 v0.11 schema with source provenance) that touched
AGENTS.md and the report-schema literals, so the committed llms-full.txt
no longer matched what `scripts/build-llms-full.py` produces from the
post-merge sources.

Re-ran the build script and committed the regenerated file. Now:

- 859 tests pass + 3 skipped (was 748 on this branch; +111 from main's
  new tests for source provenance, agent-instructions renderers,
  managed blocks, etc.)
- ruff clean
- llms-full.txt is byte-identical to `render(REPO_ROOT)`, so
  test_llms_full_is_up_to_date passes