chore: add pre-commit config mirroring AGENTS.md §7 validation commands (#137) by dgenio · Pull Request #165 · dgenio/ChainWeaver

dgenio · 2026-05-16T17:15:42Z

Adds .pre-commit-config.yaml at the repo root with hooks that match
the four validation commands verbatim — same paths, same flags — so a
clean pre-commit run --all-files is the strongest local signal that
CI will pass.

Hooks:

- ruff check chainweaver/ tests/ examples/
- ruff format --check chainweaver/ tests/ examples/
- python -m mypy chainweaver/ tests/
- gitleaks (secret scanning; richer than detect-private-key alone)
- actionlint (syntax check for .github/workflows/*.yml)
- stdlib pre-commit-hooks (trailing-whitespace, end-of-file-fixer, check-yaml, check-toml, check-merge-conflict, check-added-large-files, detect-private-key)

All hook versions are pinned. CI workflows are unchanged.

CONTRIBUTING.md gains a "Pre-commit hooks" subsection documenting
install steps and the no-bypass policy. .github/copilot-instructions.md
cross-links to it.

Closes #137


Adds a mechanical guard against accidental public-API breakage. `tests/test_public_api_snapshot.py` compares the live `chainweaver.__all__` surface against a checked-in golden file (`tests/fixtures/public_api.json`); CI fails if any of these drift without an accompanying regen:

- a symbol added to or removed from `__all__`,
- a class's public attribute or method shape,
- a public function's signature or return annotation,
- a Pydantic model's field set or field types.

The introspection helper (`tests/api_surface.py`) uses griffe to extract surface info from source AST rather than runtime objects, so the snapshot is stable across Python versions (no `repr(int | str)` vs `repr(Union[int, str])` skew). `griffe>=2.0` lands in the `[dev]` extra.

Run `python tests/scripts/regen_public_api.py` after an intentional API change to regenerate the fixture; the diff travels in the same PR as the surface delta and reviewers can read it directly.

Three acceptance-criteria smoke tests pass:

- Removing a symbol from `__all__` fails CI.
- Changing a public function's signature fails CI.
- Adding/removing a Pydantic model field fails CI.

`docs/versioning-policy.md` gains a "Public-API snapshot guard" section; `docs/agent-context/review-checklist.md` gains a regen reminder.

Closes #140
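To make the snapshot idea concrete, here is a minimal sketch of comparing a module's `__all__` surface against a golden mapping. It is not the PR's `tests/api_surface.py`: it swaps griffe's source-AST extraction for runtime `inspect` calls so that it runs on the standard library alone, which means it does not have the cross-version stability the commit message describes. The names `describe_surface`, `diff_surface`, and `demo_pkg` are hypothetical.

```python
import inspect
import json
import types


def describe_surface(module: types.ModuleType) -> dict[str, str]:
    """Map each name in ``module.__all__`` to a short JSON-friendly description."""
    surface: dict[str, str] = {}
    for name in sorted(module.__all__):
        obj = getattr(module, name)
        if inspect.isclass(obj):
            # Record the public members defined directly on the class.
            public = sorted(m for m in vars(obj) if not m.startswith("_"))
            surface[name] = "class:" + ",".join(public)
        elif callable(obj):
            # Record the signature, including annotations and defaults.
            surface[name] = "function:" + str(inspect.signature(obj))
        else:
            surface[name] = "value:" + type(obj).__name__
    return surface


def diff_surface(golden: dict[str, str], live: dict[str, str]) -> list[str]:
    """List every symbol that was added, removed, or changed shape."""
    problems: list[str] = []
    for name in sorted(golden.keys() | live.keys()):
        if name not in live:
            problems.append(f"removed: {name}")
        elif name not in golden:
            problems.append(f"added: {name}")
        elif golden[name] != live[name]:
            problems.append(f"changed: {name}")
    return problems


# Hypothetical stand-in package with a single exported function.
demo = types.ModuleType("demo_pkg")


def double(x: int) -> int:
    return x * 2


demo.double = double
demo.__all__ = ["double"]

# "Regenerate" the golden snapshot; the JSON round-trip mimics a checked-in file.
golden = json.loads(json.dumps(describe_surface(demo)))


def double_v2(x: int, scale: int = 2) -> int:
    return x * scale


# Simulate an unreviewed signature change, then a removal from __all__.
demo.double = double_v2
signature_drift = diff_surface(golden, describe_surface(demo))

demo.__all__ = []
removal_drift = diff_surface(golden, describe_surface(demo))

print(signature_drift)
print(removal_drift)
```

A test built on this shape would fail whenever the diff list is non-empty, and the regen script would simply rewrite the golden file from `describe_surface`.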

Adds three property test families covering the headline "compiled, not interpreted; same input + same tools = same output" claim:

1. Idempotence: `execute_flow(F, x)` produces identical `final_output` and identical per-step `outputs` across N successive runs (volatile fields `trace_id`, timestamps, and durations are excluded by name).
2. Serialization round-trip: `flow_from_yaml(flow_to_yaml(F)).execute(x)` matches `F.execute(x)`. Same for the JSON path.
3. DAG-equivalence: a linear `Flow` and the trivially-sequential `DAGFlow` (one node per step, `depends_on` chain) produce identical `final_output`.

Strategies are derived from `helpers.py` schemas via `hypothesis_jsonschema.from_schema` (no hand-coded shape, so strategies stay in sync with the Pydantic models). Flow shape is generated from the existing helper toolbelt (`double`, `add_ten`, `format_result`); arbitrary Pydantic schemas at runtime are intentionally out of scope.

Property settings: `max_examples=50`, `deadline=200ms` per the issue. Total wall-clock cost in the default test run: ~5 s.

Smoke-tested: injecting `random.random()` into `_double_fn` causes property test (1) to fail and Hypothesis shrinks to a minimal counter-example.

`pyproject.toml` adds `hypothesis>=6.150` and `hypothesis-jsonschema>=0.23` to `[dev]` (consolidated into the existing extra rather than a separate `[test]` extra, giving a single install surface). Marker `property` registered. `pythonpath = ["tests", "tests/property"]` exposes `helpers` and `strategies` as bare-name imports.

CI adds a `pytest -m property --hypothesis-show-statistics` step on the Ubuntu / 3.10 lane so the Hypothesis seed is preserved in CI logs for reproduction. Property tests also run as part of the default matrix test step on all 12 jobs.

`docs/agent-context/workflows.md` documents the property-test conventions.

Closes #143
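The idempotence property (1) can be sketched without ChainWeaver or Hypothesis. The block below is an illustration under stated assumptions, not the PR's test: `run_flow` is a hypothetical stand-in for `execute_flow`, the field names `started_at` and `duration_ms` are invented, seeded `random` inputs stand in for Hypothesis strategies, and a call counter replaces the `random.random()` injection so that the failing case is deterministic.

```python
import itertools
import random
import time
import uuid

# Fields excluded by name before comparing runs (names are assumptions).
VOLATILE_FIELDS = {"trace_id", "started_at", "duration_ms"}


def double(payload: dict) -> dict:
    return {"value": payload["value"] * 2}


def add_ten(payload: dict) -> dict:
    return {"value": payload["value"] + 10}


def run_flow(steps, payload: dict) -> dict:
    """Hypothetical executor: applies steps in order and records a trace."""
    started = time.time()
    outputs = []
    current = payload
    for step in steps:
        current = step(current)
        outputs.append(current)
    return {
        "trace_id": uuid.uuid4().hex,
        "started_at": started,
        "duration_ms": (time.time() - started) * 1000,
        "outputs": outputs,
        "final_output": current,
    }


def stable_view(result: dict) -> dict:
    """Drop the volatile fields so only deterministic data is compared."""
    return {k: v for k, v in result.items() if k not in VOLATILE_FIELDS}


def check_idempotent(steps, payload: dict, runs: int = 3) -> bool:
    views = [stable_view(run_flow(steps, payload)) for _ in range(runs)]
    return all(view == views[0] for view in views)


# Seeded inputs stand in for Hypothesis-generated examples.
rng = random.Random(0)
all_stable = all(
    check_idempotent([double, add_ten], {"value": rng.randint(-1000, 1000)})
    for _ in range(50)
)

# A step with hidden state breaks the property, as the smoke test expects.
_calls = itertools.count()


def leaky_step(payload: dict) -> dict:
    return {"value": payload["value"] + next(_calls)}


leaky_is_stable = check_idempotent([double, leaky_step], {"value": 1})

print(all_stable, leaky_is_stable)
```

Excluding volatile fields by name keeps the comparison strict for everything else: a new nondeterministic field would fail the check until it is listed explicitly.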

Adds a new CI workflow that fails PRs whose median wall-clock for the naive-vs-compiled benchmark regresses beyond 125% of the gh-pages baseline. Wall-clock is the right signal for the "compiled, not interpreted" claim; CodSpeed-style instrumentation is overkill here.

Workflow (`.github/workflows/bench.yml`):

- Pinned to `ubuntu-22.04` (glibc-driven variance; see https://codspeed.io/blog/unrelated-benchmark-regression).
- No matrix (macOS / Windows wall-clock too noisy for a hard gate).
- Invokes `benchmark-action/github-action-benchmark@v1` with `tool: customSmallerIsBetter`, `auto-push: true`, `alert-threshold: 125%`, `fail-on-alert: true`.
- Permissions: contents:write (auto-push to gh-pages), pull-requests:write (regression comment on PRs).

Bench script (`benchmarks/bench_naive_vs_compiled.py`):

- New `--repeats N` flag with median-of-N reporting per case. Default N=1 preserves the pre-#144 behavior.
- New `--benchmark-action-output PATH` flag that writes a flat JSON array in the customSmallerIsBetter schema the action consumes directly (no jq transform in the workflow).
- Variance check on 5 consecutive runs (steps=50, llm=10ms): naive delta < 1%, compiled delta < 10%, both well below the 125% threshold.
- CI invocation uses `--steps 50 --llm-ms 10 --tool-ms 0 --repeats 5` (~2.7 s wall-clock; signal well above runner noise floor).

`benchmarks/baseline.json` is checked in as a local sanity reference, not as the CI comparison source (gh-pages owns that). Regenerated only when an intentional perf change lands.

`benchmarks/README.md` documents:

- The CI failure semantics (when/why the guard fails).
- OS pinning rationale (no macOS / Windows matrix).
- The one-off `gh-pages` initialization step (maintainer-only).
- The local refresh-baseline workflow.

`AGENTS.md` mentions the new bench gate in the CI section.

NOTE: `gh-pages` branch initialization is a one-time maintainer bootstrap step that this PR cannot perform on its own; see `benchmarks/README.md` § "One-off gh-pages initialization" for the exact commands. The first `bench.yml` run on `main` after that will seed the initial benchmark dataset.

Closes #144
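The median-of-N reporting and the flat JSON output described above could look roughly like the following. This is a sketch, not the contents of `benchmarks/bench_naive_vs_compiled.py`: the function names and the case name `compiled` are hypothetical, and `regressed` only restates the 125% gate for illustration, since the real comparison is done by the benchmark action against the gh-pages data. The entry keys `name`, `unit`, and `value` follow the action's custom-tool input format.

```python
import json
import statistics
import time
from typing import Callable


def median_seconds(fn: Callable[[], object], repeats: int) -> float:
    """Time ``fn`` ``repeats`` times and return the median wall-clock in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def to_benchmark_action(results: dict[str, float]) -> str:
    """Serialize results as a flat JSON array of name/unit/value entries."""
    entries = [
        {"name": name, "unit": "s", "value": value}
        for name, value in sorted(results.items())
    ]
    return json.dumps(entries, indent=2)


def regressed(baseline: float, current: float, threshold_pct: float = 125.0) -> bool:
    """Illustrative gate: smaller is better, so fail when current exceeds the ratio."""
    return current > baseline * threshold_pct / 100.0


# Hypothetical workload standing in for one benchmark case.
def workload() -> int:
    return sum(range(10_000))


results = {"compiled": median_seconds(workload, repeats=5)}
payload = to_benchmark_action(results)
print(payload)
```

Taking the median rather than the mean keeps a single slow run on a shared runner from tripping the gate on its own.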

Copilot

Pull request overview

Adds local validation infrastructure, but the diff also expands CI/test coverage with public-API snapshots, Hypothesis property tests, and a benchmark regression workflow.

Changes:

- Adds `.pre-commit-config.yaml` and contributor docs for local hooks.
- Adds public API snapshot generation/testing and property-based executor/serialization tests.
- Adds benchmark median reporting, baseline docs/data, and a new benchmark CI workflow.

Reviewed changes

Copilot reviewed 21 out of 22 changed files in this pull request and generated 23 comments.


| File | Description |
| --- | --- |
| `.pre-commit-config.yaml` | Defines local lint/type/security/workflow hooks. |
| `.github/copilot-instructions.md` | Links to pre-commit contributor docs. |
| `.github/workflows/ci.yml` | Adds a dedicated property-test CI step. |
| `.github/workflows/bench.yml` | Adds benchmark regression workflow. |
| `AGENTS.md` | Documents benchmark workflow behavior. |
| `CONTRIBUTING.md` | Documents pre-commit installation and policy. |
| `pyproject.toml` | Adds test/dev dependencies and pytest config. |
| `docs/versioning-policy.md` | Documents public API snapshot guard. |
| `docs/agent-context/workflows.md` | Documents property-test conventions. |
| `docs/agent-context/review-checklist.md` | Adds public API snapshot review item. |
| `tests/api_surface.py` | Adds public API introspection helper. |
| `tests/test_public_api_snapshot.py` | Adds snapshot tests for exported API surface. |
| `tests/fixtures/public_api.json` | Adds generated public API fixture. |
| `tests/scripts/__init__.py` | Adds scripts package marker. |
| `tests/scripts/regen_public_api.py` | Adds fixture regeneration script. |
| `tests/property/strategies.py` | Adds Hypothesis strategies and flow builders. |
| `tests/property/test_dag_equivalence.py` | Adds linear/DAG equivalence property test. |
| `tests/property/test_idempotence.py` | Adds executor idempotence property tests. |
| `tests/property/test_roundtrip.py` | Adds YAML/JSON round-trip property tests. |
| `benchmarks/bench_naive_vs_compiled.py` | Adds repeats/median reporting and benchmark-action output. |
| `benchmarks/baseline.json` | Adds sample benchmark baseline data. |
| `benchmarks/README.md` | Documents benchmark CI guard and baseline workflow. |

+# These hooks mirror AGENTS.md §7 "Validation commands" exactly so the
+# local gate matches CI. Run `pre-commit install` once after cloning,




@@ -0,0 +1,69 @@
+name: Bench


+      - name: Property tests (Hypothesis seed reporting)
+        if: ${{ matrix.os == 'ubuntu-latest' && matrix.python-version == '3.10' }}
+        run: python -m pytest tests/ -m property --no-cov --hypothesis-show-statistics -v


+    defaults) and public method signatures.
+    """
+    attributes: dict[str, dict[str, Any]] = {}
+    methods: dict[str, dict[str, Any]] = {}
+    for member_name in sorted(cls.members):
+        if member_name.startswith("_"):



 ---

+## Pre-commit hooks




+          tool: customSmallerIsBetter
+          output-file-path: bench-result.json
+          github-token: ${{ secrets.GITHUB_TOKEN }}
+          auto-push: true


+  # ---------------------------------------------------------------
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    rev: v0.8.6
+    hooks:
+      - id: ruff
+        name: ruff check (chainweaver/ tests/ examples/)
+        args:
+          - check
+          - chainweaver/
+          - tests/
+          - examples/
+        pass_filenames: false
+      - id: ruff-format
+        name: ruff format --check (chainweaver/ tests/ examples/)
+        args:
+          - --check
+          - chainweaver/
+          - tests/
+          - examples/


+  - repo: https://github.com/pre-commit/mirrors-mypy
+    rev: v1.13.0
+    hooks:
+      - id: mypy
+        name: python -m mypy chainweaver/ tests/
+        args:
+          - chainweaver/
+          - tests/
+        pass_filenames: false
+        additional_dependencies:
+          - pydantic>=2.0
+          - tenacity>=8.0
+          - typer>=0.9
+          - types-pyyaml>=6.0


claude added 4 commits May 16, 2026 16:46

Copilot AI review requested due to automatic review settings May 16, 2026 17:15

Copilot started reviewing on behalf of dgenio May 16, 2026 17:15

Copilot AI reviewed May 16, 2026

dgenio wants to merge 4 commits into main from claude/triage-issues-7JgBE
