chore: add pre-commit config mirroring AGENTS.md §7 validation commands (#137) #165
Open
dgenio wants to merge 4 commits into
…ds (#137)

Adds `.pre-commit-config.yaml` at the repo root with hooks that match the four validation commands verbatim (same paths, same flags), so a clean `pre-commit run --all-files` is the strongest local signal that CI will pass.

Hooks:
- `ruff check chainweaver/ tests/ examples/`
- `ruff format --check chainweaver/ tests/ examples/`
- `python -m mypy chainweaver/ tests/`
- gitleaks (secret scanning; richer than `detect-private-key` alone)
- actionlint (syntax check for `.github/workflows/*.yml`)
- standard `pre-commit-hooks` (`trailing-whitespace`, `end-of-file-fixer`, `check-yaml`, `check-toml`, `check-merge-conflict`, `check-added-large-files`, `detect-private-key`)

All hook versions are pinned. CI workflows are unchanged.

`CONTRIBUTING.md` gains a "Pre-commit hooks" subsection documenting install steps and the no-bypass policy. `.github/copilot-instructions.md` cross-links to it.

Closes #137
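A small drift check can help keep the "same paths, same flags" claim true over time. It compares each hook's `args` with the trailing tokens of the validation commands listed above. This is a hypothetical sketch and not part of the PR. `VALIDATION_COMMANDS`, `HOOK_ARGS`, and `args_are_command_suffix` are illustrative names. The hook args are hand-copied from the `.pre-commit-config.yaml` excerpts on this PR rather than parsed from YAML. Each hook's upstream `entry` is deliberately not modeled.

```python
import shlex

# Validation commands as listed in this commit message.
VALIDATION_COMMANDS = [
    "ruff check chainweaver/ tests/ examples/",
    "ruff format --check chainweaver/ tests/ examples/",
    "python -m mypy chainweaver/ tests/",
]

# Hook id -> args, hand-copied from the .pre-commit-config.yaml excerpts
# (a real check would load the file with a YAML parser instead).
HOOK_ARGS = {
    "ruff": ["check", "chainweaver/", "tests/", "examples/"],
    "ruff-format": ["--check", "chainweaver/", "tests/", "examples/"],
    "mypy": ["chainweaver/", "tests/"],
}


def args_are_command_suffix(hook_args: list[str], command: str) -> bool:
    """True if hook_args equal the last len(hook_args) tokens of command.

    Only `args` are compared; each hook's `entry` is defined by the
    upstream hook repository and is not modeled here.
    """
    tokens = shlex.split(command)
    n = len(hook_args)
    return 0 < n <= len(tokens) and tokens[-n:] == hook_args


def unmatched_hooks() -> list[str]:
    """Hook ids whose args match none of the validation commands."""
    return [
        hook_id
        for hook_id, args in HOOK_ARGS.items()
        if not any(args_are_command_suffix(args, cmd) for cmd in VALIDATION_COMMANDS)
    ]
```

An empty result from `unmatched_hooks()` means every modeled hook still passes the same paths and flags as some validation command. A hook whose path list has drifted would show up by id.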
Adds a mechanical guard against accidental public-API breakage. `tests/test_public_api_snapshot.py` compares the live `chainweaver.__all__` surface against a checked-in golden file (`tests/fixtures/public_api.json`). CI fails if any of the following drift without an accompanying regen:
- a symbol added to or removed from `__all__`,
- a class's public attribute or method shape,
- a public function's signature or return annotation,
- a Pydantic model's field set or field types.

The introspection helper (`tests/api_surface.py`) uses griffe to extract surface info from the source AST rather than from runtime objects, so the snapshot is stable across Python versions (no `repr(int | str)` vs `repr(Union[int, str])` skew). `griffe>=2.0` lands in the `[dev]` extra.

Run `python tests/scripts/regen_public_api.py` after an intentional API change to regenerate the fixture. The fixture diff travels in the same PR as the surface change, so reviewers can read it directly.

Three acceptance-criteria smoke tests pass:
- Removing a symbol from `__all__` fails CI.
- Changing a public function's signature fails CI.
- Adding/removing a Pydantic model field fails CI.

`docs/versioning-policy.md` gains a "Public-API snapshot guard" section; `docs/agent-context/review-checklist.md` gains a regen reminder.

Closes #140
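The comparison the snapshot test performs can be sketched with the standard library. This sketch replaces griffe's source-AST introspection with runtime `inspect.signature`. That is exactly the approach the PR avoids, because annotation text can vary across Python versions. The sketch also skips Pydantic field sets. `api_surface`, `surface_drift`, and `regen` are hypothetical names, not the helpers in `tests/api_surface.py`.

```python
import inspect
import json
import types
from typing import Any


def api_surface(module: types.ModuleType) -> dict[str, Any]:
    """Describe every name in module.__all__ as JSON-serializable data.

    Runtime inspect.signature() text can differ between Python versions
    (e.g. for union annotations); the PR reads source via griffe instead.
    """
    surface: dict[str, Any] = {}
    for name in sorted(module.__all__):
        obj = getattr(module, name)
        if inspect.isclass(obj):
            # Public methods defined directly on the class, with signatures.
            methods = {
                attr: str(inspect.signature(fn))
                for attr, fn in sorted(vars(obj).items())
                if not attr.startswith("_") and inspect.isfunction(fn)
            }
            surface[name] = {"kind": "class", "methods": methods}
        elif callable(obj):
            surface[name] = {"kind": "function", "signature": str(inspect.signature(obj))}
        else:
            surface[name] = {"kind": "attribute", "type": type(obj).__name__}
    return surface


def surface_drift(module: types.ModuleType, golden_json: str) -> list[str]:
    """Names that were added, removed, or changed relative to the golden file."""
    golden = json.loads(golden_json)
    live = api_surface(module)
    return sorted(n for n in golden.keys() | live.keys() if golden.get(n) != live.get(n))


def regen(module: types.ModuleType) -> str:
    """Serialize the current surface, as a regen script would write it to disk."""
    return json.dumps(api_surface(module), indent=2, sort_keys=True)
```

A test then asserts `surface_drift(module, golden) == []`. A non-empty list names exactly the symbols whose fixture entries need a regen and a reviewer's attention.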
Adds three property-test families covering the headline "compiled, not interpreted; same input + same tools = same output" claim:
1. Idempotence: `execute_flow(F, x)` produces identical `final_output` and identical per-step `outputs` across N successive runs (volatile fields `trace_id`, timestamps, and durations are excluded by name).
2. Serialization round-trip: `flow_from_yaml(flow_to_yaml(F)).execute(x)` matches `F.execute(x)`. Same for the JSON path.
3. DAG-equivalence: a linear `Flow` and the trivially-sequential `DAGFlow` (one node per step, `depends_on` chain) produce identical `final_output`.

Strategies are derived from the `helpers.py` schemas via `hypothesis_jsonschema.from_schema` (no hand-coded shapes, so strategies stay in sync with the Pydantic models). Flow shape is generated from the existing helper toolbelt (`double`, `add_ten`, `format_result`); arbitrary Pydantic schemas at runtime are intentionally out of scope.

Property settings: `max_examples=50`, `deadline=200ms`, per the issue. Total wall-clock cost in the default test run: ~5 s.

Smoke-tested: injecting `random.random()` into `_double_fn` causes property test (1) to fail, and Hypothesis shrinks to a minimal counter-example.

`pyproject.toml` adds `hypothesis>=6.150` and `hypothesis-jsonschema>=0.23` to `[dev]` (consolidated into the existing extra rather than a separate `[test]` extra, for a single install surface). The `property` marker is registered. `pythonpath = ["tests", "tests/property"]` exposes `helpers` and `strategies` as bare-name imports.

CI adds a `pytest -m property --hypothesis-show-statistics` step on the Ubuntu / 3.10 lane so the Hypothesis seed is preserved in CI logs for reproduction. Property tests also run as part of the default matrix test step on all 12 jobs.

`docs/agent-context/workflows.md` documents the property-test conventions.

Closes #143
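Here is a stdlib-only sketch of what properties (1) and (3) assert. It makes several substitutions:
- Seeded `random.Random` draws stand in for Hypothesis strategies, so there is no shrinking.
- The three step functions are hypothetical stand-ins for the `double` / `add_ten` / `format_result` helpers.
- The timestamp/duration key names are placeholders. Only `trace_id` is named in the description.

The DAG side walks a `depends_on` chain with `graphlib.TopologicalSorter` (Python 3.9+).

```python
import random
import uuid
from graphlib import TopologicalSorter
from typing import Any, Callable

# `trace_id` comes from the PR description; the other names are placeholders.
VOLATILE_KEYS = frozenset({"trace_id", "started_at", "finished_at", "duration_ms"})

# Hypothetical stand-ins for the helper toolbelt.
STEPS: dict[str, Callable[[Any], Any]] = {
    "double": lambda v: v * 2,
    "add_ten": lambda v: v + 10,
    "format_result": lambda v: f"result={v}",
}
LINEAR_ORDER = ["double", "add_ten", "format_result"]
# Trivially sequential DAG: each node depends on the previous step.
DEPENDS_ON = {"double": set(), "add_ten": {"double"}, "format_result": {"add_ten"}}


def strip_volatile(value: Any) -> Any:
    """Recursively drop volatile keys so two runs can be compared with ==."""
    if isinstance(value, dict):
        return {k: strip_volatile(v) for k, v in value.items() if k not in VOLATILE_KEYS}
    if isinstance(value, list):
        return [strip_volatile(v) for v in value]
    return value


def run_linear(x: int) -> dict[str, Any]:
    """Run the steps in order, recording per-step outputs plus volatile noise."""
    value: Any = x
    outputs = []
    for name in LINEAR_ORDER:
        value = STEPS[name](value)
        outputs.append({"step": name, "output": value, "duration_ms": random.random()})
    return {"final_output": value, "outputs": outputs, "trace_id": uuid.uuid4().hex}


def run_dag(x: int) -> Any:
    """Run the steps in topological order; valid only for a chain, where
    each node consumes its single predecessor's output."""
    value: Any = x
    for name in TopologicalSorter(DEPENDS_ON).static_order():
        value = STEPS[name](value)
    return value


def is_idempotent(run: Callable[[int], dict[str, Any]], x: int, n: int = 5) -> bool:
    """True if n runs on the same input agree once volatile fields are removed."""
    first = strip_volatile(run(x))
    return all(strip_volatile(run(x)) == first for _ in range(n - 1))


if __name__ == "__main__":
    rng = random.Random(0)  # seeded draws stand in for Hypothesis strategies
    for _ in range(50):  # mirrors max_examples=50
        x = rng.randint(-1000, 1000)
        assert is_idempotent(run_linear, x)
        assert run_linear(x)["final_output"] == run_dag(x)
```

The design choice worth copying is excluding volatile fields by name before comparing, rather than comparing raw results. If a step leaks nondeterminism into a non-volatile field (as in the `random.random()` smoke test), the equality check fails.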
Adds a new CI workflow that fails PRs whose median wall-clock time for the naive-vs-compiled benchmark regresses beyond 125% of the gh-pages baseline. Wall-clock time is the right signal for the "compiled, not interpreted" claim; CodSpeed-style instrumentation is overkill here.

Workflow (`.github/workflows/bench.yml`):
- Pinned to `ubuntu-22.04` (glibc-driven variance; see https://codspeed.io/blog/unrelated-benchmark-regression).
- No matrix (macOS / Windows wall-clock is too noisy for a hard gate).
- Invokes `benchmark-action/github-action-benchmark@v1` with `tool: customSmallerIsBetter`, `auto-push: true`, `alert-threshold: 125%`, `fail-on-alert: true`.
- Permissions: `contents: write` (auto-push to gh-pages), `pull-requests: write` (regression comment on PRs).

Bench script (`benchmarks/bench_naive_vs_compiled.py`):
- New `--repeats N` flag with median-of-N reporting per case. Default N=1 preserves the pre-#144 behavior.
- New `--benchmark-action-output PATH` flag that writes a flat JSON array in the customSmallerIsBetter schema, which the action consumes directly (no jq transform in the workflow).
- Variance check on 5 consecutive runs (steps=50, llm=10ms): naive delta < 1%, compiled delta < 10%. Both stay inside the 25% headroom that the 125% threshold allows.
- CI invocation uses `--steps 50 --llm-ms 10 --tool-ms 0 --repeats 5` (~2.7 s wall-clock; signal well above the runner noise floor).

`benchmarks/baseline.json` is checked in as a local sanity reference, not as the CI comparison source (gh-pages owns that). It is regenerated only when an intentional perf change lands.

`benchmarks/README.md` documents:
- The CI failure semantics (when/why the guard fails).
- OS pinning rationale (no macOS / Windows matrix).
- The one-off `gh-pages` initialization step (maintainer-only).
- The local refresh-baseline workflow.

`AGENTS.md` mentions the new bench gate in the CI section.
NOTE: `gh-pages` branch initialization is a one-time maintainer bootstrap step that this PR cannot perform on its own; see `benchmarks/README.md` § "One-off gh-pages initialization" for the exact commands. The first `bench.yml` run on `main` after that will seed the initial benchmark dataset.

Closes #144
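The median-of-N reporting, the customSmallerIsBetter output, and the 125% gate fit together roughly as follows. The JSON shape (one object with `name`, `unit`, and `value` per case) follows the github-action-benchmark documentation for custom tools. The case names, the seconds unit, and all function names are assumptions. `exceeds_alert_threshold` only approximates the comparison the action itself performs against the gh-pages data.

```python
import json
import statistics
from typing import Callable


def median_of_repeats(run_once: Callable[[], float], repeats: int = 1) -> float:
    """Time one case `repeats` times and report the median (default 1, as in the PR)."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    return statistics.median(run_once() for _ in range(repeats))


def to_custom_smaller_is_better(medians: dict[str, float], unit: str = "s") -> str:
    """Flat JSON array of {name, unit, value} entries, one per benchmark case."""
    entries = [
        {"name": name, "unit": unit, "value": value}
        for name, value in sorted(medians.items())
    ]
    return json.dumps(entries, indent=2)


def exceeds_alert_threshold(current: float, baseline: float, threshold_pct: float = 125.0) -> bool:
    """Smaller is better: alert when current is more than threshold_pct% of baseline."""
    return current / baseline > threshold_pct / 100.0
```

In the real script, `run_once` would wrap one timed execution, for example with `time.perf_counter()`. Taking the median rather than the mean keeps a single slow outlier run from tripping the gate.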
Pull request overview
Adds local validation infrastructure, but the diff also expands CI/test coverage with public-API snapshots, Hypothesis property tests, and a benchmark regression workflow.
Changes:
- Adds `.pre-commit-config.yaml` and contributor docs for local hooks.
- Adds public API snapshot generation/testing and property-based executor/serialization tests.
- Adds benchmark median reporting, baseline docs/data, and a new benchmark CI workflow.
Reviewed changes
Copilot reviewed 21 out of 22 changed files in this pull request and generated 23 comments.

| File | Description |
|---|---|
| `.pre-commit-config.yaml` | Defines local lint/type/security/workflow hooks. |
| `.github/copilot-instructions.md` | Links to pre-commit contributor docs. |
| `.github/workflows/ci.yml` | Adds a dedicated property-test CI step. |
| `.github/workflows/bench.yml` | Adds benchmark regression workflow. |
| `AGENTS.md` | Documents benchmark workflow behavior. |
| `CONTRIBUTING.md` | Documents pre-commit installation and policy. |
| `pyproject.toml` | Adds test/dev dependencies and pytest config. |
| `docs/versioning-policy.md` | Documents public API snapshot guard. |
| `docs/agent-context/workflows.md` | Documents property-test conventions. |
| `docs/agent-context/review-checklist.md` | Adds public API snapshot review item. |
| `tests/api_surface.py` | Adds public API introspection helper. |
| `tests/test_public_api_snapshot.py` | Adds snapshot tests for exported API surface. |
| `tests/fixtures/public_api.json` | Adds generated public API fixture. |
| `tests/scripts/__init__.py` | Adds scripts package marker. |
| `tests/scripts/regen_public_api.py` | Adds fixture regeneration script. |
| `tests/property/strategies.py` | Adds Hypothesis strategies and flow builders. |
| `tests/property/test_dag_equivalence.py` | Adds linear/DAG equivalence property test. |
| `tests/property/test_idempotence.py` | Adds executor idempotence property tests. |
| `tests/property/test_roundtrip.py` | Adds YAML/JSON round-trip property tests. |
| `benchmarks/bench_naive_vs_compiled.py` | Adds repeats/median reporting and benchmark-action output. |
| `benchmarks/baseline.json` | Adds sample benchmark baseline data. |
| `benchmarks/README.md` | Documents benchmark CI guard and baseline workflow. |
Comment on lines +3 to +4

    # These hooks mirror AGENTS.md §7 "Validation commands" exactly so the
    # local gate matches CI. Run `pre-commit install` once after cloning,

    @@ -0,0 +1,69 @@
    name: Bench
Comment on lines +53 to +55

    - name: Property tests (Hypothesis seed reporting)
      if: ${{ matrix.os == 'ubuntu-latest' && matrix.python-version == '3.10' }}
      run: python -m pytest tests/ -m property --no-cov --hypothesis-show-statistics -v
Comment on lines +63 to +68

    defaults) and public method signatures.
    """
    attributes: dict[str, dict[str, Any]] = {}
    methods: dict[str, dict[str, Any]] = {}
    for member_name in sorted(cls.members):
        if member_name.startswith("_"):

    ---

    ## Pre-commit hooks

    tool: customSmallerIsBetter
    output-file-path: bench-result.json
    github-token: ${{ secrets.GITHUB_TOKEN }}
    auto-push: true
Comment on lines +29 to +47

    # ---------------------------------------------------------------
    - repo: https://github.com/astral-sh/ruff-pre-commit
      rev: v0.8.6
      hooks:
        - id: ruff
          name: ruff check (chainweaver/ tests/ examples/)
          args:
            - check
            - chainweaver/
            - tests/
            - examples/
          pass_filenames: false
        - id: ruff-format
          name: ruff format --check (chainweaver/ tests/ examples/)
          args:
            - --check
            - chainweaver/
            - tests/
            - examples/
Comment on lines +53 to +66

    - repo: https://github.com/pre-commit/mirrors-mypy
      rev: v1.13.0
      hooks:
        - id: mypy
          name: python -m mypy chainweaver/ tests/
          args:
            - chainweaver/
            - tests/
          pass_filenames: false
          additional_dependencies:
            - pydantic>=2.0
            - tenacity>=8.0
            - typer>=0.9
            - types-pyyaml>=6.0