feat: add chainweaver attest <flow> for observed-determinism evidence (#154) #162

Open
dgenio wants to merge 1 commit into feat/77-analyzer-foundation from feat/154-cli-attest-subcommand

@dgenio dgenio commented May 16, 2026

Summary

Adds chainweaver/attest.py plus the chainweaver attest <flow> CLI verb. Turns ChainWeaver's "compiled flows are deterministic" claim into a reproducible, machine-verifiable artifact.

Stacked on top of #161 (analyzer foundation); base cascades through #161 → #160 → #159 → #158 → #157 → main as those merge.

Closes #154.

Changes

  • chainweaver/attest.py — new module (~330 LoC). attest_flow() API + AttestationReport (Pydantic) + AttestationInputError + a seeded stdlib input generator.
  • chainweaver/cli.py — new attest_command (~165 LoC including table renderer + flag definitions).
  • chainweaver/__init__.py — exports AttestationInputError, AttestationReport, attest_flow via __all__.
  • tests/test_attest.py — new file, 16 test cases (API + CLI + generator coverage).

Pipeline

  1. Generate N reproducible inputs (seed-driven random.Random) or accept a user-supplied list via --seed-input.
  2. For each input, run the flow M times.
  3. Hash the canonical JSON of every final_output; assert all M agree.
  4. Emit an AttestationReport: ChainWeaver version, flow name+version, flow_schema_fingerprint, tool_schema_hashes, N, repeats, seed, host info (no PII), duration, observed_deterministic, aggregate_fingerprint, divergences.
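
The four steps above can be sketched with the standard library alone. This is a minimal illustration, not the code in chainweaver/attest.py: `canonical_hash`, `attest`, and the `run_flow` callable are hypothetical names, and the way per-input hashes are folded into `aggregate_fingerprint` is an assumption.

```python
import hashlib
import json
from typing import Any, Callable


def canonical_hash(value: Any) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, fixed separators)."""
    payload = json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def attest(run_flow: Callable[[dict], Any], inputs: list, repeats: int = 3) -> dict:
    """Run every input `repeats` times and record inputs whose output hashes disagree."""
    if repeats < 2:
        raise ValueError("repeats must be >= 2 to compare runs")
    per_input_hashes = []
    divergences = []
    for index, item in enumerate(inputs):
        hashes = [canonical_hash(run_flow(item)) for _ in range(repeats)]
        if len(set(hashes)) > 1:
            divergences.append({"input_index": index, "hashes": hashes})
        per_input_hashes.append(hashes[0])
    # Assumption: the aggregate is a hash over the ordered per-input hashes.
    aggregate = hashlib.sha256("\n".join(per_input_hashes).encode("utf-8")).hexdigest()
    return {
        "runs": len(inputs),
        "repeats": repeats,
        "observed_deterministic": not divergences,
        "aggregate_fingerprint": aggregate,
        "divergences": divergences,
    }
```

Calling `attest(lambda d: {"total": d["a"] + d["b"]}, [{"a": 1, "b": 2}])` twice returns identical reports, while a callable that returns a fresh counter value on each call lands in `divergences`.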

Framing

This is observed-deterministic evidence, not a formal proof. Re-running with the same flow, tools, and seed yields a byte-identical aggregate_fingerprint.

CLI surface

Flag                   Meaning
--tools <module>       (-t, repeatable) tool import paths
--runs N               (default 100) number of distinct inputs
--repeats M            (default 3, must be >= 2) runs per input
--seed S               (default 0) generator seed (same seed → same inputs)
--seed-input <file>    bypass generator with a JSON array of objects
--format json|table    (-f) default json, the attestation artifact
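
For the --seed-input contract (a JSON array of objects), a loader along these lines would reject malformed and non-array files before any flow runs. This is a sketch under assumptions: `load_seed_inputs` and `SeedInputError` are hypothetical names, not the PR's API, which exposes AttestationInputError.

```python
import json
from pathlib import Path


class SeedInputError(ValueError):
    """Hypothetical error for a --seed-input file that is not a JSON array of objects."""


def load_seed_inputs(path: str) -> list:
    """Parse the file and check that it holds a JSON array whose elements are objects."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        raise SeedInputError(f"{path}: not valid JSON ({exc.msg})") from exc
    if not isinstance(data, list):
        raise SeedInputError(f"{path}: expected a JSON array, got {type(data).__name__}")
    for index, item in enumerate(data):
        if not isinstance(item, dict):
            raise SeedInputError(f"{path}: element {index} is not a JSON object")
    return data
```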

Exit codes

  • 0 — observed-deterministic across all inputs.
  • 1 — divergence, execution failure, or CLI-level error.
  • 2 — flow file or tools module not found / not importable.
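
One way to picture the exit-code contract is a single mapping from outcome to code. The sketch below is illustrative only: `ExitCode` and `exit_code_for` are hypothetical names, and treating FileNotFoundError and ImportError as the "not found / not importable" cases is an assumption about how the CLI detects them.

```python
from enum import IntEnum
from typing import Optional


class ExitCode(IntEnum):
    OK = 0         # observed-deterministic across all inputs
    FAILURE = 1    # divergence, execution failure, or CLI-level error
    NOT_FOUND = 2  # flow file or tools module missing / not importable


def exit_code_for(observed_deterministic: bool, error: Optional[Exception] = None) -> ExitCode:
    """Map an attestation outcome, or the error that prevented one, to an exit code."""
    if isinstance(error, (FileNotFoundError, ImportError)):
        return ExitCode.NOT_FOUND
    if error is not None or not observed_deterministic:
        return ExitCode.FAILURE
    return ExitCode.OK
```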

Scope-delta call (Mode B)

The issue body says "use Hypothesis strategies as in #143." I shipped a stdlib random.Random-seeded generator instead. Reasoning:

  • Hypothesis is built around @given for property exploration and shrinking, not "give me N reproducible inputs by seed."
  • Getting deterministic enumeration from Hypothesis requires reaching into its internals (ConjectureData, etc.).
  • The stdlib generator is ~60 LoC, fully deterministic by construction, and covers the common subset of Pydantic types (int, float, bool, str, list[X], dict[K, V], Literal, Optional[X], nested BaseModel).
  • The seam is explicit: _generate_inputs() can be swapped for a Hypothesis-backed implementation once #143 (Add Hypothesis property-based determinism test harness) lands, without touching the rest of the loop.

Net: no hypothesis dependency added. If you'd rather take the hypothesis route, say the word and I'll swap _generate_inputs() behind an optional extra.
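
To make the "deterministic by construction" point concrete, here is a rough sketch of a seed-driven generator over type annotations. It is not the PR's _generate_inputs(): `generate_value` and `generate_inputs` are hypothetical names, the value ranges are arbitrary, fields are passed as a plain name → annotation mapping rather than read from a Pydantic model, and nested BaseModel support is left out.

```python
import random
import string
from typing import Any, Literal, Optional, Union, get_args, get_origin


def generate_value(annotation: Any, rng: random.Random) -> Any:
    """Draw one value for a supported annotation using only the supplied RNG."""
    origin = get_origin(annotation)
    args = get_args(annotation)
    if annotation is bool:
        return rng.random() < 0.5
    if annotation is int:
        return rng.randint(-1000, 1000)
    if annotation is float:
        return round(rng.uniform(-1000.0, 1000.0), 6)
    if annotation is str:
        return "".join(rng.choice(string.ascii_lowercase) for _ in range(rng.randint(0, 8)))
    if origin is Literal:
        return rng.choice(args)
    if origin is Union:
        # Optional[X] is Union[X, None]; pick one branch, then generate for it.
        branch = rng.choice(args)
        return None if branch is type(None) else generate_value(branch, rng)
    if origin is list:
        return [generate_value(args[0], rng) for _ in range(rng.randint(0, 3))]
    if origin is dict:
        size = rng.randint(0, 3)
        return {generate_value(args[0], rng): generate_value(args[1], rng) for _ in range(size)}
    raise TypeError(f"unsupported annotation: {annotation!r}")


def generate_inputs(fields: dict, count: int, seed: int) -> list:
    """Build `count` input dicts; all randomness flows from one seeded RNG."""
    rng = random.Random(seed)
    return [{name: generate_value(ann, rng) for name, ann in fields.items()} for _ in range(count)]
```

Because every draw goes through the one `random.Random(seed)` instance in a fixed order, two calls with the same fields, count, and seed produce equal lists on the same Python version.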

Testing

  • Linting passes (ruff check chainweaver/ tests/ examples/)
  • Formatting check passes (ruff format --check chainweaver/ tests/ examples/)
  • Type checking passes (python -m mypy chainweaver/ tests/)
  • All existing tests pass — 543/543 passed in 2.12s (527 pre-existing + 16 new)
  • New tests added for new functionality
$ ruff check chainweaver/ tests/ examples/
All checks passed!
$ ruff format --check chainweaver/ tests/ examples/
56 files already formatted
$ python -m mypy chainweaver/ tests/
Success: no issues found in 48 source files
$ python -m pytest tests/ -q --no-cov
543 passed in 2.12s

Tests cover both halves: the programmatic attest_flow() API (deterministic happy path, seed reproducibility, different seeds → different fingerprints, flaky-tool failure, seed_inputs bypass, validation errors, structural-fingerprint sensitivity, multi-type generator coverage) and the CLI surface (happy-path JSON, table format, --seed-input bypass, missing flow file, repeats validation, malformed/non-array seed-input).

Diff stat: 4 files changed, 1136 insertions(+).

Related Issues

Closes #154. Companion to:

Checklist

  • Code follows project conventions (see AGENTS.md and docs/agent-context/)
  • Public API changes are documented — AttestationInputError, AttestationReport, attest_flow in __all__; CLI docstring lists attest
  • No secrets or credentials included

Tradeoffs / risks

  • Stdlib random.Random vs Hypothesis. Documented above: the user accepted the relaxation, since property-based testing isn't the right tool for "reproducible N samples by seed." Switching later is a localized change to _generate_inputs().
  • flow_schema_fingerprint excludes status / description. Intentional: those fields don't affect runtime behavior, and including them would cause aggregate_fingerprint to change on cosmetic edits.
  • Host info is non-PII by construction. Only OS family, Python version, and architecture are recorded. Hostnames and usernames are deliberately excluded.
  • executor._tools access. attest_flow() reads the executor's private tool registry to compute tool_schema_hashes. Adding a public iterator on FlowExecutor would be cleaner but is out of scope here (touches an unrelated class). Same-package access is consistent with how cli.py already reaches into module internals.
  • No cryptographic signing. The artifact is reproducible but not signed. A Sigstore bundle follow-up was flagged in the original issue body — kept out of scope.
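
The flow_schema_fingerprint tradeoff above can be illustrated with a small sketch: drop the cosmetic fields, then hash what remains. The names here (`schema_fingerprint`, `COSMETIC_FIELDS`) are hypothetical, and the sketch assumes a dict-shaped flow definition with status and description at the top level; the PR's actual field handling may differ.

```python
import hashlib
import json

# Fields assumed not to affect runtime behavior, per the tradeoff described above.
COSMETIC_FIELDS = frozenset({"status", "description"})


def schema_fingerprint(flow_definition: dict) -> str:
    """Hash the flow definition after removing top-level cosmetic fields."""
    structural = {k: v for k, v in flow_definition.items() if k not in COSMETIC_FIELDS}
    canonical = json.dumps(structural, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Editing only the description leaves the fingerprint unchanged, while adding or changing a step produces a different one.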

Scope notes

Closes #154 only. Adjacent items deferred:

https://claude.ai/code/session_01QcSJ3NWhe5B4k1EP25Hx3n


Generated by Claude Code

Closes #154.

Introduces chainweaver/attest.py — a deterministic-by-evidence
attestation loop — and the `chainweaver attest` CLI verb that drives it.

Pipeline:
1. Generate N reproducible inputs (seed-driven, stdlib random.Random) or
   accept a user-supplied list via --seed-input.
2. For each input, run the flow M times.
3. Hash the canonical JSON of every final_output; assert all M agree.
4. Emit AttestationReport: chainweaver version, flow name + version,
   flow_schema_fingerprint, tool_schema_hashes, N, repeats, seed,
   host_info (no PII), wall-clock duration, observed_deterministic,
   aggregate_fingerprint, and a divergences list.

Framing: this is *observed*-deterministic evidence, not a formal proof.
Re-running with the same flow, tools, and seed yields a byte-identical
aggregate_fingerprint.

Scope-delta from the issue body:
  The issue references Hypothesis strategies "as in #143". Hypothesis is
  built around the @given decorator for property exploration and shrinking,
  not "give me N reproducible inputs by seed" — getting deterministic
  enumeration requires reaching into its internals. I shipped a small
  stdlib random.Random-seeded generator instead (~60 LoC); it covers
  int / float / bool / str / list[X] / dict / Literal / Optional /
  nested BaseModel. The seam is explicit so #143's Hypothesis-based
  generator can replace _generate_inputs() later without touching the
  rest of the loop.

CLI flags:
- --tools <module> (repeatable, -t): tool import paths
- --runs N (default 100): number of distinct inputs
- --repeats M (default 3, >= 2): runs per input
- --seed S (default 0): generator seed
- --seed-input <file>: bypass the generator with a JSON array of objects
- --format json|table (-f): default json — the attestation artifact

Exit codes:
- 0 — observed-deterministic across all inputs
- 1 — divergence, execution failure, or CLI-level error
- 2 — flow file or tools module not found / not importable

Public API additions (exported in __init__.py __all__):
- AttestationInputError
- AttestationReport
- attest_flow

Tests: 16 cases in tests/test_attest.py covering:
- Programmatic attest_flow(): deterministic flow passes, seed
  reproducibility, different seeds → different fingerprints, flaky
  tool fails, seed_inputs bypass, repeats < 2 raises, missing
  input_schema raises, structurally-different flow → different
  flow_schema_fingerprint, multi-type generator coverage.
- CLI: happy-path JSON, table format, --seed-input bypass, missing
  flow file (exit 2), repeats < 2 (exit 1), malformed --seed-input
  (exit 1), non-array --seed-input (exit 1).

Verification:
  $ ruff check chainweaver/ tests/ examples/      # All checks passed
  $ ruff format --check chainweaver/ tests/ ...   # 56 files already formatted
  $ python -m mypy chainweaver/ tests/            # Success: no issues
  $ python -m pytest tests/ -q --no-cov           # 543 passed in 2.12s

Stacked on top of #161 (analyzer foundation); chains through
#161 → #160 → #159 → #158 → #157 → main as those merge.

https://claude.ai/code/session_01QcSJ3NWhe5B4k1EP25Hx3n