feat: add `chainweaver diff <a> <b>` CLI for step-by-step trace comparison (#148) by dgenio · Pull Request #160 · dgenio/ChainWeaver

dgenio · 2026-05-16T06:37:06Z

Summary

Adds chainweaver diff <a.json> <b.json> so operators can compare two ExecutionResult JSON files step-by-step. Sister tool to profile (#159).

Stacked on top of #159 (profile CLI); base cascades through #159 → #158 → #157 → main as those merge.

Closes #148.

Changes

pyproject.toml — adds deepdiff>=8.0 to [project.dependencies]. Justification below.
chainweaver/cli.py — new diff_command, _compare_traces (structural comparison), _step_outputs_diff (DeepDiff-backed), _format_diff_table (human renderer).
tests/test_cli_diff.py — new file, 13 test cases.

Behavior

Step records are aligned by position. For each pair, the diff compares outputs, error_type, error_message, and success. It optionally flags per-step duration regressions beyond --perf-tolerance N%. Non-deterministic fields (trace_id, started_at, ended_at, total_duration_ms, and per-step duration_ms when no tolerance is set) are ignored by default.

| Flag | Meaning |
| --- | --- |
| `--perf-tolerance N` | Flag steps whose `duration_ms` changed by more than N %. Off by default. |
| `--format table\|json` (`-f`) | Default `table` shows structural deltas; `json` emits the structured diff payload. |
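
The per-step comparison described above could look roughly like the following sketch. It is not the code from chainweaver/cli.py: `compare_step` and `exceeds_tolerance` are hypothetical names, step records are assumed to be plain dicts, and the percentage is assumed to be measured relative to the first trace's duration.

```python
from __future__ import annotations

from typing import Any

# Fields the PR description says are compared for each step pair.
COMPARED_FIELDS = ("outputs", "error_type", "error_message", "success")


def exceeds_tolerance(a_ms: float, b_ms: float, tolerance_pct: float) -> bool:
    """Return True when b_ms deviates from a_ms by more than tolerance_pct percent.

    Assumption: the change is measured relative to the first trace (a_ms).
    """
    if a_ms == 0:
        # Assumption: with a zero baseline, any nonzero duration counts as a change.
        return b_ms != 0
    return abs(b_ms - a_ms) / a_ms * 100.0 > tolerance_pct


def compare_step(
    a: dict[str, Any],
    b: dict[str, Any],
    perf_tolerance: float | None = None,
) -> dict[str, Any]:
    """Collect field-level differences between two step records (hypothetical shape)."""
    delta: dict[str, Any] = {}
    for field in COMPARED_FIELDS:
        if a.get(field) != b.get(field):
            delta[field] = {"a": a.get(field), "b": b.get(field)}
    # duration_ms is ignored unless a tolerance was explicitly requested.
    if perf_tolerance is not None:
        a_ms = a.get("duration_ms", 0.0)
        b_ms = b.get("duration_ms", 0.0)
        if exceeds_tolerance(a_ms, b_ms, perf_tolerance):
            delta["duration_ms"] = {"a": a_ms, "b": b_ms}
    return delta
```

With no tolerance set, two steps that differ only in `duration_ms` produce an empty delta, which mirrors the "ignored by default" behavior.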

Exit codes

0 — identical (modulo ignored fields).
1 — differs, or malformed trace input.
2 — file not found.
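
A minimal sketch of how these exit codes might be derived, for readers scripting around the command. The function `diff_exit_code` and the `differs` callback are hypothetical, and treating only invalid JSON as "malformed" is an assumption; the real command may also validate the ExecutionResult shape.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any, Callable

EXIT_IDENTICAL = 0
EXIT_DIFFERS_OR_MALFORMED = 1
EXIT_FILE_NOT_FOUND = 2


def diff_exit_code(
    path_a: str,
    path_b: str,
    differs: Callable[[Any, Any], bool],
) -> int:
    """Map the outcome of comparing two trace files onto the documented exit codes."""
    traces: list[Any] = []
    for path in (path_a, path_b):
        try:
            text = Path(path).read_text(encoding="utf-8")
        except FileNotFoundError:
            return EXIT_FILE_NOT_FOUND
        try:
            traces.append(json.loads(text))
        except json.JSONDecodeError:
            # Assumption: unparseable JSON is what counts as a malformed trace here.
            return EXIT_DIFFERS_OR_MALFORMED
    if differs(traces[0], traces[1]):
        return EXIT_DIFFERS_OR_MALFORMED
    return EXIT_IDENTICAL
```

Because exit code 1 covers both "differs" and "malformed", a calling script cannot tell the two apart from the status alone and would need to inspect the command's output.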

Testing

Linting passes (ruff check chainweaver/ tests/ examples/)
Formatting check passes (ruff format --check chainweaver/ tests/ examples/)
Type checking passes (python -m mypy chainweaver/ tests/)
All existing tests pass — 504/504 passed in 1.99s (491 pre-existing + 13 new)
New tests added for new functionality

$ ruff check chainweaver/ tests/ examples/
All checks passed!
$ ruff format --check chainweaver/ tests/ examples/
51 files already formatted
$ python -m mypy chainweaver/ tests/
Success: no issues found in 44 source files
$ python -m pytest tests/ -q --no-cov
504 passed in 1.99s

Diff stat: 3 files changed, 580 insertions(+).

Related Issues

Closes #148. Sister to chainweaver profile (#159 / #147).

Checklist

Code follows project conventions (see AGENTS.md and docs/agent-context/)
Public API changes are documented — CLI docstring updated; one new runtime dep documented below
No secrets or credentials included

Tradeoffs / risks

New runtime dependency: deepdiff>=8.0. Justification: hand-rolling recursive nested-dict diff would add ~150 LoC of fragile code, and the issue body specifically calls for "JSON-aware diff". DeepDiff is small (~150 KB), well-maintained, has stable cross-platform wheels for Python 3.10–3.13, and supports the tree view with to_dict() for JSON-safe output. This brings the runtime-deps total from 4 → 5 (still well within the "lean dep set" spirit; all five are well-known and CLI-essential). Cleared per the relaxed-constraints answer ("I don't mind adding new dependencies").
DeepDiff API surface is large; we only use DeepDiff(a, b, ignore_order=True, view="tree").to_dict(). Wrapping it in _step_outputs_diff keeps the surface area minimal so a future swap is local.
Performance-tolerance off by default: matches the issue's "non-deterministic fields ignored by default" framing. Users opt in to perf checks explicitly.
Step alignment is positional: tool-name renames at the same index get flagged via tool_name_change rather than being treated as separate insert/delete events. This is the simplest reasonable contract for now; reordered-steps detection is out of scope.
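
To illustrate the positional-alignment contract, here is a small sketch. It is not the PR's `_compare_traces`: the function `align_steps`, the `missing_in_a` / `missing_in_b` labels, and the dict shape of a step are assumptions made for the example; only the `tool_name_change` label comes from the description above.

```python
from __future__ import annotations

from itertools import zip_longest
from typing import Any


def align_steps(
    steps_a: list[dict[str, Any]],
    steps_b: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Pair steps by index and report renames or missing steps at each position."""
    findings: list[dict[str, Any]] = []
    for index, (a, b) in enumerate(zip_longest(steps_a, steps_b)):
        if a is None or b is None:
            # One trace is shorter; the label names are hypothetical.
            kind = "missing_in_a" if a is None else "missing_in_b"
            findings.append({"index": index, "kind": kind})
        elif a.get("tool_name") != b.get("tool_name"):
            # Same index, different tool: reported as a rename, not insert/delete.
            findings.append({
                "index": index,
                "kind": "tool_name_change",
                "a": a.get("tool_name"),
                "b": b.get("tool_name"),
            })
    return findings
```

Under this contract, two traces with the same steps in swapped order show up as a `tool_name_change` at each affected index, which is the limitation the "reordered-steps detection is out of scope" note refers to.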

Scope notes

Closes #148 only. Adjacent items:

DeepDiff for profile — the profile verb could use it too for richer output rendering. Out of scope here; profile lands first and its current statistics-based approach is fine.
Replay-from-diff helper — once chainweaver diff lands, a follow-up could let chainweaver replay --diff re-execute only the diverging steps. Tracked separately if needed.

https://claude.ai/code/session_01QcSJ3NWhe5B4k1EP25Hx3n

Generated by Claude Code

Closes #148.

Compares two ExecutionResult JSON files step-by-step. Aligns step records by position; walks outputs / error_type / error_message / success; optionally flags per-step duration regressions beyond a configurable threshold. Non-deterministic fields (trace_id, timestamps, total/per-step durations) are ignored by default.

Usage:

chainweaver diff yesterday.json today.json
chainweaver diff base.json candidate.json --perf-tolerance 25
chainweaver diff a.json b.json --format json

Exit codes:

- 0: identical (modulo ignored fields).
- 1: differs, or malformed input.
- 2: file not found.

Implementation:

- chainweaver/cli.py: new `diff_command`, `_compare_traces` (structural comparison), `_step_outputs_diff` (DeepDiff-backed), and `_format_diff_table` (human-readable renderer).
- DeepDiff is a new required runtime dependency (`deepdiff>=8.0`). Hand-rolling recursive dict diff would add ~150 LoC of fragile code; DeepDiff is small, well-maintained, and matches the issue's "JSON-aware diff" requirement out of the box.

Tests: 13 new cases in tests/test_cli_diff.py covering:

- Identity: identical traces with different trace_ids return exit 0, JSON output shape stable.
- Divergence: different flow_names, diverging step outputs (table + JSON), error vs success transitions, mismatched step counts.
- Performance tolerance: within / exceeds / off-by-default semantics.
- File errors: missing first file (exit 2), missing second file (exit 2), malformed trace (exit 1).

Verification:

$ ruff check chainweaver/ tests/ examples/  # All checks passed
$ ruff format --check chainweaver/ tests/ ...  # 51 files already formatted
$ python -m mypy chainweaver/ tests/  # Success: no issues
$ python -m pytest tests/ -q --no-cov  # 504 passed in 1.99s

Stacked on top of #159 (profile CLI); base cascades through #159 → #158 → #157 → main as those merge.

https://claude.ai/code/session_01QcSJ3NWhe5B4k1EP25Hx3n

dgenio mentioned this pull request May 16, 2026

feat: add ChainAnalyzer for offline schema-compatibility analysis (#77) #161

dgenio wants to merge 1 commit into feat/147-cli-profile-subcommand from feat/148-cli-diff-subcommand
