feat: add chainweaver diff <a> <b> CLI for step-by-step trace comparison (#148) #160
Open
dgenio wants to merge 1 commit into
Closes #148.

Compares two ExecutionResult JSON files step-by-step. Aligns step records by position; walks outputs / error_type / error_message / success; optionally flags per-step duration regressions beyond a configurable threshold. Non-deterministic fields (trace_id, timestamps, total/per-step durations) are ignored by default.

Usage:

    chainweaver diff yesterday.json today.json
    chainweaver diff base.json candidate.json --perf-tolerance 25
    chainweaver diff a.json b.json --format json

Exit codes:
- 0: identical (modulo ignored fields).
- 1: differs, or malformed input.
- 2: file not found.

Implementation:
- chainweaver/cli.py: new `diff_command`, `_compare_traces` (structural comparison), `_step_outputs_diff` (DeepDiff-backed), and `_format_diff_table` (human-readable renderer).
- DeepDiff is a new required runtime dependency (`deepdiff>=8.0`). Hand-rolling a recursive dict diff would add ~150 LoC of fragile code; DeepDiff is small, well-maintained, and matches the issue's "JSON-aware diff" requirement out of the box.

Tests: 13 new cases in tests/test_cli_diff.py covering:
- Identity: identical traces with different trace_ids return exit 0, JSON output shape stable.
- Divergence: different flow_names, diverging step outputs (table + JSON), error vs success transitions, mismatched step counts.
- Performance tolerance: within / exceeds / off-by-default semantics.
- File errors: missing first file (exit 2), missing second file (exit 2), malformed trace (exit 1).

Verification:

    $ ruff check chainweaver/ tests/ examples/       # All checks passed
    $ ruff format --check chainweaver/ tests/ ...    # 51 files already formatted
    $ python -m mypy chainweaver/ tests/             # Success: no issues
    $ python -m pytest tests/ -q --no-cov            # 504 passed in 1.99s

Stacked on top of #159 (profile CLI); base cascades through #159 → #158 → #157 → main as those merge.

https://claude.ai/code/session_01QcSJ3NWhe5B4k1EP25Hx3n
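For readers who want a concrete picture of the positional comparison described in the commit message above, here is a minimal standard-library sketch. It is not the PR's `_compare_traces`: the function name `compare_steps`, the shape of the returned entries, and the `step_count` marker are hypothetical, plain `!=` stands in for the DeepDiff-backed output diff, and the tolerance check assumes "changed by more than N %" means an absolute percentage change relative to the first trace.

```python
from typing import Any, Optional

# Fields the PR says are walked for each aligned pair of step records.
COMPARED_FIELDS = ("outputs", "error_type", "error_message", "success")


def compare_steps(
    steps_a: list,
    steps_b: list,
    perf_tolerance: Optional[float] = None,
) -> list:
    """Return one entry per difference; an empty list means identical.

    Hypothetical helper: each step is assumed to be a plain dict.
    """
    diffs: list = []
    # Align by position: step i of A is only ever compared with step i of B.
    for index, (a, b) in enumerate(zip(steps_a, steps_b)):
        for field in COMPARED_FIELDS:
            if a.get(field) != b.get(field):
                diffs.append(
                    {"step": index, "field": field, "a": a.get(field), "b": b.get(field)}
                )
        # duration_ms is ignored unless a tolerance (in percent) is supplied.
        if perf_tolerance is not None:
            base: Any = a.get("duration_ms")
            cand: Any = b.get("duration_ms")
            if base and cand is not None:
                change_pct = abs(cand - base) / base * 100
                if change_pct > perf_tolerance:
                    diffs.append(
                        {"step": index, "field": "duration_ms", "a": base, "b": cand}
                    )
    # A length mismatch is itself a difference (assumed representation).
    if len(steps_a) != len(steps_b):
        diffs.append(
            {"step": None, "field": "step_count", "a": len(steps_a), "b": len(steps_b)}
        )
    return diffs
```

Because `trace_id` and the timestamp fields are never read, two runs that differ only in those fields compare as identical, which matches the "ignored by default" behavior the message describes.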
## Summary

Adds `chainweaver diff <a.json> <b.json>` so operators can compare two `ExecutionResult` JSON files step-by-step. Sister tool to `profile` (#159).

Stacked on top of #159 (profile CLI); base cascades through `#159 → #158 → #157 → main` as those merge.

Closes #148.
## Changes

- `pyproject.toml`: adds `deepdiff>=8.0` to `[project.dependencies]`. Justification below.
- `chainweaver/cli.py`: new `diff_command`, `_compare_traces` (structural comparison), `_step_outputs_diff` (DeepDiff-backed), `_format_diff_table` (human renderer).
- `tests/test_cli_diff.py`: new file, 13 test cases.

## Behavior
Aligns step records by position. Walks `outputs`, `error_type`, `error_message`, and `success` for each pair. Optionally flags per-step duration regressions beyond `--perf-tolerance N%`. Non-deterministic fields (`trace_id`, `started_at`, `ended_at`, `total_duration_ms`, and per-step `duration_ms` when no tolerance is set) are ignored by default.

- `--perf-tolerance N`: flags steps whose `duration_ms` changed by more than N %. Off by default.
- `--format table|json` (`-f`): `table` shows structural deltas; `json` emits the structured diff payload.

## Exit codes
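A minimal sketch of how the exit-code contract in this section might be wired up, using only the standard library. The function name `diff_exit_code` and the `differs` callback are hypothetical, and two details are assumptions not stated in the PR: missing files are checked before either file is parsed, and a top-level JSON value that is not an object counts as malformed.

```python
import json
from pathlib import Path
from typing import Any, Callable

EXIT_IDENTICAL = 0
EXIT_DIFFERS_OR_MALFORMED = 1
EXIT_FILE_NOT_FOUND = 2


def diff_exit_code(
    path_a: Path,
    path_b: Path,
    differs: Callable[[Any, Any], bool],
) -> int:
    """Map the outcome of comparing two trace files to an exit code."""
    paths = (Path(path_a), Path(path_b))
    # Assumption: a missing file wins over any parsing problem.
    for path in paths:
        if not path.is_file():
            return EXIT_FILE_NOT_FOUND
    traces = []
    for path in paths:
        try:
            trace = json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError):
            return EXIT_DIFFERS_OR_MALFORMED
        # Assumption: a trace must be a JSON object at the top level.
        if not isinstance(trace, dict):
            return EXIT_DIFFERS_OR_MALFORMED
        traces.append(trace)
    if differs(traces[0], traces[1]):
        return EXIT_DIFFERS_OR_MALFORMED
    return EXIT_IDENTICAL
```

Sharing exit code 1 between "differs" and "malformed" means a caller such as a CI script cannot tell the two apart from the code alone; it would need to inspect the command's output for that.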
- `0`: identical (modulo ignored fields).
- `1`: differs, or malformed trace input.
- `2`: file not found.

## Testing
- `ruff check chainweaver/ tests/ examples/`
- `ruff format --check chainweaver/ tests/ examples/`
- `python -m mypy chainweaver/ tests/`

Diff stat: `3 files changed, 580 insertions(+)`.

## Related Issues
Closes #148. Sister to `chainweaver profile` (#159 / #147).

## Checklist
- … (`AGENTS.md` and `docs/agent-context/`)

## Tradeoffs / risks
- New required runtime dependency: `deepdiff>=8.0`. Justification: hand-rolling a recursive nested-dict diff would add ~150 LoC of fragile code, and the issue body specifically calls for a "JSON-aware diff". DeepDiff is small (~150 KB), well-maintained, has stable cross-platform wheels for Python 3.10–3.13, and supports the `tree` view with `to_dict()` for JSON-safe output. This brings the runtime-deps total from 4 → 5 (still well within the "lean dep set" spirit; all five are well-known and CLI-essential). Cleared per the relaxed-constraints answer ("I don't mind adding new dependencies").
- … `DeepDiff(a, b, ignore_order=True, view="tree").to_dict()`. Wrapping it in `_step_outputs_diff` keeps the surface area minimal so a future swap is local.
- … `tool_name_change` rather than being treated as separate insert/delete events. This is the simplest reasonable contract for now; reordered-steps detection is out of scope.

## Scope notes
Closes #148 only. Adjacent items:

- … `profile`: the `profile` verb could use it too for richer output rendering. Out of scope here; `profile` lands first and its current `statistics`-based approach is fine.
- … `chainweaver diff` lands, a follow-up could let `chainweaver replay --diff` re-execute only the diverging steps. Tracked separately if needed.

https://claude.ai/code/session_01QcSJ3NWhe5B4k1EP25Hx3n
Generated by Claude Code