
feat: add chainweaver profile <traces...> CLI for bottleneck analysis (#147) #159

Open
dgenio wants to merge 1 commit into feat/129-cli-run-subcommand from feat/147-cli-profile-subcommand


dgenio commented May 16, 2026

Summary

Adds chainweaver profile <traces...> so operators can answer "which step is slow? is it always slow?" from ExecutionResult JSON files without writing custom Python. It runs locally and adds no new runtime dependency.

Stacked on top of #158 (run CLI); base auto-retargets through #158 → #157 → main as those merge.

Closes #147.

Changes

  • chainweaver/viz.py — private _render_step_bar_chart(rows, ...) helper using unicode block characters; scales bars to the longest row and right-truncates tool names that exceed name_width.
  • chainweaver/cli.py — new profile_command, three private helpers (_load_execution_result, _percentiles, _quantile), single-trace and multi-trace renderers, module-docstring update.
  • tests/test_cli_profile.py — new test file (12 cases).
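
For reviewers who want a feel for the bar-chart helper without opening the diff, here is a minimal sketch of the behavior described above. It is not the code in chainweaver/viz.py: the function name, the `bar_width` parameter, the row shape `(tool_name, duration_ms)`, and the ellipsis used when truncating are all assumptions made for illustration.

```python
BLOCK = "\u2588"  # unicode full block, used to draw the bar


def render_step_bar_chart(rows, name_width=20, bar_width=30):
    """Render (tool_name, duration_ms) rows as a text bar chart.

    Hypothetical stand-in for the private helper: bars are scaled so the
    longest row fills bar_width, and long names are cut on the right.
    """
    if not rows:
        return ""
    longest = max(duration for _, duration in rows)
    lines = []
    for name, duration in rows:
        if len(name) > name_width:
            # Right-truncate; the trailing ellipsis is an assumption.
            name = name[: name_width - 1] + "\u2026"
        filled = round(bar_width * duration / longest) if longest > 0 else 0
        bar = BLOCK * filled
        lines.append(f"{name:<{name_width}} {bar:<{bar_width}} {duration:.1f} ms")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_step_bar_chart([("fetch", 100.0), ("parse", 25.0)]))
```

Scaling against the longest row keeps the chart width fixed regardless of absolute durations, which is why the slowest step always fills the full bar.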

Behavior

Single trace:

  • Total wall-clock + sum-of-step ms + orchestration overhead.
  • Per-step bar chart (unicode block characters) sorted by duration_ms descending.
  • --top N (default 10) truncates and surfaces "M more step(s) not shown".
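
The single-trace numbers above can be sketched as follows. This is an illustration only: the function name and the returned dictionary keys are hypothetical, and it assumes that orchestration overhead means total wall-clock time minus the sum of step durations.

```python
def summarize_single_trace(total_ms, steps, top=10):
    """Summarize one trace.

    steps is a list of (tool_name, duration_ms) pairs. Overhead is assumed
    to be total wall-clock time minus the sum of the step durations.
    """
    if top < 1:
        raise ValueError("--top must be a positive integer")
    ranked = sorted(steps, key=lambda row: row[1], reverse=True)
    shown = ranked[:top]
    step_sum = sum(duration for _, duration in ranked)
    return {
        "total_ms": total_ms,
        "step_sum_ms": step_sum,
        "overhead_ms": total_ms - step_sum,
        "shown": shown,
        # Drives the "M more step(s) not shown" footer.
        "hidden": len(ranked) - len(shown),
    }


if __name__ == "__main__":
    print(summarize_single_trace(120, [("a", 30), ("b", 70), ("c", 10)], top=2))
```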

Multi trace:

  • Verifies all traces share flow_name and step count (exits 1 with a clear message otherwise).
  • Per-step p50 / p95 / p99 / mean / stdev via stdlib statistics.
  • "Consistency" warning when a step's stdev > 50 % of its mean.
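
A rough sketch of the per-step aggregation and the consistency check, for illustration. It uses `statistics.quantiles` for brevity, whereas the PR uses an inline quantile helper. The function name, the `warn_ratio` parameter, and the choice of sample standard deviation (`statistics.stdev`) are assumptions.

```python
import statistics


def aggregate_step(durations_ms, warn_ratio=0.5):
    """Aggregate one step's durations across N traces.

    Flags the step as inconsistent when stdev exceeds warn_ratio * mean.
    """
    if not durations_ms:
        raise ValueError("need at least one duration")
    mean = statistics.fmean(durations_ms)
    if len(durations_ms) == 1:
        # A single sample has no spread; every percentile is that sample.
        only = float(durations_ms[0])
        return {"p50": only, "p95": only, "p99": only,
                "mean": mean, "stdev": 0.0, "inconsistent": False}
    # 99 cut points; index i - 1 holds the i-th percentile.
    cuts = statistics.quantiles(durations_ms, n=100, method="inclusive")
    stdev = statistics.stdev(durations_ms)
    return {
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
        "mean": mean,
        "stdev": stdev,
        "inconsistent": stdev > warn_ratio * mean,
    }


if __name__ == "__main__":
    print(aggregate_step([10, 20, 30, 40, 50]))
```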

--format json emits a stable, machine-readable shape. The single-trace and multi-trace outputs differ and can be told apart by trace_count.

Exit codes

  • 0 — analysis ran. Note: a failed flow still exits 0 because profile is read-only; the failure is signalled in the per-step rows (ERR).
  • 1 — malformed trace, mixed flow names, mismatched step counts across traces, or invalid --top.
  • 2 — file not found.
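
The exit-code contract above could be expressed roughly like this. It is a sketch, not the logic in chainweaver/cli.py: the function names are hypothetical, the `steps` key is an assumed field name for the per-step list in an ExecutionResult JSON file, and the order in which the checks run is a guess.

```python
import json
from pathlib import Path

EXIT_OK, EXIT_INVALID, EXIT_NOT_FOUND = 0, 1, 2


def validate_traces(traces):
    """Check that parsed traces share a flow name and a step count."""
    try:
        # "steps" is an assumed key; "flow_name" comes from the PR text.
        shapes = [(str(t["flow_name"]), len(t["steps"])) for t in traces]
    except (KeyError, TypeError) as exc:
        return EXIT_INVALID, f"malformed trace: {exc!r}"
    if len({name for name, _ in shapes}) > 1:
        return EXIT_INVALID, "traces come from different flows"
    if len({count for _, count in shapes}) > 1:
        return EXIT_INVALID, "traces have different step counts"
    return EXIT_OK, "ok"


def profile_exit_code(paths, top=10):
    """Map the documented failure modes onto exit codes 0, 1, and 2."""
    if top < 1:
        return EXIT_INVALID
    traces = []
    for raw in paths:
        path = Path(raw)
        if not path.is_file():
            return EXIT_NOT_FOUND
        try:
            traces.append(json.loads(path.read_text(encoding="utf-8")))
        except (json.JSONDecodeError, UnicodeDecodeError):
            return EXIT_INVALID
    code, _message = validate_traces(traces)
    # A trace of a failed flow is still a valid input, so it yields EXIT_OK.
    return code
```

Keeping "file not found" on its own code lets a CI script distinguish a wrong glob from a corrupt trace.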

Testing

  • Linting passes (ruff check chainweaver/ tests/ examples/)
  • Formatting check passes (ruff format --check chainweaver/ tests/ examples/)
  • Type checking passes (python -m mypy chainweaver/ tests/)
  • All existing tests pass — 491/491 passed in 1.86s (479 pre-existing + 12 new)
  • New tests added for new functionality
$ ruff check chainweaver/ tests/ examples/
All checks passed!
$ ruff format --check chainweaver/ tests/ examples/
50 files already formatted
$ python -m mypy chainweaver/ tests/
Success: no issues found in 43 source files
$ python -m pytest tests/ -q --no-cov
491 passed in 1.86s

Diff stat: 3 files changed, 599 insertions(+), 2 deletions(-).

Related Issues

Closes #147. Sister tool to the upcoming chainweaver diff (#148).

Checklist

  • Code follows project conventions (see AGENTS.md and docs/agent-context/)
  • Public API changes are documented — CLI docstring updated; _render_step_bar_chart is intentionally private (single in-package consumer)
  • No secrets or credentials included

Tradeoffs / risks

  • Single test file per verb (tests/test_cli_profile.py) rather than appending to tests/test_cli.py. Justification: the existing test_cli.py is now 778 lines after run; splitting per verb keeps each file tractable. Owner-mode scope-delta call.
  • Inline _quantile instead of statistics.quantiles(..., n=100): cheaper for single p95/p99 reads (avoids allocating the 99-element list of percentile cut points). Linear interpolation matches the semantics of method="inclusive".
  • profile on a failed flow exits 0: the analysis itself succeeded; the failure shows up as ERR in the rows. This matches the read-only contract of inspect / viz and avoids conflating "couldn't analyze" with "the analyzed flow failed."
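
To make the _quantile tradeoff concrete, here is a small sketch of a linear-interpolation quantile and a check against `statistics.quantiles(..., method="inclusive")`. The function name and signature are illustrative and may not match the private helper in chainweaver/cli.py.

```python
import statistics


def quantile(sorted_values, q):
    """Linear-interpolation quantile of pre-sorted values, 0 <= q <= 1.

    Treats the minimum as the 0th percentile and the maximum as the 100th,
    which is what method="inclusive" does in the stdlib.
    """
    if not sorted_values:
        raise ValueError("need at least one value")
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    low, high = sorted_values[lower], sorted_values[upper]
    return low + (high - low) * frac


if __name__ == "__main__":
    data = [10.0, 20.0, 30.0, 40.0, 50.0]
    # The stdlib call builds all 99 cut points just to read one of them.
    stdlib_p95 = statistics.quantiles(data, n=100, method="inclusive")[94]
    print(quantile(data, 0.95), stdlib_p95)
```

The inline version computes one value in constant time after sorting, while the stdlib call always returns the full list of cut points.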

Scope notes

Closes #147 only. Adjacent items deferred: chainweaver diff (#148, next PR in this stack), example-trace fixture under examples/ (tests/fixtures/-style fixtures suffice for the in-test coverage), and the planned MkDocs site (docs/cli.md — depends on #133).

https://claude.ai/code/session_01QcSJ3NWhe5B4k1EP25Hx3n


Generated by Claude Code

Closes #147.

Single-trace mode answers "which step is slow?" from one ExecutionResult
JSON file. Multi-trace mode answers "is it always slow?" by aggregating
p50 / p95 / p99 / mean / stdev per step across N traces.

Usage:

    chainweaver profile path/to/trace.json
    chainweaver profile path/to/trace.json --top 5
    chainweaver profile path/to/*.trace.json --format json

Single trace (table):
- Total wall-clock + sum-of-step ms + orchestration overhead.
- Per-step bar chart (unicode block characters) sorted by duration_ms
  descending.
- --top N (default 10) truncates and surfaces "M more step(s) not shown".

Multi trace (table):
- Verifies all traces share the same flow_name and step count (exits 1
  with a clear message otherwise).
- Per-step p50 / p95 / p99 / mean / stdev via stdlib statistics.
- Consistency warning when a step's stdev > 50% of its mean.

JSON format (both modes) is a stable machine-readable shape suitable for
CI gates.

Exit codes:
- 0 — analysis ran (a failed flow still exits 0 because profile is
  read-only; failure is signalled in the per-step rows).
- 1 — malformed trace, mixed flow names, mismatched step counts, or
  invalid --top.
- 2 — file not found.

Implementation:
- chainweaver/viz.py — private `_render_step_bar_chart()` helper using
  unicode block characters; scales bars to the longest row, truncates
  tool names that exceed `name_width`.
- chainweaver/cli.py — `profile_command` + two private helpers
  (`_load_execution_result`, `_percentiles`) + a small linear-interp
  quantile (`_quantile`) so single-call p95/p99 don't require
  allocating the full list of 99 percentile cut points.
- No new runtime dependency — pure stdlib statistics + viz string ops.

Tests: 12 new cases in tests/test_cli_profile.py covering:
- Single trace: table happy path, JSON shape, --top truncation, --top
  validation, failed-step marker, missing file (exit 2), malformed
  trace (exit 1).
- Multi trace: percentile output, table aggregation, mixed flow names
  (exit 1), mismatched step counts (exit 1), consistency-warning
  surfaces.

Verification:
  $ ruff check chainweaver/ tests/ examples/      # All checks passed
  $ ruff format --check chainweaver/ tests/ ...   # 50 files already formatted
  $ python -m mypy chainweaver/ tests/            # Success: no issues
  $ python -m pytest tests/ -q --no-cov           # 491 passed in 1.86s

Stacked on top of #158 (run CLI subcommand); the base auto-retargets
to #157 → main as those merge.

https://claude.ai/code/session_01QcSJ3NWhe5B4k1EP25Hx3n