chore(toolpath-desktop): add synthetic preview benchmarks by eliothedeman · Pull Request #47 · empathic/toolpath

eliothedeman · 2026-04-22T20:38:38Z

Summary

Adds gen_synthetic_path binary in toolpath-cli that emits a synthetic Path JSON at configurable step counts (default --steps 1000), with a realistic ~70% text turns / ~20% Edit or Write / ~10% MultiEdit mix. Deterministic seed.
Adds crates/toolpath-desktop/frontend/src/lib/__bench__/preview.bench.ts, runnable as bun run bench, covering the Preview's pure-TS hot paths: normalize, buildTree, flattenChatHead, classify, matchesFilter (keystroke simulation).
Adds crates/toolpath-desktop/frontend/BENCHMARKS.md — describes how to generate fixtures, run the bench, and the manual Chrome DevTools procedure for render-time + memory (the parts a Tauri webview dominates and an agent can't script). Includes a 2026-04-22 baseline table with real pure-TS numbers and empty rows for the manual measurements.
Bumps toolpath-cli 0.3.1 → 0.4.0 (additive public change: new binary). Workspace root Cargo.toml does not list toolpath-cli as a workspace dep, so only Cargo.toml + site/_data/crates.json + CHANGELOG.md needed updates per CLAUDE.md's checklist.
Adds /bench/fixtures/ to .gitignore; fixtures are regenerated locally, not committed (~5 MB at 10k steps).
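
For readers who want a feel for the step mix, here is a minimal TypeScript sketch of a seeded ~70/20/10 draw. It is not the generator from this PR (that is a Rust binary); `StepKind`, `mulberry32` as the PRNG choice, and `generateKinds` are all hypothetical names picked for illustration.

```typescript
// Hypothetical sketch: the real generator is the Rust gen_synthetic_path binary.
type StepKind = "text" | "edit_or_write" | "multi_edit";

// mulberry32: a small, widely used 32-bit seeded PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Draw `steps` step kinds with roughly 70% / 20% / 10% weights.
// The same seed always yields the same sequence.
function generateKinds(steps: number, seed: number): StepKind[] {
  const rand = mulberry32(seed);
  const kinds: StepKind[] = [];
  for (let i = 0; i < steps; i++) {
    const r = rand();
    kinds.push(r < 0.7 ? "text" : r < 0.9 ? "edit_or_write" : "multi_edit");
  }
  return kinds;
}
```

Calling `generateKinds(1000, 42)` twice returns identical arrays, which is the property that makes fixtures comparable across runs.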

Addresses #41 rather than closing it: the issue asks for a rerun after #38 and #39 land, so it stays open as a tracker.

Baseline numbers (Apple M4 Pro, bun 1.3.5)

Pure-TS, 10 iterations each:

Size	JSON.parse	normalize	buildTree (median)	buildTree (p95)	keystroke filter	flattenChatHead	classify × all
1k	1.16 ms	0.23 ms	3.98 ms	7.5 ms	0.08 ms	0.23 ms	0.14 ms
5k	3.17 ms	0.79 ms	82.2 ms	113 ms	0.43 ms	1.20 ms	0.47 ms
10k	6.32 ms	2.15 ms	579 ms	1830 ms	1.12 ms	5.34 ms	1.49 ms

buildTree at 10k is well above the 200 ms keystroke budget from the issue. Expected — this is the thing #39 is meant to address.
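
A rough sketch of how median/p95/max numbers like these can be collected with plain `performance.now()` and compared against the 200 ms budget. This is not the code in preview.bench.ts; `bench`, `summarize`, `percentile`, `overBudget`, and `KEYSTROKE_BUDGET_MS` are hypothetical names, and the nearest-rank percentile is an assumption (the real bench may compute p95 differently).

```typescript
interface BenchStats {
  median: number;
  p95: number;
  max: number;
}

// Nearest-rank percentile over an ascending-sorted, non-empty array.
function percentile(sorted: number[], p: number): number {
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.min(sorted.length, Math.max(1, rank)) - 1];
}

// Summarize a non-empty list of timings in milliseconds.
function summarize(samples: number[]): BenchStats {
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const median =
    sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  return { median, p95: percentile(sorted, 95), max: sorted[sorted.length - 1] };
}

// Time `fn` for a fixed number of iterations (10 matches the PR's setup).
function bench(fn: () => void, iterations = 10): BenchStats {
  const samples: number[] = [];
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    fn();
    samples.push(performance.now() - start);
  }
  return summarize(samples);
}

// The 200 ms keystroke budget comes from the issue; the constant name is made up.
const KEYSTROKE_BUDGET_MS = 200;

function overBudget(stats: BenchStats, budgetMs = KEYSTROKE_BUDGET_MS): boolean {
  return stats.median > budgetMs;
}
```

With only 10 samples, a nearest-rank p95 coincides with the max, so a bench that reports the two separately is presumably interpolating or using a different rank rule.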

Tauri webview measurements (TFP, keystroke DOM-updated, memory delta) are left blank in BENCHMARKS.md with a manual procedure — those need a human at a running cargo tauri dev session with DevTools open.

Test plan

eliothedeman

Summary

Solid execution of the benchmarking brief: a working fixture generator, a pure-TS bench that actually produces comparable numbers, and a BENCHMARKS.md that is honest about what was measured vs. left for manual. Scope stays clean — no perf fixes smuggled in, no svelte-virtual-list, no speculative sub-issues filed.

Notes

crates/toolpath-cli/src/bin/gen_synthetic_path.rs:293 — real tool, deterministic seed, clap CLI, writes valid Path docs (PR confirms path validate passes). Mix (~70/20/10) matches the ask.
.gitignore:5 — fixtures correctly gitignored under /bench/fixtures/, not checked in. Good.
crates/toolpath-cli/Cargo.toml:51 — default-run = "path" preserves cargo run -p toolpath-cli behavior; second [[bin]] added cleanly.
crates/toolpath-desktop/frontend/src/lib/__bench__/preview.bench.ts:574 — plain performance.now() + median/p95/max over 10 iterations. No Vitest dep added, which fits the minimal-surface-area vibe. KEYSTROKE_QUERIES cycles realistic prefixes.
BENCHMARKS.md:489 — baseline table has real numbers for pure-TS ops; Tauri rows left as — with an explicit manual procedure. Matches the PR body's honesty claim.
BENCHMARKS.md:509 — narrative correctly attributes the 10k buildTree regression (579 ms median, 1.8 s p95) to #39 territory and flags renderMarkdown as the likely keystroke culprit for #38, without doing the fix here.
BENCHMARKS.md:551 — explicit "don't file sub-issues for known wins from #38/#39" — respects the scope-creep guardrail from the issue.
Version bumps: toolpath-cli 0.3.1 → 0.4.0 in Cargo.toml, Cargo.lock, CHANGELOG.md:19, site/_data/crates.json:84. Workspace root [workspace.dependencies] does not list toolpath-cli (confirmed — it's a leaf binary crate), so that's correctly skipped. Checklist followed.
tsconfig.json:765 excludes __bench__/** from svelte-check — sensible; the bench uses Node APIs.
Minor: for the bench/fixtures/synthetic-10k.path.json TFP row, step 3 of the manual procedure candidly admits the paste-into-DevTools route is awkward and recommends using a real long session instead. Honest, if slightly hand-wavy; fine as a starting procedure.
No CI checks reported on the branch (gh pr checks 47 → "no checks reported"), but the PR body lists the local verification matrix.
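
To make the keystroke-simulation note above concrete, here is a small sketch of cycling typed prefixes through a filter. It does not reproduce the PR's `matchesFilter` or `KEYSTROKE_QUERIES`; `Item`, `matchesFilterSketch`, `typedPrefixes`, and `simulateKeystrokes` are hypothetical stand-ins, and the case-insensitive substring match is an assumption about what a filter might do.

```typescript
interface Item {
  text: string;
}

// Hypothetical stand-in for a filter predicate: case-insensitive substring match.
function matchesFilterSketch(item: Item, query: string): boolean {
  return item.text.toLowerCase().includes(query.toLowerCase());
}

// Expand a word into the prefixes a user would type, one per keystroke:
// "edit" becomes ["e", "ed", "edi", "edit"].
function typedPrefixes(word: string): string[] {
  return Array.from({ length: word.length }, (_, i) => word.slice(0, i + 1));
}

// Run the filter over every item once per simulated keystroke and
// record how many items match each prefix.
function simulateKeystrokes(items: Item[], word: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const prefix of typedPrefixes(word)) {
    counts.set(prefix, items.filter((it) => matchesFilterSketch(it, prefix)).length);
  }
  return counts;
}
```

Wrapping each per-prefix pass in a timer gives a per-keystroke cost that can be compared across fixture sizes.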

Verdict

Approve. Delivers exactly the benchmarking scaffolding the issue asked for, with no scope creep and honest accounting of measured vs. pending numbers. Safe to merge; issue stays open as the rerun tracker post-#38/#39.

github-actions · 2026-04-22T21:06:32Z

🔍 Preview deployed: https://25cf3266.toolpath.pages.dev

Adds a fixture generator (new `gen_synthetic_path` binary in toolpath-cli) and a pure-TS bench script covering the Preview's hot paths: `normalize`, `buildTree`, `flattenChatHead`, `classify`, `matchesFilter`. `bun run bench` reports median/p95/max over 10 iterations on fixtures at 1k / 5k / 10k steps. BENCHMARKS.md captures the 2026-04-22 baseline on Apple M4 Pro. Notable: `buildTree` at 10k steps is ~579 ms median (p95 1.8 s), well over the 200 ms keystroke budget and the primary thing #39 should improve. Manual Tauri webview procedure (render time, memory) is documented with an empty template for a human to fill after a DevTools session. Bumps toolpath-cli to 0.4.0 (additive public change: new binary). Fixture files are gitignored; regenerate locally. Addresses #41.

eliothedeman force-pushed the eliot/issue-41-preview-benchmark branch from bf9cb74 to 85b3525 Compare April 22, 2026 21:03

eliothedeman force-pushed the eliot/issue-41-preview-benchmark branch from 85b3525 to cea63c9 Compare April 22, 2026 21:07

eliothedeman merged commit 2b99bcd into main Apr 22, 2026
1 check passed

eliothedeman merged 1 commit into main from eliot/issue-41-preview-benchmark
