chore(toolpath-desktop): add synthetic preview benchmarks #47
Merged
eliothedeman merged 1 commit into `main` on Apr 22, 2026
eliothedeman (Collaborator, Author) commented on Apr 22, 2026

eliothedeman left a comment
Summary
Solid execution of the benchmarking brief: a working fixture generator, a pure-TS bench that actually produces comparable numbers, and a `BENCHMARKS.md` that is honest about what was measured vs. what was left for manual measurement. Scope stays clean: no perf fixes smuggled in, no `svelte-virtual-list`, no speculative sub-issues filed.
Notes
- `crates/toolpath-cli/src/bin/gen_synthetic_path.rs:293`: real tool, deterministic seed, clap CLI, writes valid `Path` docs (PR confirms `path validate` passes). The mix (~70/20/10) matches the ask.
- `.gitignore:5`: fixtures correctly gitignored under `/bench/fixtures/`, not checked in. Good.
- `crates/toolpath-cli/Cargo.toml:51`: `default-run = "path"` preserves `cargo run -p toolpath-cli` behavior; the second `[[bin]]` is added cleanly.
- `crates/toolpath-desktop/frontend/src/lib/__bench__/preview.bench.ts:574`: plain `performance.now()` + median/p95/max over 10 iterations. No Vitest dep added, which fits the minimal-surface-area vibe. `KEYSTROKE_QUERIES` cycles realistic prefixes.
- `BENCHMARKS.md:489`: the baseline table has real numbers for pure-TS ops; Tauri rows are left as dash placeholders with an explicit manual procedure. Matches the PR body's honesty claim.
- `BENCHMARKS.md:509`: the narrative correctly attributes the slow 10k `buildTree` baseline (579 ms median, 1.8 s p95) to #39 territory and flags `renderMarkdown` as the likely keystroke culprit for #38, without doing the fix here.
- `BENCHMARKS.md:551`: explicit "don't file sub-issues for known wins from #38/#39", which respects the scope-creep guardrail from the issue.
- Version bumps: `toolpath-cli` 0.3.1 → 0.4.0 in `Cargo.toml`, `Cargo.lock`, `CHANGELOG.md:19`, `site/_data/crates.json:84`. The workspace root `[workspace.dependencies]` does not list `toolpath-cli` (confirmed; it's a leaf binary crate), so that file is correctly skipped. Checklist followed.
- `tsconfig.json:765` excludes `__bench__/**` from svelte-check. Sensible, since the bench uses Node APIs.
- Minor: the `bench/fixtures/synthetic-10k.path.json` TFP row in the manual procedure (step 3) candidly admits the paste-into-DevTools route is awkward and recommends using a real long session instead. Honest but slightly hand-wavy; fine as a starting procedure.
- No CI checks reported on the branch (`gh pr checks 47` → "no checks reported"), but the PR body lists the local verification matrix.
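The deterministic ~70/20/10 mix noted above can be illustrated with a seeded PRNG plus a cumulative-weight pick. This is a TypeScript sketch, not the Rust code in `gen_synthetic_path.rs`: the `StepKind` names, `generateKinds`, and the choice of mulberry32 as the PRNG are assumptions made for illustration only.

```typescript
// Hypothetical step categories mirroring the ~70/20/10 mix; names are illustrative.
type StepKind = "text" | "edit_or_write" | "multi_edit";

// Weights in percent (70 + 20 + 10 = 100), consumed cumulatively by pickKind.
const MIX: ReadonlyArray<readonly [StepKind, number]> = [
  ["text", 70],
  ["edit_or_write", 20],
  ["multi_edit", 10],
];

// mulberry32: a small 32-bit seeded PRNG returning floats in [0, 1).
// The same seed always yields the same sequence.
function mulberry32(seed: number): () => number {
  let a = seed | 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t = (t + Math.imul(t ^ (t >>> 7), t | 61)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Map a roll in [0, 100) onto the cumulative weights.
function pickKind(roll: number): StepKind {
  let acc = 0;
  for (const [kind, weight] of MIX) {
    acc += weight;
    if (roll < acc) return kind;
  }
  // Unreachable for roll < 100; present to satisfy the return type.
  return MIX[MIX.length - 1][0];
}

// Generate `steps` step kinds deterministically from `seed` (hypothetical helper).
function generateKinds(steps: number, seed: number): StepKind[] {
  const rand = mulberry32(seed);
  return Array.from({ length: steps }, () => pickKind(rand() * 100));
}
```

Because the sequence depends only on the seed, regenerating a fixture with the same seed and step count yields the same mix, which is what keeps before/after bench runs comparable.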
Verdict
Approve. Delivers exactly the benchmarking scaffolding the issue asked for, with no scope creep and honest accounting of measured vs. pending numbers. Safe to merge; issue stays open as the rerun tracker post-#38/#39.
Commits updated: `bf9cb74` → `85b3525`
🔍 Preview deployed: https://25cf3266.toolpath.pages.dev
Adds a fixture generator (new `gen_synthetic_path` binary in `toolpath-cli`) and a pure-TS bench script covering the Preview's hot paths: `normalize`, `buildTree`, `flattenChatHead`, `classify`, `matchesFilter`. `bun run bench` reports median/p95/max over 10 iterations on fixtures at 1k / 5k / 10k steps.

`BENCHMARKS.md` captures the 2026-04-22 baseline on Apple M4 Pro. Notable: `buildTree` at 10k steps is ~579 ms median (p95 1.8 s), well over the 200 ms keystroke budget and the primary thing #39 should improve. The manual Tauri webview procedure (render time, memory) is documented with an empty template for a human to fill in after a DevTools session.

Bumps `toolpath-cli` to 0.4.0 (additive public change: new binary). Fixture files are gitignored; regenerate them locally.

Addresses #41
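The median/p95/max reporting described above can be sketched as a plain `performance.now()` loop with a budget check. This is a hypothetical helper, not the contents of `preview.bench.ts`; `bench`, `summarize`, `overBudget`, and the nearest-rank percentile definition are all assumptions.

```typescript
// Hypothetical summary shape; the real bench's output may differ.
interface Summary {
  median: number;
  p95: number;
  max: number;
}

// Budget from the issue: a keystroke should update within 200 ms.
const KEYSTROKE_BUDGET_MS = 200;

// Nearest-rank percentile over an ascending-sorted, non-empty array.
function percentile(sorted: number[], p: number): number {
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.min(sorted.length, Math.max(1, rank)) - 1];
}

// Reduce raw timings (ms) to median / p95 / max.
function summarize(samples: number[]): Summary {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const median =
    sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
  return { median, p95: percentile(sorted, 95), max: sorted[sorted.length - 1] };
}

// Time `fn` for `iterations` runs with performance.now() and summarize (ms).
function bench(fn: () => void, iterations = 10): Summary {
  const samples: number[] = [];
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    fn();
    samples.push(performance.now() - start);
  }
  return summarize(samples);
}

// True when the median exceeds the keystroke budget.
function overBudget(s: Summary, budgetMs = KEYSTROKE_BUDGET_MS): boolean {
  return s.median > budgetMs;
}
```

Each hot-path call (for example, one `buildTree` invocation on a loaded fixture) would be wrapped in a closure and passed to `bench`. One caveat of this particular sketch: with only 10 samples, a nearest-rank p95 selects the 10th-ranked sample, so p95 equals max. The real script may define percentiles differently.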
Commits updated: `85b3525` → `cea63c9`
Summary
- `gen_synthetic_path` binary in `toolpath-cli` that emits a synthetic `Path` JSON at configurable step counts (default `--steps 1000`), with a realistic ~70% text turns / ~20% Edit or Write / ~10% MultiEdit mix. Deterministic seed.
- `crates/toolpath-desktop/frontend/src/lib/__bench__/preview.bench.ts`, runnable as `bun run bench`, covering the Preview's pure-TS hot paths: `normalize`, `buildTree`, `flattenChatHead`, `classify`, `matchesFilter` (keystroke simulation).
- `crates/toolpath-desktop/frontend/BENCHMARKS.md`: describes how to generate fixtures, run the bench, and follow the manual Chrome DevTools procedure for render time + memory (the parts a Tauri webview dominates and an agent can't script). Includes a 2026-04-22 baseline table with real pure-TS numbers and empty rows for the manual measurements.
- `toolpath-cli` 0.3.1 → 0.4.0 (additive public change: new binary). The workspace root `Cargo.toml` does not list `toolpath-cli` as a workspace dep, so only `Cargo.toml`, `site/_data/crates.json`, and `CHANGELOG.md` needed updates per `CLAUDE.md`'s checklist.
- `/bench/fixtures/` added to `.gitignore`: fixtures are regenerated locally, not committed (~5 MB at 10k steps).

Addresses #41. Not `Closes`: the issue asks for a rerun after #38 and #39 land, so it stays open as a tracker.

Baseline numbers (Apple M4 Pro, bun 1.3.5)
Pure-TS, 10 iterations each:
`buildTree` at 10k is well above the 200 ms keystroke budget from the issue. Expected: this is what #39 is meant to address.

Tauri webview measurements (TFP, keystroke DOM-updated, memory delta) are left blank in `BENCHMARKS.md` with a manual procedure; those need a human at a running `cargo tauri dev` session with DevTools open.

Test plan
- `cargo build -p toolpath-cli --bin gen_synthetic_path`
- `cargo run -p toolpath-cli --bin gen_synthetic_path -- --steps 1000 --out bench/fixtures/synthetic-1k.path.json` (and 5k / 10k)
- `cargo run -p toolpath-cli --bin path -- validate --input bench/fixtures/synthetic-1k.path.json` (passes)
- `cd crates/toolpath-desktop/frontend && bun install && bun run bench` (produces the numbers above)
- `bun run check` (0 errors, 4 pre-existing warnings)
- `bun run build` (clean)
- `cargo clippy -p toolpath-cli --bin gen_synthetic_path -- -D warnings` (clean)
- `cargo test -p toolpath-cli --bin path` (152 pass, 0 fail)
- `BENCHMARKS.md`
- `bun run bench` after #38 (toolpath-desktop: memoize markdown rendering per chat turn) and #39 (toolpath-desktop: buildTree re-normalizes on any preview mutation) merge; append the comparison rows
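For the final step (appending comparison rows once #38 and #39 merge), a small formatter could turn baseline and rerun medians into markdown table rows. This is only a sketch: the `ComparisonRow` shape, the column order, `formatRow`, and `formatTable` are hypothetical and do not reflect the actual `BENCHMARKS.md` schema.

```typescript
// Hypothetical comparison row; column order is illustrative, not the real BENCHMARKS.md schema.
interface ComparisonRow {
  op: string;
  steps: number;
  baselineMedianMs: number;
  rerunMedianMs: number;
}

// Header matching the columns produced by formatRow.
const HEADER =
  "| op | steps | baseline median (ms) | rerun median (ms) | speedup |\n|---|---|---|---|---|";

// Render one markdown row: op | steps | baseline | rerun | speedup.
function formatRow(r: ComparisonRow): string {
  // Speedup > 1 means the rerun is faster than the baseline; guard against division by zero.
  const speedup =
    r.rerunMedianMs > 0 ? `${(r.baselineMedianMs / r.rerunMedianMs).toFixed(2)}x` : "n/a";
  return `| ${r.op} | ${r.steps} | ${r.baselineMedianMs.toFixed(1)} | ${r.rerunMedianMs.toFixed(1)} | ${speedup} |`;
}

// Header plus one row per operation/step-count pair.
function formatTable(rows: ComparisonRow[]): string {
  return [HEADER, ...rows.map(formatRow)].join("\n");
}
```

Keeping the baseline column alongside the rerun makes the before/after delta visible in the same table rather than requiring readers to cross-reference the 2026-04-22 section.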