Conversation
… filters, and context-mode integration - Redesign Console dashboard: 8 clickable stat cards (Projects, Sessions, Active, Memories, Extensions, Requirements, Specifications, Changes), Recent Specifications and Recent Memories cards - Flatten sidebar navigation: single nav list with inline ProjectFilter dropdowns per view - Add TopbarSpecs: active spec pills in top bar across all projects - Add notification navigation with project switching and clear-all support - Add session resume workflow: copy session ID, /resume deep-linking - Memory cards: show linked session, click to navigate, remove clutter - Cross-project APIs: /api/plans/active/all, /api/extensions?all=true - Project list from project_roots table (filtered to real git repos) - Update branding: Make Claude Code production-ready - Add context-mode integration for token-optimized context management - Update README, Docusaurus docs, and marketing site BREAKING CHANGE: Console UI completely restructured — sidebar flattened, project selector removed from sidebar and replaced with per-view inline filters, dashboard cards redesigned, PlanStatus and GitStatus widgets removed
…s, background jobs, and optional Telegram
# [8.0.0](v7.11.4...v8.0.0) (2026-04-08) ### Bug Fixes * pin Bun to 1.3.11 (1.2.15 lacks compression APIs, latest is unstable) ([2372ea7](2372ea7)) ### Features * add Pilot Bot — persistent automation agent with scheduled tasks, background jobs, and optional Telegram ([cab87f6](cab87f6)) * complete Console overhaul — v8.0.0 ([#126](#126)) ([bfa261f](bfa261f)) ### BREAKING CHANGES * Console UI completely restructured — sidebar flattened, project selector removed from sidebar and replaced with per-view inline filters, dashboard cards redesigned, PlanStatus and GitStatus widgets removed * fix: remove duplicate Help slide, update hooks count to 21 * fix: pin Bun to 1.2.15 in CI (latest has execSync regression on Linux)
## [8.0.1](v8.0.0...v8.0.1) (2026-04-08) ### Bug Fixes * use correct CronCreate parameter name (cron, not schedule) in bot skills ([ad7f4af](ad7f4af))
…down, session filtering, spec tab consistency
## [8.0.2](v8.0.1...v8.0.2) (2026-04-09) ### Bug Fixes * console dashboard overhaul — 2x2 recent cards, usage model breakdown, session filtering, spec tab consistency ([3648a2d](3648a2d))
## [8.0.3](v8.0.2...v8.0.3) (2026-04-09) ### Bug Fixes * Improved Code Reviewer Stability, Codegraph Usage and Context Mode ([e78d68b](e78d68b))
## [8.0.4](v8.0.3...v8.0.4) (2026-04-09) ### Bug Fixes * Improved Dependency Installation Speed for Installer and Updater ([e426d3c](e426d3c)) * Improved PRD and Spec Sharing / Annotation UI on pilot-shell.com ([afc32f7](afc32f7))
## [8.0.5](v8.0.4...v8.0.5) (2026-04-09) ### Bug Fixes * skip CodeGraph indexing in non-git directories ([b4281df](b4281df))
## [8.0.6](v8.0.5...v8.0.6) (2026-04-09) ### Bug Fixes * walk up directory tree for git repo detection in CodeGraph guard ([f028b57](f028b57))
## [8.0.7](v8.0.6...v8.0.7) (2026-04-10) ### Bug Fixes * add tool output compression + sandboxed content cards to Usage view ([46127a1](46127a1)) * hook venv sync, console annotation writes, codegraph native sqlite ([4e9d264](4e9d264)) * restore dom globals after terminal-preview-xss test to prevent portal ssr leak ([77eda81](77eda81)) * stop incomplete child_process mocks poisoning CI test runs ([7e30074](7e30074))
…ier, fix auto-mode flag and docs - Add chrome-devtools-mcp plugin installation to installer (marketplace + parallel install) - Update browser automation from 3-tier to 4-tier: Chrome → Chrome DevTools MCP → playwright-cli → agent-browser - Update rules (browser-automation, verification, testing), skills (spec-verify, spec-bugfix-verify, spec-implement) - Update docs: Docusaurus (open-source-tools, installation, mcp-servers, rules, spec workflow), site, README - Remove deprecated --enable-auto-mode flag from wrapper (fixes user-reported launch errors) - Fix project filtering to support git repo subdirectories in Console - Update hooks documentation count and footer links
## [8.0.10](v8.0.9...v8.0.10) (2026-04-14) ### Bug Fixes * codegraph native SQLite repair, /spec new-branch option, session prompt display ([21fb7b6](21fb7b6))
…anced UI - Add SessionJsonlService to parse Claude Code JSONL session files - Extract token usage, tool breakdown, model distribution, duration, cost - Cost calculation with up-to-date pricing for Opus 4.5+/4, Sonnet, Haiku tiers - New GET /api/sessions/:id/stats and POST /api/sessions/costs endpoints - Session detail modal with stats grid, prompt, tool usage (2-col layout) - Session cards show cost and total messages instead of observations - Sortable session list columns (date, cost, messages, prompts) - Dashboard shows cost per session, cross-project navigation - 15 unit tests with JSONL fixtures
# [8.1.0](v8.0.10...v8.1.0) (2026-04-15) ### Features * add session details with JSONL stats, cost calculation, and enhanced UI ([0590783](0590783))
- Add customization packs: pilot customize install/update/remove/status CLI commands for teams on Team/Enterprise tiers to overlay custom skills, rules, hooks, and agents on top of core Pilot via a git repository. Auto-reapplied on pilot update and installer runs. Security: symlink rejection, license validation, atomic manifest writes, stale file cleanup on re-apply. - Add Opus 4.7 model support across launcher, console, and documentation. - Support custom plan statuses in Console and statusline — teams can write arbitrary Status: values in plan files and they display as-is with neutral styling instead of being hidden. - Never skip Codex adversarial review in spec-plan/spec-verify/spec-bugfix-plan — wait for the task completion notification, never read the output file to determine completion. - Docs: new customization page, updated pricing, extensions cross-ref, README.
# [8.2.0](v8.1.0...v8.2.0) (2026-04-16) ### Features * customization packs, Opus 4.7 support, and Codex bugfixes ([af711a4](af711a4))
…rtifacts untracked
## [8.2.1](v8.2.0...v8.2.1) (2026-04-17) ### Bug Fixes * customization overrides, skill decomposition polish, generated artifacts untracked ([ec1c268](ec1c268)) * Removed static effort for skills/commands to adjust based on global ([4fd9050](4fd9050))
## [8.2.2](v8.2.1...v8.2.2) (2026-04-17) ### Bug Fixes * Updated inconsistent steps in spec workflow ([a1e6118](a1e6118))
## [8.2.3](v8.2.2...v8.2.3) (2026-04-17) ### Bug Fixes * drop skill banner, normalize step heading levels across workflow skills ([05d37fa](05d37fa))
…n can't leak workers Fixes #129. ## Problem The SessionEnd hook does two side-effects: 1. POST session-complete to Console (localhost:41777) 2. Stop the bun worker-service.cjs if this is the last active session Both currently run synchronously in the hook process: - `_complete_session` calls `urllib.request.urlopen(..., timeout=5)` inline - `main` calls `subprocess.run(["bun", ...], timeout=15)` inline When Claude Code's harness cancels the SessionEnd hook (which it does after a short internal deadline), the hook process is terminated before either side-effect completes. Symptoms: - Terminal shows "SessionEnd hook [...] failed: Hook cancelled" - bun worker-service.cjs daemons accumulate as orphans across sessions - Orphans hold open file handles to ~/.pilot/memory/pilot-memory.db (WAL) but serve no traffic (only the newest is bound to :41777) Reproduced on Arch/Manjaro with Claude Code 2.x — observed three running daemons (two orphaned) after a handful of sessions. ## Fix Hand both side-effects off to detached subprocesses (`start_new_session=True`) before returning from `main`. The parent hook becomes essentially instantaneous (two `Popen` calls, no waiting), so harness cancellation no longer races the work. Ordering: worker-stop first, Console POST second. The worker-stop is the leakier side effect (holds file descriptors + network port); prioritising it means that even in a pathological case where the second `Popen` fails, the critical resource release still happens. Timeouts inside the detached workers are preserved (Console POST 5s, no explicit timeout on `bun stop` — unchanged from previous behaviour because it now runs in its own session and doesn't tie up the hook). The Console POST logic is inlined as a string constant (`_COMPLETE_SESSION_WORKER`) passed to `python -c`. This avoids creating a new file in the repo and keeps the worker self-contained; the subprocess startup cost (~30ms) is the only overhead when Console is down. ## Verification - `python -m py_compile` on the modified file: passes - `main()` with no env vars: returns 0 cleanly (short-circuits on missing CLAUDE_PLUGIN_ROOT) - Inlined worker script compiles and executes cleanly when invoked with a live Console (tested with `python3 -c <script> <url> <sid>` → POST 200) - Behavioural smoke test: run a session, exit, observe no "Hook cancelled" message and no orphan `bun worker-service.cjs --daemon` processes in `ps` - Data safety: workers are stateless HTTP fronts over shared SQLite-WAL; killing the old synchronous path's leaked workers with SIGTERM had no data-integrity impact (PRAGMA integrity_check = ok) ## Why not just increase the hook timeout? The hook timeout is controlled by the harness, not this repo. Even if it were raised, network stalls / bun startup latency / loaded-disk fsync spikes can all push a synchronous implementation past any fixed deadline. Detaching removes the class of failure entirely.
…coped annotations Bundles several independent fixes under one release. ## Vector DB disk exhaustion (closes #130) hnswlib's persisted `link_lists.bin` could grow to 11.5 TB logical / 859 GB physical, filling a 1 TB WSL disk. Adds three independent layers so the file cannot grow unbounded: - `VectorDbSizeGuard` module with both logical (`st.size`) and physical (`st.blocks * 512`) measurement so sparse-file bloat is detected. - Boot-time guard in `DatabaseManager` evicts the dir before Chroma starts (defaults: 2 GB physical / 50 GB logical, configurable via `CLAUDE_PILOT_VECTOR_DB_MAX_PHYSICAL_MB` / `CLAUDE_PILOT_VECTOR_DB_MAX_LOGICAL_MB`). - Pre-write gate in `ChromaSync.assertSafeToWrite()` — every `chroma_add_documents` / `chroma_create_collection` call checks size first (tighter caps: 1 GB physical / 10 GB logical). hnswlib never receives a write that would extend an already-oversized file. - Post-retention guard in `RetentionScheduler` re-checks after every cleanup and closes Chroma if tripped. - Auto-disable after 2 evictions within 7 days atomically flips `CLAUDE_PILOT_CHROMA_ENABLED=false` in `settings.json`. Search falls back to SQLite via existing `SearchOrchestrator` path (same escape hatch as claude-mem #991). - `/api/vector-db/health` now reports physical size, largest file, and last-eviction marker so operators can see what happened. ## Project-scoped annotations in Changes view Annotations were visible across projects — two concurrent "Changes" sessions showed each other's annotations on the same screen. Scopes `AnnotationRoutes` queries by project and filters the Changes view accordingly. ## Model routing settings UI Surface per-role model selection (opus/sonnet/haiku) in the console Settings view. Launcher `model_config.py` + `settings_injector.py` read `~/.pilot/config.json` and inject model preferences into plugin settings on startup. ## Spec workflow refinements - spec-bugfix-plan: expanded red-flags step, tightened root-cause plan + write-plan steps. - spec-bugfix-verify: behaviour-contract audit in verify-fix and plan-verify steps. - spec-implement: TDD-loop guidance updated. - docs: workflows/spec.md and features/{console,model-routing, statusline}.md updated to match. ## Statusline widgets Widget formatter updates with corresponding test coverage. ## Installer `claude_files.py` gains ~216 lines for non-destructive settings/skill merges on re-install; tests updated accordingly. ## Bundled assets `pilot/scripts/*.cjs` and `pilot/ui/*.js` regenerated from `console/src/` to match the fixes above. ## Verification - Bun: 1349 pass / 0 fail (22 new tests for `VectorDbSizeGuard`) - Python: 1574 pass / 0 fail - typecheck clean
## [8.2.4](v8.2.3...v8.2.4) (2026-04-21) ### Bug Fixes * **hooks:** make SessionEnd fully non-blocking so harness cancellation can't leak workers ([3cb965a](3cb965a)) * prevent vector-db disk exhaustion + model routing UI + project-scoped annotations ([1a066d7](1a066d7)), closes [#130](#130) [#991](https://github.com/maxritter/pilot-shell/issues/991)
…e task cards - statusline: read rate_limits from Claude Code stdin (cross-platform, no OAuth), single-percentage pacing arrow, fall back to lines+git when the field is absent (API/Enterprise users). Drop token-count suffix on the context bar. - installer: vendor installer/skill_builder.py and build SKILL.md in-process during install; no longer shells out to the pilot binary. Drift-guard test keeps it byte-identical to launcher/skill_builder.py. Bump parallel download workers 8 -> 16. - launcher/wrapper: generic capability probe caches claude --help once and appends --thinking-display summarized when supported. - launcher/codegraph: silence redundant native-SQLite repair message. - console/Topbar: spec pills now show hover cards with parsed task lists (completed + next pending) and bugfix/feature icons. - console/Changes: prefer main-branch plans over worktree plans when auto-selecting active plan so annotations match the current checkout; CodeReviewPanel filters diffs by path. - skills: prd/03-clarify skips questions already answered upstream; spec-bugfix-plan gains the git log -S pattern for token-change forensics; verify steps tightened. - settings: enable showThinkingHint. - ignore local claude-pace/ reference clone.
# [8.3.0](v8.2.4...v8.3.0) (2026-04-23) ### Features * cross-platform statusline usage, in-process skill build, console task cards ([9f394be](9f394be))
Primary feature: /benchmark — quantitative before/after measurement for Claude Code skills and rules. Isolated per-run sandboxes, crash-proof global-rule auto-hiding, parallel workers, and an LLM grader that emits structured per-assertion verdicts plus improvement suggestions. - pilot/skills/benchmark/: orchestrator + 6 step files (intake → target discovery → author evals → execute → present findings → improvement plan), runner.py with parallel workers and per-run filesystem isolation, aggregate_benchmark.py, isolation.py, progress.py, utils.py, agents/grader.md, and ~1800 LOC of unit test coverage - docs/docusaurus/docs/workflows/benchmark.md: workflow documentation Also in this commit: - /prd skill: new 02b-ideate step, orchestrator and 8 steps updated - console: pilot-spawn utility + tests, license routes tests, user-message handler fixes, worker-service refactor, Dashboard/Requirements/Topbar viewer updates - installer: finalize step + claude_files test updates - launcher: banner and statusline formatter updates - CI/release: release-dev and release workflow tweaks, pre-commit hook - docs/site: ConsoleSection, InstallSection, WorkflowSteps, PRD and spec workflow pages, intro, statusline feature page - misc: README, .gitignore, pyproject.toml
# [8.4.0](v8.3.0...v8.4.0) (2026-04-24) ### Features * add /benchmark skill and evaluation framework ([843ef85](843ef85))
console/usage:
- Replace external ccusage CLI with native Claude session analytics
- New pricing engine (LiteLLM fetch + bundled fallback + model aliases)
filters out <synthetic>/unknown internal markers
- New /api/usage/{daily,monthly,models,yield,plan} endpoints; cross-view
parity ensures Sessions and Usage agree on cost
- Restructured Usage view: equal-height pairs (Cost-by-Model + Routing,
Code-Quality + Tool-Output-Compression), Plan progress below
- Cost-by-Model groups by family (Opus/Sonnet/Haiku) with distinct colors
- Code-Quality card: Day/Week/Month toggle, commits-shipped headline,
cost-per-shipped-commit, productive-spend %, degraded-state surfacing
- Plan tracking via SettingsRoutes (Pro/Max/Custom + month-end forecast)
- Yield uses execFile + per-call timeouts + ref validation + HEAD-sha cache
ccusage scrub:
- Drop install_ccusage from installer + tests, uninstall hint, and the
open-source-tools docs row; codeburn/ source repo deleted
pilot/hooks:
- New tool_redirect hook (with tests) to redirect deferred sub-agent flows
pilot/rules:
- New documentation-sync rule; tweaks to development-practices, testing,
and verification
pilot/skills:
- benchmark: runner/utils tightening, review-iterate + improvement-plan
step rewrites, new test coverage
- setup-rules: split out new sync-agents step, renumber summary 11 -> 12
docs:
- Refreshed benchmark, setup-rules, statusline, rules, intro pages
- Site SEO/sitemap/indexnow plumbing, hero copy, Schema.org logo fix
- Stddev/pass-rate disambiguation and removal of phantom Step 6 refs in
benchmark.md (CodeRabbit findings)
## [8.4.1](v8.4.0...v8.4.1) (2026-04-27) ### Bug Fixes * native usage analytics, ccusage scrub, hook + rules + site updates ([c7cd604](c7cd604))
Adds /fix as the standard bugfix command — investigate, RED test, fix, audit, done — with mandatory end-to-end verification. Slims the existing /spec bugfix lane substantially. The workflow change addresses the issue where two simple bugs took 21+ minutes and 400k tokens halfway through. ## /fix — new user-invocable command - New skill at pilot/skills/fix/ with six steps (investigate → RED → fix → audit → quality → finalise). Always quick: stops cleanly and tells the user to re-invoke with /spec when complexity exceeds the quick lane. - Iron Law #4: never claim done from unit tests alone. Step 4.3 is mandatory end-to-end verification — UI bugs must run browser automation via Claude Code Chrome / Chrome DevTools MCP / playwright-cli / agent-browser; CLI/API/library bugs must be exercised with real invocations and concrete evidence captured in the completion report. - Step 1.4 adds explicit Instrument-when-needed guidance with SPEC-DEBUG markers (cleanup grep enforced at audit). - Registered as user-invocable, model: opus. ## Slimmed full bugfix lane (/spec) - Dropped Codex from spec-bugfix-plan entirely (was step 7) — adversarial review on a plan with no code yet was overhead. Codex still runs at spec-verify time. - Codex-once rule across spec-plan + spec-verify: a session-scoped sentinel file prevents re-runs across plan iterations within one /spec invocation (was re-running on every annotate/verify cycle). - spec-bugfix-verify: 11 steps → 8. Dropped redundant plan-verify (no-op), runtime-verification (foldable into quality), post-merge-verify (squash merges with no conflicts are safe). - spec-bugfix-verify Step 2 (Behavior Contract audit): tightened from ~6.4KB to ~3.8KB. Sub-steps 2.5/2.6 merged into 2.4. Step 2.6 now enforces mandatory E2E verification with concrete evidence — no claim-VERIFIED-on-tests-alone. - spec-bugfix-plan: tightened red-flags step with Common Rationalizations and User Signals tables (inspired by superpowers/systematic-debugging). Step 3.3 adds a concrete multi-layer instrumentation example. Step 4 fix-approach-selection defaults to picking the obvious fix rather than manufacturing fake alternatives. - spec-implement Task 2 (bugfix lane): drops the per-task full-suite run. For bundled bugfix plans this was the single biggest token sink — full suite now runs once at the Quality Gate, targeted module test runs between fix iterations. ## Iteration cap - spec-bugfix-verify Step 8 caps fix iterations at 3. After three failed verify cycles, AskUserQuestion presents Continue/Pivot/Abandon — stops the fix-on-fix-on-fix death spiral that previously had no bound. ## Dispatcher / model defaults / Console - /spec dispatcher restored to clean split: /spec routes bugs to spec-bugfix-plan (full lane), /fix is a separate user-facing command for the quick lane. No --full flag, no auto-routing between commands. - launcher/model_config.py: /fix and /benchmark added to default skill models (opus). Console useSettings.ts mirrors. - Console Settings page: General Rows now lists /fix and /benchmark with descriptive labels + sub-text. SPEC_PHASE_ROWS rendered with sub-text. - launcher/banner.py + installer/steps/finalize.py: /fix and /benchmark added to quick-start tips and post-install workflow list. Spec-Driven feature description updated to mention both /spec and /fix. - pilot/settings.json: companyAnnouncements line now lists /fix; spinner tip updated. ## Docs - New Docusaurus page docs/workflows/fix.md with workflow diagram, Common Issues table, mandatory E2E section, /spec vs /fix decision table. Sidebars updated. - /spec page trimmed of bugfix-only content (now points at /fix). - README: new /fix section with How /fix works + How /spec handles bugs details blocks. - Marketing site WhatsInside.tsx updated to mention /fix. ## Other-session changes incorporated - Console: parsePlanContent.ts + tests for spec section rendering and plan annotator persistence (Bug A + Bug B fixes). - pilot/hooks/tool_redirect.py + tests: extended. - pilot/.mcp.json + hooks/hooks.json + scripts/*.cjs: minor updates. - pilot/rules/development-practices.md + mcp-servers.md: updates. - installer/downloads.py: minor. - pilot/ui/* compiled bundles: rebuilt. - .devcontainer/*: minor adjustments.
|
The latest updates on your projects. Learn more about Vercel for GitHub. 1 Skipped Deployment
|
|
Caution Review failedPull request was closed or merged during review WalkthroughThis PR represents version 8.4.1, repositioning Pilot Shell as a "Claude Code Engineering Platform" with major documentation overhaul, new workflows ( Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
⚔️ Resolve merge conflicts
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 7/8 reviews remaining, refill in 7 minutes and 30 seconds.Comment |
Summary
Adds /fix as the standard bugfix command — investigate, RED test, fix, audit, done — with mandatory end-to-end verification. Slims the existing /spec bugfix lane substantially. Addresses the issue where two simple bugs took 21+ minutes and 400k tokens halfway through.
/fix — new user-invocable command
pilot/skills/fix/with six steps (investigate → RED → fix → audit → quality → finalise)--fullflag, no auto-routing between commands — honour the user's command choice)SPEC-DEBUG:markers (cleanup grep enforced at audit)Slimmed full bugfix lane (/spec)
Iteration cap
spec-bugfix-verify Step 8 caps fix iterations at 3. After three failed verify cycles, AskUserQuestion presents Continue/Pivot/Abandon — stops the fix-on-fix-on-fix death spiral.
Dispatcher / model defaults / Console
Docs
docs/workflows/fix.mdwith workflow diagram, Common Issues table, mandatory E2E section, /spec vs /fix decision tableOther-session changes incorporated
parsePlanContent.ts+ tests for spec section rendering and plan annotator persistence (Bug A + Bug B fixes from a parallel session)Verification
tsc --noEmitcleanruff checkclean on all files I touchedSummary by CodeRabbit
Release Notes
New Features
/benchmarkworkflow: Quantitative before/after evaluation with graded assertions./fixworkflow: Streamlined bugfix process with RED→fix→verify discipline.Improvements