Skip to content

feat: bugfix workflow redesign with new /fix command#135

Closed
maxritter wants to merge 60 commits intomainfrom
dev
Closed

feat: bugfix workflow redesign with new /fix command#135
maxritter wants to merge 60 commits intomainfrom
dev

Conversation

@maxritter
Copy link
Copy Markdown
Owner

@maxritter maxritter commented Apr 29, 2026

Summary

Adds /fix as the standard bugfix command — investigate, RED test, fix, audit, done — with mandatory end-to-end verification. Slims the existing /spec bugfix lane substantially. Addresses the issue where two simple bugs took 21+ minutes and 400k tokens halfway through.

/fix — new user-invocable command

  • New skill at pilot/skills/fix/ with six steps (investigate → RED → fix → audit → quality → finalise)
  • Always quick: stops cleanly and tells the user to re-invoke with /spec when complexity exceeds the quick lane (no --full flag, no auto-routing between commands — honour the user's command choice)
  • Iron Law docs: reorder setup with less context switching #4: never claim done from unit tests alone. Step 4.3 mandates running the actual program with concrete evidence
    • UI: Claude Code Chrome → Chrome DevTools MCP → playwright-cli → agent-browser
    • CLI / API / library / background job: real invocation with captured output
  • Step 1.4 adds explicit Instrument-when-needed guidance with SPEC-DEBUG: markers (cleanup grep enforced at audit)
  • Registered as user-invocable, model: opus

Slimmed full bugfix lane (/spec)

  • Codex dropped from spec-bugfix-plan (was step 7) — adversarial review on a plan with no code yet was overhead. Codex still runs at spec-verify time.
  • Codex-once rule across spec-plan + spec-verify: a session-scoped sentinel file prevents re-runs across plan iterations within one /spec invocation
  • spec-bugfix-verify: 11 → 8 steps. Dropped redundant plan-verify, runtime-verification, post-merge-verify
  • Step 2 (Behavior Contract audit): tightened from ~6.4KB to ~3.8KB. Step 2.6 now enforces mandatory E2E verification with concrete evidence — no claim-VERIFIED-on-tests-alone
  • spec-bugfix-plan: tightened red-flags step with Common Rationalizations and User Signals tables (from superpowers/systematic-debugging). Step 3.3 adds concrete multi-layer instrumentation example. Step 4 fix-approach defaults to picking the obvious fix.
  • spec-implement Task 2 (bugfix lane): drops the per-task full-suite run — single biggest token sink in bundled bugfix plans. Full suite now runs once at the Quality Gate.

Iteration cap

spec-bugfix-verify Step 8 caps fix iterations at 3. After three failed verify cycles, AskUserQuestion presents Continue/Pivot/Abandon — stops the fix-on-fix-on-fix death spiral.

Dispatcher / model defaults / Console

  • /spec dispatcher: clean split — /spec routes bugs to spec-bugfix-plan, /fix is a separate command for the quick lane
  • launcher/model_config.py: /fix and /benchmark added to default skill models (opus). Console useSettings.ts mirrors.
  • Console Settings page: General Rows now lists /fix and /benchmark with descriptive labels + sub-text. SPEC_PHASE_ROWS rendered with sub-text.
  • launcher/banner.py + installer/steps/finalize.py: /fix and /benchmark added to quick-start tips and post-install workflow list
  • pilot/settings.json: companyAnnouncements line lists /fix; spinner tip updated

Docs

  • New Docusaurus page docs/workflows/fix.md with workflow diagram, Common Issues table, mandatory E2E section, /spec vs /fix decision table
  • /spec page trimmed of bugfix-only content (now points at /fix)
  • README: new /fix section with How /fix works + How /spec handles bugs details blocks
  • Marketing site WhatsInside.tsx updated to mention /fix

Other-session changes incorporated

  • Console: parsePlanContent.ts + tests for spec section rendering and plan annotator persistence (Bug A + Bug B fixes from a parallel session)
  • pilot/hooks/tool_redirect.py + tests
  • pilot/.mcp.json + hooks/hooks.json + scripts/*.cjs minor updates
  • pilot/rules/development-practices.md + mcp-servers.md updates
  • installer/downloads.py minor
  • pilot/ui/* compiled bundles rebuilt
  • .devcontainer/* tweaks

Verification

  • ✅ 1690 Python tests pass (launcher + hooks + installer)
  • ✅ Console: 1438 tests pass + tsc --noEmit clean
  • ruff check clean on all files I touched

Summary by CodeRabbit

Release Notes

  • New Features

    • Pilot Bot: 24/7 automation agent with scheduled job execution.
    • /benchmark workflow: Quantitative before/after evaluation with graded assertions.
    • /fix workflow: Streamlined bugfix process with RED→fix→verify discipline.
    • Customization framework: Team-wide Pilot Shell workflow and configuration overrides.
    • Chrome DevTools MCP: Enhanced browser automation as enterprise fallback.
    • Context-mode: Sandbox routing for large outputs with significant token savings.
  • Improvements

    • Resilient download retries with targeted HTTP 5xx backoff.
    • Native SQLite support and Claude CLI plugin marketplace integration.
    • Refined skill artifact handling and generation.

maxritter and others added 30 commits April 8, 2026 13:01
… filters, and context-mode integration

- Redesign Console dashboard: 8 clickable stat cards (Projects, Sessions, Active, Memories, Extensions, Requirements, Specifications, Changes), Recent Specifications and Recent Memories cards
- Flatten sidebar navigation: single nav list with inline ProjectFilter dropdowns per view
- Add TopbarSpecs: active spec pills in top bar across all projects
- Add notification navigation with project switching and clear-all support
- Add session resume workflow: copy session ID, /resume deep-linking
- Memory cards: show linked session, click to navigate, remove clutter
- Cross-project APIs: /api/plans/active/all, /api/extensions?all=true
- Project list from project_roots table (filtered to real git repos)
- Update branding: Make Claude Code production-ready
- Add context-mode integration for token-optimized context management
- Update README, Docusaurus docs, and marketing site

BREAKING CHANGE: Console UI completely restructured — sidebar flattened, project selector removed from sidebar and replaced with per-view inline filters, dashboard cards redesigned, PlanStatus and GitStatus widgets removed
# [8.0.0](v7.11.4...v8.0.0) (2026-04-08)

### Bug Fixes

* pin Bun to 1.3.11 (1.2.15 lacks compression APIs, latest is unstable) ([2372ea7](2372ea7))

### Features

* add Pilot Bot — persistent automation agent with scheduled tasks, background jobs, and optional Telegram ([cab87f6](cab87f6))
* complete Console overhaul — v8.0.0 ([#126](#126)) ([bfa261f](bfa261f))

### BREAKING CHANGES

* Console UI completely restructured — sidebar flattened, project selector removed from sidebar and replaced with per-view inline filters, dashboard cards redesigned, PlanStatus and GitStatus widgets removed

* fix: remove duplicate Help slide, update hooks count to 21

* fix: pin Bun to 1.2.15 in CI (latest has execSync regression on Linux)
## [8.0.1](v8.0.0...v8.0.1) (2026-04-08)

### Bug Fixes

* use correct CronCreate parameter name (cron, not schedule) in bot skills ([ad7f4af](ad7f4af))
…down, session filtering, spec tab consistency
## [8.0.2](v8.0.1...v8.0.2) (2026-04-09)

### Bug Fixes

* console dashboard overhaul — 2x2 recent cards, usage model breakdown, session filtering, spec tab consistency ([3648a2d](3648a2d))
## [8.0.3](v8.0.2...v8.0.3) (2026-04-09)

### Bug Fixes

* Improved Code Reviewer Stability, Codegraph Usage and Context Mode ([e78d68b](e78d68b))
## [8.0.4](v8.0.3...v8.0.4) (2026-04-09)

### Bug Fixes

* Improved Dependency Installation Speed for Installer and Updater ([e426d3c](e426d3c))
* Improved PRD and Spec Sharing / Annotation UI on pilot-shell.com ([afc32f7](afc32f7))
## [8.0.5](v8.0.4...v8.0.5) (2026-04-09)

### Bug Fixes

* skip CodeGraph indexing in non-git directories ([b4281df](b4281df))
## [8.0.6](v8.0.5...v8.0.6) (2026-04-09)

### Bug Fixes

* walk up directory tree for git repo detection in CodeGraph guard ([f028b57](f028b57))
## [8.0.7](v8.0.6...v8.0.7) (2026-04-10)

### Bug Fixes

* add tool output compression + sandboxed content cards to Usage view ([46127a1](46127a1))
* hook venv sync, console annotation writes, codegraph native sqlite ([4e9d264](4e9d264))
* restore dom globals after terminal-preview-xss test to prevent portal ssr leak ([77eda81](77eda81))
* stop incomplete child_process mocks poisoning CI test runs ([7e30074](7e30074))
…ier, fix auto-mode flag and docs

- Add chrome-devtools-mcp plugin installation to installer (marketplace + parallel install)
- Update browser automation from 3-tier to 4-tier: Chrome → Chrome DevTools MCP → playwright-cli → agent-browser
- Update rules (browser-automation, verification, testing), skills (spec-verify, spec-bugfix-verify, spec-implement)
- Update docs: Docusaurus (open-source-tools, installation, mcp-servers, rules, spec workflow), site, README
- Remove deprecated --enable-auto-mode flag from wrapper (fixes user-reported launch errors)
- Fix project filtering to support git repo subdirectories in Console
- Update hooks documentation count and footer links
semantic-release-bot and others added 26 commits April 29, 2026 10:13
## [8.0.10](v8.0.9...v8.0.10) (2026-04-14)

### Bug Fixes

* codegraph native SQLite repair, /spec new-branch option, session prompt display ([21fb7b6](21fb7b6))
…anced UI

- Add SessionJsonlService to parse Claude Code JSONL session files
- Extract token usage, tool breakdown, model distribution, duration, cost
- Cost calculation with up-to-date pricing for Opus 4.5+/4, Sonnet, Haiku tiers
- New GET /api/sessions/:id/stats and POST /api/sessions/costs endpoints
- Session detail modal with stats grid, prompt, tool usage (2-col layout)
- Session cards show cost and total messages instead of observations
- Sortable session list columns (date, cost, messages, prompts)
- Dashboard shows cost per session, cross-project navigation
- 15 unit tests with JSONL fixtures
# [8.1.0](v8.0.10...v8.1.0) (2026-04-15)

### Features

* add session details with JSONL stats, cost calculation, and enhanced UI ([0590783](0590783))
- Add customization packs: pilot customize install/update/remove/status CLI commands
  for teams on Team/Enterprise tiers to overlay custom skills, rules, hooks, and
  agents on top of core Pilot via a git repository. Auto-reapplied on pilot update
  and installer runs. Security: symlink rejection, license validation, atomic
  manifest writes, stale file cleanup on re-apply.
- Add Opus 4.7 model support across launcher, console, and documentation.
- Support custom plan statuses in Console and statusline — teams can write
  arbitrary Status: values in plan files and they display as-is with neutral
  styling instead of being hidden.
- Never skip Codex adversarial review in spec-plan/spec-verify/spec-bugfix-plan
  — wait for the task completion notification, never read the output file to
  determine completion.
- Docs: new customization page, updated pricing, extensions cross-ref, README.
# [8.2.0](v8.1.0...v8.2.0) (2026-04-16)

### Features

* customization packs, Opus 4.7 support, and Codex bugfixes ([af711a4](af711a4))
## [8.2.1](v8.2.0...v8.2.1) (2026-04-17)

### Bug Fixes

* customization overrides, skill decomposition polish, generated artifacts untracked ([ec1c268](ec1c268))
* Removed static effort for skills/commands to adjust based on global ([4fd9050](4fd9050))
## [8.2.2](v8.2.1...v8.2.2) (2026-04-17)

### Bug Fixes

* Updated inconsistent steps in spec workflow ([a1e6118](a1e6118))
## [8.2.3](v8.2.2...v8.2.3) (2026-04-17)

### Bug Fixes

* drop skill banner, normalize step heading levels across workflow skills ([05d37fa](05d37fa))
…n can't leak workers

Fixes #129.

## Problem

The SessionEnd hook does two side-effects:
1. POST session-complete to Console (localhost:41777)
2. Stop the bun worker-service.cjs if this is the last active session

Both currently run synchronously in the hook process:
- `_complete_session` calls `urllib.request.urlopen(..., timeout=5)` inline
- `main` calls `subprocess.run(["bun", ...], timeout=15)` inline

When Claude Code's harness cancels the SessionEnd hook (which it does after
a short internal deadline), the hook process is terminated before either
side-effect completes. Symptoms:

- Terminal shows "SessionEnd hook [...] failed: Hook cancelled"
- bun worker-service.cjs daemons accumulate as orphans across sessions
- Orphans hold open file handles to ~/.pilot/memory/pilot-memory.db (WAL)
  but serve no traffic (only the newest is bound to :41777)

Reproduced on Arch/Manjaro with Claude Code 2.x — observed three running
daemons (two orphaned) after a handful of sessions.

## Fix

Hand both side-effects off to detached subprocesses (`start_new_session=True`)
before returning from `main`. The parent hook becomes essentially instantaneous
(two `Popen` calls, no waiting), so harness cancellation no longer races the
work.

Ordering: worker-stop first, Console POST second. The worker-stop is the
leakier side effect (holds file descriptors + network port); prioritising
it means that even in a pathological case where the second `Popen` fails,
the critical resource release still happens.

Timeouts inside the detached workers are preserved (Console POST 5s,
no explicit timeout on `bun stop` — unchanged from previous behaviour
because it now runs in its own session and doesn't tie up the hook).

The Console POST logic is inlined as a string constant
(`_COMPLETE_SESSION_WORKER`) passed to `python -c`. This avoids creating
a new file in the repo and keeps the worker self-contained; the
subprocess startup cost (~30ms) is the only overhead when Console is down.

## Verification

- `python -m py_compile` on the modified file: passes
- `main()` with no env vars: returns 0 cleanly (short-circuits on missing
  CLAUDE_PLUGIN_ROOT)
- Inlined worker script compiles and executes cleanly when invoked with
  a live Console (tested with `python3 -c <script> <url> <sid>` → POST 200)
- Behavioural smoke test: run a session, exit, observe no
  "Hook cancelled" message and no orphan `bun worker-service.cjs --daemon`
  processes in `ps`
- Data safety: workers are stateless HTTP fronts over shared SQLite-WAL;
  killing the old synchronous path's leaked workers with SIGTERM had no
  data-integrity impact (PRAGMA integrity_check = ok)

## Why not just increase the hook timeout?

The hook timeout is controlled by the harness, not this repo. Even if it
were raised, network stalls / bun startup latency / loaded-disk fsync
spikes can all push a synchronous implementation past any fixed deadline.
Detaching removes the class of failure entirely.
…coped annotations

Bundles several independent fixes under one release.

## Vector DB disk exhaustion (closes #130)

hnswlib's persisted `link_lists.bin` could grow to 11.5 TB logical /
859 GB physical, filling a 1 TB WSL disk. Adds three independent layers
so the file cannot grow unbounded:

- `VectorDbSizeGuard` module with both logical (`st.size`) and
  physical (`st.blocks * 512`) measurement so sparse-file bloat is
  detected.
- Boot-time guard in `DatabaseManager` evicts the dir before Chroma
  starts (defaults: 2 GB physical / 50 GB logical, configurable via
  `CLAUDE_PILOT_VECTOR_DB_MAX_PHYSICAL_MB` /
  `CLAUDE_PILOT_VECTOR_DB_MAX_LOGICAL_MB`).
- Pre-write gate in `ChromaSync.assertSafeToWrite()` — every
  `chroma_add_documents` / `chroma_create_collection` call checks
  size first (tighter caps: 1 GB physical / 10 GB logical). hnswlib
  never receives a write that would extend an already-oversized file.
- Post-retention guard in `RetentionScheduler` re-checks after every
  cleanup and closes Chroma if tripped.
- Auto-disable after 2 evictions within 7 days atomically flips
  `CLAUDE_PILOT_CHROMA_ENABLED=false` in `settings.json`. Search
  falls back to SQLite via existing `SearchOrchestrator` path
  (same escape hatch as claude-mem #991).
- `/api/vector-db/health` now reports physical size, largest file,
  and last-eviction marker so operators can see what happened.

## Project-scoped annotations in Changes view

Annotations were visible across projects — two concurrent "Changes"
sessions showed each other's annotations on the same screen. Scopes
`AnnotationRoutes` queries by project and filters the Changes view
accordingly.

## Model routing settings UI

Surface per-role model selection (opus/sonnet/haiku) in the console
Settings view. Launcher `model_config.py` + `settings_injector.py`
read `~/.pilot/config.json` and inject model preferences into plugin
settings on startup.

## Spec workflow refinements

- spec-bugfix-plan: expanded red-flags step, tightened root-cause
  plan + write-plan steps.
- spec-bugfix-verify: behaviour-contract audit in verify-fix and
  plan-verify steps.
- spec-implement: TDD-loop guidance updated.
- docs: workflows/spec.md and features/{console,model-routing,
  statusline}.md updated to match.

## Statusline widgets

Widget formatter updates with corresponding test coverage.

## Installer

`claude_files.py` gains ~216 lines for non-destructive settings/skill
merges on re-install; tests updated accordingly.

## Bundled assets

`pilot/scripts/*.cjs` and `pilot/ui/*.js` regenerated from
`console/src/` to match the fixes above.

## Verification

- Bun: 1349 pass / 0 fail (22 new tests for `VectorDbSizeGuard`)
- Python: 1574 pass / 0 fail
- typecheck clean
## [8.2.4](v8.2.3...v8.2.4) (2026-04-21)

### Bug Fixes

* **hooks:** make SessionEnd fully non-blocking so harness cancellation can't leak workers ([3cb965a](3cb965a))
* prevent vector-db disk exhaustion + model routing UI + project-scoped annotations ([1a066d7](1a066d7)), closes [#130](#130) [#991](https://github.com/maxritter/pilot-shell/issues/991)
…e task cards

- statusline: read rate_limits from Claude Code stdin (cross-platform, no
  OAuth), single-percentage pacing arrow, fall back to lines+git when the
  field is absent (API/Enterprise users). Drop token-count suffix on the
  context bar.
- installer: vendor installer/skill_builder.py and build SKILL.md in-process
  during install; no longer shells out to the pilot binary. Drift-guard test
  keeps it byte-identical to launcher/skill_builder.py. Bump parallel
  download workers 8 -> 16.
- launcher/wrapper: generic capability probe caches claude --help once and
  appends --thinking-display summarized when supported.
- launcher/codegraph: silence redundant native-SQLite repair message.
- console/Topbar: spec pills now show hover cards with parsed task lists
  (completed + next pending) and bugfix/feature icons.
- console/Changes: prefer main-branch plans over worktree plans when
  auto-selecting active plan so annotations match the current checkout;
  CodeReviewPanel filters diffs by path.
- skills: prd/03-clarify skips questions already answered upstream;
  spec-bugfix-plan gains the git log -S pattern for token-change forensics;
  verify steps tightened.
- settings: enable showThinkingHint.
- ignore local claude-pace/ reference clone.
# [8.3.0](v8.2.4...v8.3.0) (2026-04-23)

### Features

* cross-platform statusline usage, in-process skill build, console task cards ([9f394be](9f394be))
Primary feature: /benchmark — quantitative before/after measurement for
Claude Code skills and rules. Isolated per-run sandboxes, crash-proof
global-rule auto-hiding, parallel workers, and an LLM grader that emits
structured per-assertion verdicts plus improvement suggestions.

- pilot/skills/benchmark/: orchestrator + 6 step files (intake → target
  discovery → author evals → execute → present findings → improvement
  plan), runner.py with parallel workers and per-run filesystem isolation,
  aggregate_benchmark.py, isolation.py, progress.py, utils.py,
  agents/grader.md, and ~1800 LOC of unit test coverage
- docs/docusaurus/docs/workflows/benchmark.md: workflow documentation

Also in this commit:
- /prd skill: new 02b-ideate step, orchestrator and 8 steps updated
- console: pilot-spawn utility + tests, license routes tests, user-message
  handler fixes, worker-service refactor, Dashboard/Requirements/Topbar
  viewer updates
- installer: finalize step + claude_files test updates
- launcher: banner and statusline formatter updates
- CI/release: release-dev and release workflow tweaks, pre-commit hook
- docs/site: ConsoleSection, InstallSection, WorkflowSteps, PRD and spec
  workflow pages, intro, statusline feature page
- misc: README, .gitignore, pyproject.toml
# [8.4.0](v8.3.0...v8.4.0) (2026-04-24)

### Features

* add /benchmark skill and evaluation framework ([843ef85](843ef85))
console/usage:
- Replace external ccusage CLI with native Claude session analytics
- New pricing engine (LiteLLM fetch + bundled fallback + model aliases)
  filters out <synthetic>/unknown internal markers
- New /api/usage/{daily,monthly,models,yield,plan} endpoints; cross-view
  parity ensures Sessions and Usage agree on cost
- Restructured Usage view: equal-height pairs (Cost-by-Model + Routing,
  Code-Quality + Tool-Output-Compression), Plan progress below
- Cost-by-Model groups by family (Opus/Sonnet/Haiku) with distinct colors
- Code-Quality card: Day/Week/Month toggle, commits-shipped headline,
  cost-per-shipped-commit, productive-spend %, degraded-state surfacing
- Plan tracking via SettingsRoutes (Pro/Max/Custom + month-end forecast)
- Yield uses execFile + per-call timeouts + ref validation + HEAD-sha cache

ccusage scrub:
- Drop install_ccusage from installer + tests, uninstall hint, and the
  open-source-tools docs row; codeburn/ source repo deleted

pilot/hooks:
- New tool_redirect hook (with tests) to redirect deferred sub-agent flows

pilot/rules:
- New documentation-sync rule; tweaks to development-practices, testing,
  and verification

pilot/skills:
- benchmark: runner/utils tightening, review-iterate + improvement-plan
  step rewrites, new test coverage
- setup-rules: split out new sync-agents step, renumber summary 11 -> 12

docs:
- Refreshed benchmark, setup-rules, statusline, rules, intro pages
- Site SEO/sitemap/indexnow plumbing, hero copy, Schema.org logo fix
- Stddev/pass-rate disambiguation and removal of phantom Step 6 refs in
  benchmark.md (CodeRabbit findings)
## [8.4.1](v8.4.0...v8.4.1) (2026-04-27)

### Bug Fixes

* native usage analytics, ccusage scrub, hook + rules + site updates ([c7cd604](c7cd604))
Adds /fix as the standard bugfix command — investigate, RED test, fix,
audit, done — with mandatory end-to-end verification. Slims the existing
/spec bugfix lane substantially. The workflow change addresses the issue
where two simple bugs took 21+ minutes and 400k tokens halfway through.

## /fix — new user-invocable command

- New skill at pilot/skills/fix/ with six steps (investigate → RED → fix →
  audit → quality → finalise). Always quick: stops cleanly and tells the
  user to re-invoke with /spec when complexity exceeds the quick lane.
- Iron Law #4: never claim done from unit tests alone. Step 4.3 is
  mandatory end-to-end verification — UI bugs must run browser automation
  via Claude Code Chrome / Chrome DevTools MCP / playwright-cli /
  agent-browser; CLI/API/library bugs must be exercised with real
  invocations and concrete evidence captured in the completion report.
- Step 1.4 adds explicit Instrument-when-needed guidance with SPEC-DEBUG
  markers (cleanup grep enforced at audit).
- Registered as user-invocable, model: opus.

## Slimmed full bugfix lane (/spec)

- Dropped Codex from spec-bugfix-plan entirely (was step 7) — adversarial
  review on a plan with no code yet was overhead. Codex still runs at
  spec-verify time.
- Codex-once rule across spec-plan + spec-verify: a session-scoped
  sentinel file prevents re-runs across plan iterations within one /spec
  invocation (was re-running on every annotate/verify cycle).
- spec-bugfix-verify: 11 steps → 8. Dropped redundant plan-verify (no-op),
  runtime-verification (foldable into quality), post-merge-verify
  (squash merges with no conflicts are safe).
- spec-bugfix-verify Step 2 (Behavior Contract audit): tightened from
  ~6.4KB to ~3.8KB. Sub-steps 2.5/2.6 merged into 2.4. Step 2.6 now
  enforces mandatory E2E verification with concrete evidence — no
  claim-VERIFIED-on-tests-alone.
- spec-bugfix-plan: tightened red-flags step with Common Rationalizations
  and User Signals tables (inspired by superpowers/systematic-debugging).
  Step 3.3 adds a concrete multi-layer instrumentation example. Step 4
  fix-approach-selection defaults to picking the obvious fix rather than
  manufacturing fake alternatives.
- spec-implement Task 2 (bugfix lane): drops the per-task full-suite run.
  For bundled bugfix plans this was the single biggest token sink — full
  suite now runs once at the Quality Gate, targeted module test runs
  between fix iterations.

## Iteration cap

- spec-bugfix-verify Step 8 caps fix iterations at 3. After three failed
  verify cycles, AskUserQuestion presents Continue/Pivot/Abandon — stops
  the fix-on-fix-on-fix death spiral that previously had no bound.

## Dispatcher / model defaults / Console

- /spec dispatcher restored to clean split: /spec routes bugs to
  spec-bugfix-plan (full lane), /fix is a separate user-facing command
  for the quick lane. No --full flag, no auto-routing between commands.
- launcher/model_config.py: /fix and /benchmark added to default skill
  models (opus). Console useSettings.ts mirrors.
- Console Settings page: General Rows now lists /fix and /benchmark with
  descriptive labels + sub-text. SPEC_PHASE_ROWS rendered with sub-text.
- launcher/banner.py + installer/steps/finalize.py: /fix and /benchmark
  added to quick-start tips and post-install workflow list. Spec-Driven
  feature description updated to mention both /spec and /fix.
- pilot/settings.json: companyAnnouncements line now lists /fix; spinner
  tip updated.

## Docs

- New Docusaurus page docs/workflows/fix.md with workflow diagram,
  Common Issues table, mandatory E2E section, /spec vs /fix decision
  table. Sidebars updated.
- /spec page trimmed of bugfix-only content (now points at /fix).
- README: new /fix section with How /fix works + How /spec handles bugs
  details blocks.
- Marketing site WhatsInside.tsx updated to mention /fix.

## Other-session changes incorporated

- Console: parsePlanContent.ts + tests for spec section rendering and
  plan annotator persistence (Bug A + Bug B fixes).
- pilot/hooks/tool_redirect.py + tests: extended.
- pilot/.mcp.json + hooks/hooks.json + scripts/*.cjs: minor updates.
- pilot/rules/development-practices.md + mcp-servers.md: updates.
- installer/downloads.py: minor.
- pilot/ui/* compiled bundles: rebuilt.
- .devcontainer/*: minor adjustments.
@vercel
Copy link
Copy Markdown

vercel Bot commented Apr 29, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
pilot-shell Ignored Ignored Apr 29, 2026 8:14am

Request Review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 29, 2026

Caution

Review failed

Pull request was closed or merged during review

Walkthrough

This PR represents version 8.4.1, repositioning Pilot Shell as a "Claude Code Engineering Platform" with major documentation overhaul, new workflows (/benchmark, /fix), platform features (Pilot Bot, customization system), substantial installer refactoring (skill-builder, plugin installers, customization reapplication), and comprehensive site/SEO metadata updates.

Changes

Cohort / File(s) Summary
Infrastructure & Environment
.devcontainer/Dockerfile, .devcontainer/devcontainer.json, .githooks/pre-commit, .gitignore
Added OS-level dependencies (bubblewrap, iptables, ipset), persistent Claude config volume mounting, benchmark test detection in pre-commit, and expanded .gitignore patterns for generated artifacts (benchmarks, skill outputs, local configs).
CI/CD Workflows
.github/workflows/release-dev.yml, .github/workflows/release.yml
Extended pytest to include benchmark tests; pinned Bun version to 1.3.11 (replacing latest) for consistent console test/build toolchain.
Version & Changelog
CHANGELOG.md, installer/__init__.py
Added extensive changelog sections for versions 8.4.1→8.0.0 covering new features (benchmark framework, Pilot Bot, customization), bug fixes, and improvements; updated installer version from 7.11.4 to 8.4.1.
Installer Core Logic
installer/skill_builder.py, installer/downloads.py, installer/steps/claude_files.py, installer/steps/config_migration.py, installer/steps/dependencies.py, installer/steps/finalize.py
Introduced skill_builder module for normalized SKILL.md generation; enhanced download retry logic with HTTP status differentiation; added skill SKILL.md materialization with fragment recovery and customization reapplication; refactored dependencies to use Claude CLI plugins (context-mode, Codex, Chrome DevTools MCP), better-sqlite3, and parallel task execution; updated config migration to preserve worktree choices; refined finalize UI for new workflows.
Installer Tests
installer/tests/unit/steps/test_claude_files.py, installer/tests/unit/steps/test_config_migration.py, installer/tests/unit/steps/test_dependencies.py, installer/tests/unit/steps/test_dependencies_playwright.py, installer/tests/unit/steps/test_finalize.py, installer/tests/unit/test_*.py, installer/tests/unit/test_skill_builder.py
Expanded test coverage for SKILL.md building, customization reapplication, config migration with worktree preservation, new plugin installers, better-sqlite3, codegraph git detection, parallel task execution, browser ARM64 installs, and skill_builder module vendoring integrity.
Documentation - Platform Messaging
README.md, docs/docusaurus/docs/intro.md, installer/ui.py
Repositioned as "Claude Code Engineering Platform" with plan→implement→verify workflow, persistent context, quality hooks, and local dashboard visibility; removed prior generic "What's Inside" sections.
Documentation - Workflows
docs/docusaurus/docs/workflows/benchmark.md, docs/docusaurus/docs/workflows/fix.md, docs/docusaurus/docs/workflows/prd.md, docs/docusaurus/docs/workflows/spec.md, docs/docusaurus/docs/workflows/create-skill.md, docs/docusaurus/docs/workflows/setup-rules.md
Added /benchmark (quantitative evaluation with assertions) and /fix (RED→GREEN bugfix discipline); updated /prd to emphasize brainstorming/divergent ideation; enhanced /spec with bugfix routing, branch strategies, and Chrome DevTools MCP for E2E; refined /setup-rules phases and /create-skill messaging.
Documentation - Features
docs/docusaurus/docs/features/bot.md, docs/docusaurus/docs/features/cli.md, docs/docusaurus/docs/features/console.md, docs/docusaurus/docs/features/context-optimization.md, docs/docusaurus/docs/features/customization.md, docs/docusaurus/docs/features/extensions.md, docs/docusaurus/docs/features/hooks.md, docs/docusaurus/docs/features/mcp-servers.md, docs/docusaurus/docs/features/model-routing.md, docs/docusaurus/docs/features/open-source-tools.md, docs/docusaurus/docs/features/remote-control.md, docs/docusaurus/docs/features/rules.md, docs/docusaurus/docs/features/statusline.md
Introduced Pilot Bot documentation with skills and scheduled-task automation; added Customization guide for team-wide config/skill overlays; updated MCP servers to document context-mode sandbox routing and Chrome DevTools MCP; refined console views (Requirements, Dashboard, Sessions, Memories, Specifications); updated model routing for Opus 4.7; expanded rules with documentation-sync and context-mode; revised statusline for rate_limits-driven pacing display.
Documentation - Getting Started
docs/docusaurus/docs/getting-started/installation.md, docs/docusaurus/docs/getting-started/prerequisites.md, docs/docusaurus/docs/getting-started/permission-modes.md
Updated browser automation flow to include Chrome DevTools MCP as enterprise fallback; repositioned Codex from optional to included with auto-setup; updated Auto Mode requirements to Claude Sonnet 4.6 or Opus 4.7.
Docusaurus Configuration
docs/docusaurus/docusaurus.config.ts, docs/docusaurus/sidebars.ts
Updated tagline and added trailingSlash control plus social/SEO metadata (site image, keywords, Twitter card, Open Graph); extended sidebar with new workflow/feature entries.
Site Build & Distribution
docs/site/build-all.sh, docs/site/vercel.json
Updated build script to handle trailingSlash: false routing (docs.html, search.html, sitemap placement under docs/); added Vercel redirects from claude-pilot.com domains to pilot-shell.com.
Site Pages & SEO
docs/site/index.html, docs/site/public/manifest.json, docs/site/public/robots.txt, docs/site/public/0bd196f90bbc9ec8113bd78de2507fb2.txt
Extensive SEO overhaul: updated title, description, keywords; added schema.org (WebSite, SoftwareApplication, HowTo, FAQPage); added dns-prefetch hints, IndexNow verification, noscript fallback; updated robots.txt with explicit crawler policies; updated manifest description to platform tagline; added IndexNow key verification file.
Site Components - Messaging & Navigation
docs/site/src/components/HeroSection.tsx, docs/site/src/components/Logo.tsx, docs/site/src/components/NavBar.tsx, docs/site/src/components/Footer.tsx, docs/site/src/components/SEO.tsx, docs/site/src/components/InstallSection.tsx
Updated hero section with "Pilot Shell" headline and new feature highlights; changed badge labels (Remote Control→LSP Servers, Reusable Skills→Pilot Bot); updated docs link normalization; refreshed taglines and default SEO metadata; changed /prd label from requirements to brainstorm.
Site Components - Feature Content
docs/site/src/components/ConsoleSection.tsx, docs/site/src/components/DeepDiveSection.tsx, docs/site/src/components/WhatsInside.tsx, docs/site/src/components/WorkflowSteps.tsx
Reordered console slides with updated descriptions; updated Hooks Pipeline stage/file names and hook count; replaced MCP server names; added context-mode content emphasis; refactored WhatsInside with href links and new Customization feature; expanded WorkflowSteps with /benchmark workflow and /prd emphasis on brainstorming.
Site Components - FAQ & Testimonials
docs/site/src/components/FAQSection.tsx, docs/site/src/components/TestimonialsSection.tsx, docs/site/src/components/PricingSection.tsx
Updated FAQ model default to Opus 4.7 and added customization FAQ; refreshed testimonials with new quotes and unified role labels; updated pricing tiers (Solo: Pilot Bot 24/7 automation, Team: Customization).
Site Annotation & Sharing
docs/site/src/components/feedback/AnnotationToolbar.tsx, docs/site/src/components/feedback/BlockRenderer.tsx, docs/site/src/components/feedback/FeedbackSidebar.tsx, docs/site/src/components/feedback/index.ts, docs/site/src/lib/annotation/types.ts, docs/site/src/lib/annotation/useAnnotation.ts, docs/site/src/lib/annotation/index.ts, docs/site/src/lib/sharing/types.ts, docs/site/src/pages/Shared.tsx
Removed AnnotationToolbar component and pending-selection state management; updated BlockRenderer styling (Amber→Blue palette) and removed onBlockMouseUp prop; simplified FeedbackSidebar empty state UI; added contentType field to SharePayload for requirement vs specification labeling; refactored Shared page to remove selection-based submission plumbing.
Site Plugins & Configuration
docs/site/vite-plugin-indexnow.ts, docs/site/vite-plugin-sitemap.ts, docs/site/vite.config.ts, docs/site/tsconfig.app.json
Added IndexNow plugin for URL indexing submission; updated sitemap plugin to generate sitemap-pages.xml with priority/changefreq and sitemap.xml index; integrated plugins into Vite config; removed baseUrl from tsconfig.
Install Script
install.sh
Added conditional restart message based on PILOT_RESTART_BOT_MODE environment variable.
Platform Utilities
installer/platform_utils.py
Removed redundant blank line.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

Suggested labels

released

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 75.66% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: introduction of a new /fix command with a redesigned bugfix workflow, which aligns with the PR's core objective.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch dev
⚔️ Resolve merge conflicts
  • Resolve merge conflict in branch dev

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 7/8 reviews remaining, refill in 7 minutes and 30 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

@maxritter maxritter closed this Apr 29, 2026
@maxritter maxritter deleted the dev branch April 29, 2026 08:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants