feat: rename interactive mode to design + Symphony SPEC.md output #498
akashgit wants to merge 2 commits
❌ Factory Review: REVERT
Verdict: REVERT
Experiment: #2
Score Comparison
Guard Checks
Precheck Gate
Code Review Notes
Posted by Factory CEO |
788dcea to 3fbede3
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #498 +/- ##
=======================================
Coverage 86.77% 86.77%
=======================================
Files 64 64
Lines 10027 10029 +2
=======================================
+ Hits 8701 8703 +2
Misses 1326 1326
|
✅ Factory Review: KEEP
Verdict: KEEP
Experiment: #2
Score Comparison
Guard Checks
Posted by Factory CEO |
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The interactive→design rename changed the distiller output from Vision/Core Features/Architecture to numbered Symphony sections. Update test_has_output_format to check for the new section headers.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
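A minimal sketch of the kind of check the updated test could perform, assuming the numbered sections appear as `## N. Title` headings as in the templates quoted later in this thread. The helper names and the required titles below are illustrative assumptions, not the repository's actual `test_has_output_format`.

```python
import re

# Matches top-level numbered headings such as "## 1. Problem Statement".
# Sub-headings like "### 2.1 Goals" are deliberately not matched.
NUMBERED_HEADING = re.compile(r"^## (\d+)\. (.+)$", re.MULTILINE)


def numbered_sections(text):
    """Return (number, title) pairs for every numbered top-level heading."""
    return [(int(n), title.strip()) for n, title in NUMBERED_HEADING.findall(text)]


def has_numbered_output_format(text, required_titles):
    """True if headings are numbered 1..N in order and include the required titles.

    `required_titles` is an assumption supplied by the caller; the real
    test may check a different set of headers.
    """
    sections = numbered_sections(text)
    numbers = [n for n, _ in sections]
    titles = {t for _, t in sections}
    sequential = numbers == list(range(1, len(numbers) + 1))
    return bool(numbers) and sequential and set(required_titles) <= titles
```

A prompt that still uses the old Vision/Core Features/Architecture headings has no numbered headings, so the check returns `False` for it.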
3d42a2e to 854f955
|
@ceo-review |
❌ Factory Review: REVERT
Verdict: REVERT
Reason: Incomplete renaming — documentation files not updated
Code Review Notes
- Core code changes excellent and fully backward-compatible, but README.md, docs/*.md, and CHANGELOG.md still reference 'interactive mode' and 'idea.md'. Update docs to use --mode design as primary flag.
- Symphony SPEC.md format is well-structured with RFC 2119 normative language — good improvement for buildability
- Backward-compat alias (interactive→design) properly implemented in cli.py:2293-2294
- All Python tests updated correctly, including new backward-compat test at test_cli.py:825
Posted by Factory CEO
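For readers unfamiliar with the backward-compat alias mentioned in the review, here is a rough sketch of how a deprecated mode name can be mapped to its replacement at argument-parsing time. It does not reproduce the code in cli.py; the program name `factory`, the alias table, and the list of canonical modes are assumptions made for illustration.

```python
import argparse

# Hypothetical alias table: deprecated mode name -> canonical name.
MODE_ALIASES = {"interactive": "design"}
# Assumed set of canonical modes, taken from the modes named in this thread.
CANONICAL_MODES = ("design", "build", "improve", "research", "meta")


def normalize_mode(value):
    """Map a deprecated alias to its canonical mode and reject unknown names."""
    mode = MODE_ALIASES.get(value, value)
    if mode not in CANONICAL_MODES:
        raise argparse.ArgumentTypeError(f"unknown mode: {value!r}")
    return mode


def build_parser():
    parser = argparse.ArgumentParser(prog="factory")  # program name is assumed
    # The type callable runs on the raw string, so the alias is resolved
    # before the rest of the program ever sees the value.
    parser.add_argument("--mode", type=normalize_mode, required=True)
    return parser


def parse_mode(argv):
    return build_parser().parse_args(argv).mode
```

With this shape, `--mode interactive` and `--mode design` both yield `"design"`, which is what keeps existing invocations working while the docs move to `--mode design` as the primary flag.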
|
@akashgit triage: #492, #494, and #498 are three generations of the same change (interactive→design rename + Symphony SPEC.md). Since they're all yours — which one should survive? Happy to close the other two once you pick. For what it's worth, #492 has the most precise scope description (explicitly excludes the runner-concept references), while this one is the newest. |
Proposal: Merge Distiller into Strategist + Standardize on SPEC.md
After studying this PR and thinking through the implications, here's a proposal for how to evolve the design mode work further.
The Problem
Right now, when a user goes through design mode and approves a spec, the system re-plans the work downstream. The Strategist in Build mode re-decomposes the spec into phases, and in that process, things get moved to the backlog that the user already approved. The spec is treated as a suggestion, not a contract. This causes scope erosion: the user approved 10 features but only 6 get built.
The root cause: there's a redundant re-planning step between the user-approved spec and the Builder. The Strategist re-interprets work the user already signed off on.
Proposal
1. Merge the Distiller agent into the Strategist. The Distiller and the Strategist (in Build mode) do complementary work on the same artifact:
These should be one agent. The Strategist already knows how to prioritize, order by dependency, and scope to one-PR-per-phase. Teaching it to also write a spec is easier than teaching the Distiller strategic thinking. In design mode, the Strategist would:
The user iterates on this in the design loop: they see not just the features but the build order. Once approved, the SPEC.md is a contract. The Distiller agent gets retired. Its prompt gets folded into the Strategist's design-mode behavior.
2. Standardize the Strategist's output to SPEC.md format across all modes. Instead of the Strategist producing different formats in different modes [...]
3. Eliminate the Strategist and Researcher steps in Build mode (B0, B1) when a user-approved SPEC.md exists. If the user already approved a SPEC.md through design mode, the CEO reads the Implementation Plan directly and feeds phases to the Builder. No re-research, no re-planning, no opportunity to downscope.
The Hybrid SPEC.md Format
This combines what this PR proposes for the Distiller's Symphony output with the Strategist's build-planning capabilities. The top half is the spec (Symphony format from this PR). The bottom half is the strategic decomposition (what the Strategist currently puts in [...]).
# Project Name — Specification
## Normative Language
RFC 2119 keywords...
## 1. Problem Statement
What problem this solves and why it matters.
## 2. Goals and Non-Goals
### 2.1 Goals
- ...
### 2.2 Non-Goals
- ...
## 3. System Overview
### 3.1 Architecture
- ...
### 3.2 Tech Stack
- Language/framework choices with rationale grounded in research
## 4. Core Domain Model
Key entities and their relationships.
## 5. Detailed Specification
### 5.1 Feature: Location Lookup
- **What:** User-visible behavior
- **How:** Implementation approach — libraries, data flow
- **Why:** Research-grounded rationale
### 5.2 Feature: Forecast Display
- **What:** ...
- **How:** ...
- **Why:** ...
## 6. Reference Algorithms
Any non-trivial algorithms or protocols.
## 7. Test and Validation Matrix
How to verify each feature works.
## 8. Implementation Plan
### Phase 1: Project scaffold + eval harness
- [ ] Initialize repo, pyproject.toml, dependencies
- [ ] Create eval/score.py with baseline dimensions
- [ ] Set up CI configuration
- **Scope:** one PR
- **Priority:** FIX (foundation must exist first)
### Phase 2: Core data model + location lookup (§5.1)
- [ ] Implement Location model
- [ ] Implement geocoding API client
- [ ] Add unit tests for location resolution
- **Depends on:** Phase 1
- **Scope:** one PR
- **Priority:** EXPLORE
### Phase 3: Forecast display + CLI (§5.2)
- [ ] Implement forecast rendering
- [ ] Add CLI argument parsing
- [ ] Add error handling for API failures
- **Depends on:** Phase 2
- **Scope:** one PR
- **Priority:** EXPLORE
### Blocked (requires user input)
- Stripe billing — needs STRIPE_API_KEY from user
- Deployment target — user must choose hosting provider
Key differences from the current Symphony Implementation Checklist:
What Changes in Each Mode
Design mode (new projects):
Design mode (existing projects):
Build mode:
Improve mode:
What Changes in the CEO Prompt
Impact on This PR
This PR's rename from [...]
The rename and backward-compat alias from this PR are good as-is. The format and agent changes would be follow-up work on top of this PR's foundation. |
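To make the "CEO reads the Implementation Plan directly" step concrete, the following sketch parses the `### Phase N:` blocks of the hybrid template above into structured phases. The `Phase` dataclass and `parse_phases` function are hypothetical names, and the parsing rules are inferred only from the template text quoted in this comment.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Phase:
    number: int
    title: str
    tasks: list = field(default_factory=list)
    depends_on: list = field(default_factory=list)


PHASE_RE = re.compile(r"^### Phase (\d+): (.+)$")
TASK_RE = re.compile(r"^- \[ \] (.+)$")
DEPENDS_RE = re.compile(r"^- \*\*Depends on:\*\* (.+)$")


def parse_phases(spec_text):
    """Collect phases, their unchecked tasks, and their dependencies."""
    phases = []
    current = None
    for raw in spec_text.splitlines():
        line = raw.strip()
        match = PHASE_RE.match(line)
        if match:
            current = Phase(int(match.group(1)), match.group(2))
            phases.append(current)
            continue
        if line.startswith("#"):
            # Any other heading (for example "### Blocked") ends the phase.
            current = None
            continue
        if current is None:
            continue
        if (match := TASK_RE.match(line)):
            current.tasks.append(match.group(1))
        elif (match := DEPENDS_RE.match(line)):
            current.depends_on = [int(n) for n in re.findall(r"Phase (\d+)", match.group(1))]
    return phases
```

Because the phases come straight from the approved document, nothing in this path can move an approved item to the backlog; items under the Blocked heading are skipped rather than treated as phases.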
Follow-up: Standardizing on SPEC.md - Impact Analysis and Testing Plan
Building on the proposal above, here's the detailed breakdown of what actually changes, what doesn't, and how to make sure we don't break anything.
The Core Insight
The only thing that changes is the output format of the Strategist. The Strategist's logic (FEEC prioritization, growth/hygiene balance, backlog convergence, stuck protocol, design space scoring) all stays identical. We're reformatting the output, not rewriting the brain. The Python code [...]
What the SPEC.md Format Looks Like in Improve Mode
The key constraint: this format must be friendly to the existing CEO review logic. All the tags the CEO currently checks for [...]
# Improvement Cycle — Specification
## 1. Current State
- Composite: 0.72
- Weakest: observability (0.3)
- Last 3 experiments: #5 keep (+0.02), #6 revert (-0.01), #7 keep (+0.03)
- Pattern: observability consistently underserved
## 2. Goals and Non-Goals
### 2.1 Goals
- Improve observability from 0.3 to 0.6
- Fix flaky auth test
### 2.2 Non-Goals
- Not optimizing API latency this cycle
## 3. Design Space
| Dimension | Score | Notes |
|---|---|---|
| Features | 4 | Well-explored |
| Instrumentation | 1 | Underserved |
| ... | ... | ... |
**Underserved:** Instrumentation, Operational execution, Knowledge management
## 4. Detailed Specification
### 4.1 Fix flaky auth test
- **What:** Mock the external OAuth endpoint in test suite
- **How:** Use responses library to stub OAuth token endpoint
- **Why:** Test suite fails intermittently, blocking reliable evals
- **Category:** FIX
- **Expected impact:** tests 0.8→0.9
- **Priority:** high
### 4.2 Add structured logging
- **What:** Add structlog to payment, auth, API modules
- **How:** Replace print statements with structlog, add request ID middleware
- **Why:** Observability is weakest dimension at 0.3
- **Category:** EXPLOIT
- **Backlog item:** add logging to API modules
- **Growth dimension:** observability
- **Expected impact:** observability 0.3→0.6
- **Priority:** high
## 5. Implementation Plan
### Phase 1: Fix flaky auth test (§4.1, FIX)
- [ ] Add responses mock for OAuth endpoint
- [ ] Verify test passes 10 consecutive runs
- **Scope:** one PR
### Phase 2: Add structured logging (§4.2, EXPLOIT)
- [ ] Add structlog to payment module
- [ ] Add structlog to auth module
- [ ] Add structlog to API module
- [ ] Add request ID middleware
- **Depends on:** Phase 1
- **Scope:** one PR
## 6. Anti-patterns
- Don't retry the same prompt change (reverted 3x)
## 7. Blocked (requires user input)
- (none this cycle)
## 8. Proposed Backlog Additions
- Add rate limiting to API
Notice the tags are identical to today's hypothesis format [...] The Implementation Plan (§5) adds the dependency ordering and phase scoping that the CEO currently gets from the Strategist's hypothesis ordering. In Improve mode, this is mostly the same as FEEC ordering: FIX phases first, then EXPLOIT, then EXPLORE. The cross-references (§4.1, §4.2) connect phases back to the detailed spec.
Exact Scope of Changes
Prompt files (the real work):
Python code (no functional changes):
Testing Plan - Mode by Mode
This is a format change to the Strategist's output. Every mode that reads the Strategist's output must be explicitly tested to verify nothing breaks.
1. Improve mode (MOST CRITICAL, must not regress)
Improve mode works well today. The format change must be invisible to the downstream pipeline. Test:
2. Design mode (new projects)
This is the mode that benefits most from the change. Test:
3. Design mode (existing projects)
4. Build mode (without design mode, e.g., [...])
When there's no user-approved SPEC.md, the Strategist still runs at B1 and produces a SPEC.md internally. Test:
5. Research mode
Research mode has its own Strategist template with [...]
6. Meta mode
Meta mode is Improve mode + ACE playbook evolution. If Improve mode works, Meta mode should work. Test:
7. Backward compatibility
Recommended Implementation Order
Steps 2-6 can be one PR. Step 7 gates the merge. |
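One way to exercise the Improve-mode regression concern is a small check that the per-item tags survive the format change and that items stay in the FIX, EXPLOIT, EXPLORE order described above. This is only a sketch under assumptions: the function names are hypothetical, the required tag set (Category, Expected impact, Priority) is a guess based on the tags shared by both example items, and the existing CEO review logic may check something different.

```python
import re

# Ordering named in the comment above: FIX first, then EXPLOIT, then EXPLORE.
# Any other category sorts last in this sketch.
CATEGORY_ORDER = {"FIX": 0, "EXPLOIT": 1, "EXPLORE": 2}
ITEM_RE = re.compile(r"^### (\d+\.\d+) (.+)$")
TAG_RE = re.compile(r"^- \*\*(.+?):\*\* (.+)$")


def parse_items(spec_text):
    """Extract items and their tags from the Detailed Specification section."""
    items = []
    current = None
    in_section = False
    for raw in spec_text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            # Only sub-headings under "Detailed Specification" count as items.
            in_section = line.endswith("Detailed Specification")
            current = None
            continue
        if not in_section:
            continue
        match = ITEM_RE.match(line)
        if match:
            current = {"id": match.group(1), "title": match.group(2), "tags": {}}
            items.append(current)
        elif current is not None and (match := TAG_RE.match(line)):
            current["tags"][match.group(1)] = match.group(2)
    return items


def missing_tags(item, required=("Category", "Expected impact", "Priority")):
    """List required tags absent from an item (the required set is assumed)."""
    return [tag for tag in required if tag not in item["tags"]]


def in_feec_order(items):
    """True if item categories never move backwards in the assumed ordering."""
    ranks = [CATEGORY_ORDER.get(i["tags"].get("Category"), len(CATEGORY_ORDER)) for i in items]
    return ranks == sorted(ranks)
```

Running `parse_items` over a Strategist output in the new format and asserting that `missing_tags` is empty for every item would give the "format change is invisible downstream" test a concrete starting point.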
|
The Distiller+Strategist merge proposal from this thread is now tracked as #523 and is being implemented. The scope is narrower than the full SPEC.md standardization discussed here: just merge the two agents, skip the redundant B0+B1 steps in interactive/research modes, and retire the Distiller. Format changes (SPEC.md, etc.) are follow-up work. Related issues from the same discussion:
|
Factory experiment 2. Closes #440 #497.