feat: add meta mode Phase 3 — classify and contribute upstream by Maxusmusti · Pull Request #376 · akashgit/remote-factory

Maxusmusti · 2026-05-25T22:33:42Z

Summary

Adds a contribution pipeline to meta mode that classifies evolved playbook items as general (upstream-worthy) vs project-specific (local only), then lets users contribute the general ones back as PRs
New factory contribute CLI command with --classify, --submit, and --status subcommands
CEO prompt updated with Phase 3 (M4/M5/M6) that runs after ACE evolution

Motivation

Currently, meta mode evolves playbooks locally via ACE — all learnings stay in ~/.factory/playbooks/ and never flow back upstream. This means every user independently re-discovers the same improvements. With this change, the factory can identify patterns that are universally useful across diverse projects and contribute them back to the default playbooks, closing the self-improvement loop: the more the factory is used, the better it becomes for everyone.

How it works

Classification engine (`factory/ace/contributor.py`)

Each evolved playbook item is scored on a general-vs-specific spectrum using four weighted signals:

Signal	Weight	What it measures
Cross-project prevalence	40%	Does this pattern appear across 3+ unrelated projects?
Domain independence	25%	Does it reference factory internals or project-specific frameworks?
Evidence strength	20%	How many observations (helpful/harmful) support it?
Category signal	15%	Is the hypothesis category inherently general (e.g., prompt_engineering) or specific (e.g., feature)?

Items scoring ≥ 0.65 are classified as general, items scoring ≤ 0.35 as specific, and everything in between as uncertain.
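As a rough illustration of how the four signals and the two thresholds could combine, here is a minimal Python sketch. It assumes each signal has already been normalized to [0, 1] and that the composite is a plain weighted sum. The names `Signals`, `cross_project_signal`, `generality_score`, and `classify_score` are hypothetical and are not taken from `factory/ace/contributor.py`; the linear ramp toward the 3-project bar is also an assumption.

```python
from dataclasses import dataclass

# Weights from the signal table; they sum to 1.0, so the score stays in [0, 1]
# provided every signal is normalized to [0, 1] (assumed, not confirmed).
WEIGHTS = {
    "cross_project": 0.40,
    "domain_independence": 0.25,
    "evidence_strength": 0.20,
    "category": 0.15,
}
GENERAL_THRESHOLD = 0.65   # score >= this -> general
SPECIFIC_THRESHOLD = 0.35  # score <= this -> specific


@dataclass
class Signals:
    cross_project: float
    domain_independence: float
    evidence_strength: float
    category: float


def cross_project_signal(n_projects: int, threshold: int = 3) -> float:
    """Hypothetical normalization: linear ramp that saturates at the 3-project bar."""
    return min(n_projects / threshold, 1.0)


def generality_score(s: Signals) -> float:
    """Weighted sum of the four normalized signals."""
    return (
        WEIGHTS["cross_project"] * s.cross_project
        + WEIGHTS["domain_independence"] * s.domain_independence
        + WEIGHTS["evidence_strength"] * s.evidence_strength
        + WEIGHTS["category"] * s.category
    )


def classify_score(score: float) -> str:
    """Map a composite score onto the three buckets."""
    if score >= GENERAL_THRESHOLD:
        return "general"
    if score <= SPECIFIC_THRESHOLD:
        return "specific"
    return "uncertain"
```

For example, an item seen in 5 projects, with no framework references, half-strength evidence, and a general category scores 0.40 + 0.25 + 0.10 + 0.15 = 0.90, i.e. general. Under this sketch's assumptions, an item with zero cross-project evidence tops out at 0.25 + 0.20 + 0.15 = 0.60, below the 0.65 bar, so it can never be classified as general without cross-project support.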

User experience

At the end of a meta mode run, users see a terminal summary:

════════════════════════════════════════════════════════════
                    META MODE SUMMARY
════════════════════════════════════════════════════════════

PLAYBOOK EVOLUTION COMPLETE
  9 items evolved across 5 roles
  3 general (upstream candidates)  |  3 specific (local only)  |  3 uncertain

────────────────────────────────────────────────────────────
GENERAL IMPROVEMENTS (upstream candidates)
────────────────────────────────────────────────────────────

  1. [strategist] "Always run type checkers after making changes"
     Generality: ████████░░ 0.81  |  5 projects  |  16 experiments
     Category: type_safety

────────────────────────────────────────────────────────────
PROJECT-SPECIFIC IMPROVEMENTS (staying local)
────────────────────────────────────────────────────────────

  1. [builder] "Use iframe wait patterns for Playwright tests"
     Generality: ██░░░░░░░░ 0.22  |  1 project  |  5 experiments
     Why local: single-project signal, domain-specific (Playwright)

════════════════════════════════════════════════════════════
Run `factory contribute` to select items for upstream PR.
════════════════════════════════════════════════════════════
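The generality bars in this summary are consistent with a 10-cell bar filled in proportion to the score (0.81 gives 8 cells, 0.22 gives 2). Below is a minimal sketch that reproduces the two-line item format shown above. `generality_bar`, `_plural`, and `format_item` are hypothetical names, and truncating with `int()` rather than rounding is an assumption; both give the same result for the sample values.

```python
def generality_bar(score: float, width: int = 10) -> str:
    """Render a [0, 1] score as filled/empty cells followed by the value."""
    clamped = max(0.0, min(1.0, score))
    filled = int(clamped * width)  # truncation: 0.81 -> 8 cells, 0.22 -> 2 cells
    return "█" * filled + "░" * (width - filled) + f" {score:.2f}"


def _plural(n: int, noun: str) -> str:
    # "1 project" but "0 projects" / "5 projects", as in the summary output
    return f"{n} {noun}" if n == 1 else f"{n} {noun}s"


def format_item(rank: int, role: str, text: str, score: float,
                projects: int, experiments: int) -> str:
    """Two-line entry in the style of the summary's item listings."""
    return (
        f'  {rank}. [{role}] "{text}"\n'
        f"     Generality: {generality_bar(score)}  |  "
        f"{_plural(projects, 'project')}  |  {_plural(experiments, 'experiment')}"
    )
```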

Users can then run `factory contribute --submit` to create a PR, or skip it; contribution is always opt-in.

CLI commands

```bash
# Classify evolved items and show summary
factory contribute --classify /path/to/project

# Create PR with all general items
factory contribute --submit /path/to/project --all

# Check pending candidates
factory contribute --status
```
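To spell out how the three modes and their options relate, here is a hypothetical `argparse` wiring. The PR does not say which CLI library `factory/cli.py` uses, so `build_parser` and `parse` are illustrative only; `--dry-run` is included because the test plan and later discussion exercise it.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="factory contribute")
    # Exactly one mode per invocation.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--classify", metavar="PROJECT",
                      help="score evolved items and print the summary")
    mode.add_argument("--submit", metavar="PROJECT",
                      help="open a PR with approved general items")
    mode.add_argument("--status", action="store_true",
                      help="show pending contribution candidates")
    # Options that only make sense together with --submit.
    parser.add_argument("--all", action="store_true",
                        help="include every general item")
    parser.add_argument("--dry-run", action="store_true",
                        help="print the PR spec without running git/gh")
    return parser


def parse(argv: list) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    if (args.all or args.dry_run) and not args.submit:
        parser.error("--all and --dry-run only apply to --submit")  # exits with status 2
    return args
```

The mutually exclusive group lets argparse reject combinations such as `--classify . --status` on its own; the cross-option check for `--all`/`--dry-run` has to be done by hand.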

CEO prompt changes

Phase 3 (steps M4/M5/M6) is added after Phase 2 (ACE). The CEO:

Runs factory contribute --classify to score evolved items
Presents the summary to the user
Waits for explicit approval before submitting — never auto-contributes

Files changed

File	Change
`factory/ace/contributor.py`	New — classification engine, contribution pipeline, terminal summary, git/gh submit, JSON persistence (780 lines)
`factory/cli.py`	Modified — `factory contribute` command with `--classify`/`--submit`/`--status` subcommands
`factory/agents/prompts/ceo.md`	Modified — Phase 3 (M4/M5/M6) + task table entry
`tests/test_contributor.py`	New — 26 tests covering classification, diffing, summary, PR body, persistence

Design decisions

Composition over inheritance for ClassifiedItem wrapping PlaybookItem (since PlaybookItem has extra="forbid")
Reuses existing factory infrastructure: Playbook.from_markdown(), classify_hypothesis(), discover_projects(), load_all_histories(), DEFAULTS_DIR, user_playbooks_dir()
prepare_contribution() returns specs without executing git — keeps the module testable; execute_contribution() handles the actual git/gh commands separately
Fuzzy matching (SequenceMatcher ≥ 0.75) for both cross-project evidence and playbook diffing, consistent with the existing reflector
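A minimal sketch of how a single `SequenceMatcher` threshold can serve both uses named above: playbook diffing (which user items have no close match among the defaults) and cross-project evidence (how many projects contain a matching lesson). Only the 0.75 threshold and the use of `difflib.SequenceMatcher` come from the PR; the helper names and the case/whitespace normalization are assumptions.

```python
from difflib import SequenceMatcher
from typing import Dict, List

SIMILARITY_THRESHOLD = 0.75  # value stated in the PR description


def _normalize(text: str) -> str:
    # Case-folding and whitespace collapsing are assumptions, not confirmed behavior.
    return " ".join(text.lower().split())


def is_similar(a: str, b: str, threshold: float = SIMILARITY_THRESHOLD) -> bool:
    return SequenceMatcher(None, _normalize(a), _normalize(b)).ratio() >= threshold


def evolved_items(user_items: List[str], default_items: List[str]) -> List[str]:
    """Playbook diffing: user items with no fuzzy match among the defaults."""
    return [u for u in user_items if not any(is_similar(u, d) for d in default_items)]


def count_supporting_projects(item: str, lessons_by_project: Dict[str, List[str]]) -> int:
    """Cross-project evidence: projects with at least one fuzzily matching lesson."""
    return sum(
        1
        for lessons in lessons_by_project.values()
        if any(is_similar(item, lesson) for lesson in lessons)
    )
```

Sharing one `is_similar` keeps the two comparisons consistent: an item that counts as "already in the defaults" for diffing uses exactly the same notion of "the same lesson" as the cross-project count.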

Test plan

26 new tests pass (pytest tests/test_contributor.py)
261 existing tests pass — zero regressions
factory contribute --help shows correct usage
All imports resolve correctly
Manual test: run factory contribute --classify on a project with evolved playbooks
Manual test: run factory contribute --submit --dry-run to verify PR spec generation

🤖 Generated with Claude Code

Add the ability for meta mode to distinguish general improvements from project-specific ones, and contribute the general items back upstream as PRs. This closes the self-improvement loop: the more the factory is used across diverse projects, the better its default playbooks become.

New CLI command `factory contribute` with three modes:
- `--classify`: scores evolved playbook items on a general-vs-specific spectrum using four weighted signals (cross-project prevalence 40%, domain independence 25%, evidence strength 20%, category signal 15%)
- `--submit`: creates a PR against the factory repo with approved items
- `--status`: shows pending contribution candidates

The classification engine uses cross-project experiment data to identify items that appear across 3+ unrelated projects as "general" (upstream candidates), single-project items as "specific" (local only), and everything in between as "uncertain" (needs more data).

The CEO prompt is updated with Phase 3 (M4/M5/M6) which runs after ACE evolution, presents the user with a terminal summary showing the distinction, and lets them opt in to contributing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

codecov · 2026-05-25T22:35:33Z

Codecov Report

❌ Patch coverage is 93.70277% with 25 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.56%. Comparing base (190741e) to head (290672d).
⚠️ Report is 86 commits behind head on main.

Files with missing lines	Patch %	Lines
factory/cli.py	65.51%	20 Missing ⚠️
factory/ace/contributor.py	98.52%	5 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #376      +/-   ##
==========================================
+ Coverage   87.54%   87.56%   +0.02%     
==========================================
  Files          60       62       +2     
  Lines        9170     9734     +564     
==========================================
+ Hits         8028     8524     +496     
- Misses       1142     1210      +68


Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add 26 tests covering uncovered paths: classify_evolved_playbooks pipeline, package_evidence, prepare_contribution, execute_contribution (mocked subprocess), explain_specificity/uncertainty branches, load_candidates edge cases, and cmd_contribute CLI handler.

Fix lint: remove unused imports, rename ambiguous variable, drop unused locals.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

abhi1092 · 2026-06-02T20:27:26Z

I tried running it, but unfortunately meta mode didn't have enough data to update the playbook, so classify and contribute found nothing meaningful to contribute upstream from that small pool.

(base) abhishek@Abhisheks-MacBook-Pro remote-factory_1 % uv run factory contribute --classify .
2026-06-02 16:19:14 [debug    ] registry_stale_entry           path=/Users/abhishek/factory-projects/usersabhishekfactory-projectsadapt-fm-challeng/.factory/worktrees/run-84c684a4
2026-06-02 16:19:14 [debug    ] registry_stale_entry           path=/Users/abhishek/factory-projects/usersabhishekfactory-projectsadapt-fm-challeng/.factory/worktrees/run-751a66c3
2026-06-02 16:19:14 [debug    ] registry_stale_entry           path=/Users/abhishek/factory-projects/usersabhishekfactory-projectsadapt-fm-challeng/.factory/worktrees/run-92c21e96
2026-06-02 16:19:14 [debug    ] registry_stale_entry           path=/Users/abhishek/factory-projects/usersabhishekfactory-projectsadapt-fm-challeng/.factory/worktrees/run-76698f5d
2026-06-02 16:19:14 [debug    ] registry_stale_entry           path=/Users/abhishek/rotation_project/eda_papers/remote-factory_1/.factory/worktrees/run-934fd3ae
2026-06-02 16:19:14 [info     ] discover_projects_complete     count=2 dir=/Users/abhishek/rotation_project/eda_papers
2026-06-02 16:19:14 [debug    ] load_history_complete          record_count=7
2026-06-02 16:19:14 [debug    ] load_history_complete          project=cc_with_kimi record_count=7
2026-06-02 16:19:14 [debug    ] load_history_complete          record_count=4
2026-06-02 16:19:14 [debug    ] load_history_complete          project=remote-factory_1 record_count=4
2026-06-02 16:19:14 [info     ] load_all_histories_complete    project_count=2
2026-06-02 16:19:14 [info     ] classify_evolved_playbooks_complete general=0 specific=0 uncertain=1
════════════════════════════════════════════════════════════
                    META MODE SUMMARY
════════════════════════════════════════════════════════════

PLAYBOOK EVOLUTION COMPLETE
  1 items evolved across 1 roles
  0 general (upstream candidates)  |  0 specific (local only)  |  1 uncertain
────────────────────────────────────────────────────────────
UNCERTAIN (needs more data)
────────────────────────────────────────────────────────────

  1. [strategist] "Build on coverage momentum — 3/3 experiments produced positive score deltas"
     Generality: █████░░░░░ 0.50  |  0 projects  |  3 experiments
     Needs: needs more cross-project evidence (currently 0/3 threshold)

════════════════════════════════════════════════════════════
Run `factory contribute` to select items for upstream PR.
════════════════════════════════════════════════════════════
2026-06-02 16:19:14 [info     ] save_candidates                general=0 path=/Users/abhishek/rotation_project/eda_papers/remote-factory_1/.factory/contribution_candidates.json
2026-06-02 16:19:14 [debug    ] event_emitted                  agent=None project=remote-factory_1 type=contribute.classify_complete
(base) abhishek@Abhisheks-MacBook-Pro remote-factory_1 % factory contribute --submit . --dry-run
zsh: command not found: factory
(base) abhishek@Abhisheks-MacBook-Pro remote-factory_1 % uv run factory contribute --submit . --dry-run
2026-06-02 16:20:11 [debug    ] registry_stale_entry           path=/Users/abhishek/factory-projects/usersabhishekfactory-projectsadapt-fm-challeng/.factory/worktrees/run-84c684a4
2026-06-02 16:20:11 [debug    ] registry_stale_entry           path=/Users/abhishek/factory-projects/usersabhishekfactory-projectsadapt-fm-challeng/.factory/worktrees/run-751a66c3
2026-06-02 16:20:11 [debug    ] registry_stale_entry           path=/Users/abhishek/factory-projects/usersabhishekfactory-projectsadapt-fm-challeng/.factory/worktrees/run-92c21e96
2026-06-02 16:20:11 [debug    ] registry_stale_entry           path=/Users/abhishek/factory-projects/usersabhishekfactory-projectsadapt-fm-challeng/.factory/worktrees/run-76698f5d
2026-06-02 16:20:11 [debug    ] registry_stale_entry           path=/Users/abhishek/rotation_project/eda_papers/remote-factory_1/.factory/worktrees/run-934fd3ae
No general items to contribute.

@akashgit could you give these two commands a try?

```bash
uv run factory contribute --classify .
uv run factory contribute --submit . --dry-run
```

xukai92 · 2026-06-03T20:49:37Z

@ceo-review

github-actions · 2026-06-03T20:56:09Z

✅ Factory CEO Review: KEEP

Verdict: KEEP
Reason: Classification pipeline is well-architected, thoroughly tested (52 tests), and follows factory conventions. No logic errors, security issues, or scope creep found.

Code Quality Review

Category	Result	Notes
Correctness	✅ PASS	Classification algorithm sound, fuzzy matching consistent with reflector, proper edge case handling
Security	✅ PASS	Subprocess calls safe (no shell=True), Pydantic models strict, no injection risks
Edge Cases	✅ PASS	Empty playbooks, missing files, corrupt JSON, subprocess failures all handled
Test Coverage	✅ PASS	52 comprehensive tests covering all major paths
Style & Consistency	✅ PASS	Snake_case, type hints, Pydantic models, structlog, clear docstrings
Scope Compliance	✅ PASS	Well-bounded, reuses existing infrastructure correctly, no scope creep
Guardrail Compliance	✅ PASS	N/A for this PR
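The edge-case row mentions missing files and corrupt JSON. One way candidate persistence could behave that way is sketched below: loading returns `None` instead of raising. The `.factory/contribution_candidates.json` location matches the `save_candidates` log line earlier in the thread; the `{"candidates": [...]}` layout and both function signatures are assumptions.

```python
import json
from pathlib import Path
from typing import Optional

# Location taken from the save_candidates log line; the JSON layout is assumed.
CANDIDATES_FILE = Path(".factory") / "contribution_candidates.json"


def save_candidates(project_dir: Path, candidates: list) -> Path:
    """Write candidates under <project>/.factory/, creating the directory if needed."""
    path = project_dir / CANDIDATES_FILE
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"candidates": candidates}, indent=2), encoding="utf-8")
    return path


def load_candidates(project_dir: Path) -> Optional[list]:
    """Return saved candidates, or None if the file is missing, corrupt, or malformed."""
    path = project_dir / CANDIDATES_FILE
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return None
    if not isinstance(data, dict) or not isinstance(data.get("candidates"), list):
        return None
    return data["candidates"]
```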

Key Strengths

Well-designed classification engine — 4-signal weighted composite (cross-project 40%, domain independence 25%, evidence strength 20%, category signal 15%) with clear thresholds
Fuzzy matching consistency — Uses SequenceMatcher ≥0.75 threshold, matching reflector.py patterns
Comprehensive test coverage — 52 tests covering classification, diffing, summary rendering, PR generation, persistence, CLI integration, and error paths
Composition over inheritance — ClassifiedItem correctly wraps PlaybookItem (which has extra='forbid')
Separation of concerns — prepare_contribution() returns spec without executing git (testable), execute_contribution() handles side effects
User-friendly terminal output — render_summary() produces clear ASCII art with generality bars and explanations
Error handling — subprocess failures trigger branch cleanup, corrupt JSON returns None instead of crashing
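The prepare/execute split and the branch cleanup can be illustrated with a small sketch: the spec is plain data, and execution takes an injectable runner so tests never touch real git or `gh`. `ContributionSpec`, `execute_spec`, and `run_checked` are hypothetical stand-ins for `prepare_contribution()`/`execute_contribution()`, and the sketch assumes the working directory is a clone of the factory repo with an `origin` remote and an authenticated `gh`.

```python
import subprocess
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

# A runner executes one argv list; injecting it keeps tests free of real git/gh calls.
Runner = Callable[[Sequence[str]], None]


def run_checked(argv: Sequence[str]) -> None:
    # argv list form, no shell=True; raises CalledProcessError on non-zero exit.
    subprocess.run(list(argv), check=True)


@dataclass
class ContributionSpec:
    """What a prepare step might return: data only, no side effects."""
    branch: str
    commit_message: str
    pr_title: str
    pr_body: str
    files: List[str] = field(default_factory=list)

    def branch_command(self) -> List[str]:
        return ["git", "checkout", "-b", self.branch]

    def publish_commands(self) -> List[List[str]]:
        return [
            ["git", "add", *self.files],
            ["git", "commit", "-m", self.commit_message],
            ["git", "push", "-u", "origin", self.branch],
            ["gh", "pr", "create", "--title", self.pr_title, "--body", self.pr_body],
        ]


def execute_spec(spec: ContributionSpec, run: Runner = run_checked) -> bool:
    """Run the spec; if a step fails after the branch exists, best-effort delete it."""
    try:
        run(spec.branch_command())
    except subprocess.CalledProcessError:
        return False  # branch was not created, so there is nothing to clean up
    try:
        for argv in spec.publish_commands():
            run(argv)
    except subprocess.CalledProcessError:
        for cleanup in (["git", "checkout", "-"], ["git", "branch", "-D", spec.branch]):
            try:
                run(cleanup)
            except subprocess.CalledProcessError:
                pass  # keep going; cleanup is best-effort
        return False
    return True
```

A `--dry-run` path would only need to print `spec.branch_command()` and `spec.publish_commands()`. This sketch does not restore changes that were staged but not yet committed when a step fails; a real implementation would need to decide how to handle those.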

Test Results

✅ 52 new tests pass (tests/test_contributor.py)
✅ 2243 total tests pass (zero regressions)
✅ ruff check passes (no lint errors)
✅ mypy passes (no type errors)

Recommendation

This PR is production-ready. The classification methodology is empirically grounded, the implementation is robust, and the user experience (opt-in contribution, clear terminal summary) is excellent. Code follows factory conventions and is highly maintainable.

Review completed by Factory CEO at 2026-06-03T20:56:09Z

github-actions · 2026-06-03T20:56:23Z

❌ CEO review failed. Check the workflow run for details.

osilkin98 · 2026-06-11T02:59:17Z

Triage: parking this one deliberately. Before the factory grows an upstream contribution pipeline (edit: earlier wording said "auto-submit", which was wrong; submission is user-approved, see discussion below), we want the spec-driven workflow in place first — every change implemented against an explicit spec with a problem statement — otherwise this multiplies the current intent-opacity problem upstream. @akashgit deferring to you if you want to prioritize it differently.

Maxusmusti · 2026-06-11T03:30:43Z

@osilkin98 I'm not sure I follow: this PR does not "give factory the ability to auto-submit PRs". It specifically adds a step that runs exclusively in meta mode, which classifies potential general learnings across projects vs. project-specific learnings. It gives users the option to review and contribute those learnings back to factory upstream. It only reviews and reports information gathered over long-term factory use; it doesn't submit or act automatically.

osilkin98 · 2026-06-12T14:44:10Z

You're right, I overstated it. I went back and read the diff properly: M6 only runs factory contribute --submit after the M5 summary, and the prompt explicitly says never auto-submit and to wait for user approval. That's on me, I'll fix the wording in my earlier comment.

The narrower thing I'd still raise: the no-auto-submit rule lives in the prompt, and --submit itself doesn't verify a human approved anywhere. In a headless meta run the model is the one deciding it has approval, which is the same pattern that bit us with --force (#417) and the scope lock (#510). Small fix that would settle it completely: make --submit refuse to run when not interactive, or gate it behind a --yes flag the CEO prompt doesn't include. With that in, my parking objection goes away and this just needs normal review.
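One way the suggested guard could look, sketched in Python: `--submit` proceeds only with an explicit `--yes` or after a human answers an interactive prompt, and refuses outright when stdin is not a TTY. `require_human_approval`, `ApprovalRequired`, and the `--yes` flag are hypothetical; none of them exist in the PR as described.

```python
import sys
from typing import Callable, Optional


class ApprovalRequired(RuntimeError):
    """Raised when --submit cannot confirm that a human approved the PR."""


def require_human_approval(
    yes: bool,
    interactive: Optional[bool] = None,
    ask: Callable[[str], str] = input,
) -> None:
    # An explicit --yes counts as approval; the CEO prompt would never pass it.
    if yes:
        return
    if interactive is None:
        interactive = sys.stdin is not None and sys.stdin.isatty()
    if not interactive:
        raise ApprovalRequired("refusing to submit without a TTY; re-run interactively or pass --yes")
    answer = ask("Open an upstream PR with the selected items? [y/N] ")
    if answer.strip().lower() not in {"y", "yes"}:
        raise ApprovalRequired("submission declined")
```

A TTY check is not proof that a human is present (an agent driving a pseudo-terminal would pass it), but it does close the common headless meta-run case without relying on the prompt's instructions.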

Maxusmusti and others added 2 commits May 25, 2026 18:38

fix: remove unused imports (ruff F401)

dcc231d

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

akashgit added the enhancement label (Improves existing feature/functionality or code quality, does not change behavior of codebase) May 30, 2026

xukai92 requested a review from akashgit June 4, 2026 02:01

osilkin98 added labels Jun 11, 2026: kind:capability (Does something new), one-way-door (Changes what the project is; needs owner decision), stage:learning (What the factory retains and propagates: playbooks, records, memory)

Maxusmusti wants to merge 3 commits into main from feat/meta-mode-upstream-contributions

5 participants
