feat: --bg flag to dispatch agents via claude agent view #424
Adds support for claude's native background sessions (--bg). Agents are dispatched to the background supervisor, visible in `claude agents`, and the factory polls for completion before returning output.
- Polls ~/.claude/jobs/<id>/state.json for terminal states
- Propagates FACTORY_BG=1 to sub-agent environment
- --bg implies headless for factory ceo
- Skips completion guard respawn loop (single dispatch)
- Configurable via --bg / FACTORY_BG / config.toml
- Bob/Codex runners warn (claude-only feature)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
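As a rough illustration of the three configuration sources and the environment propagation listed above, the sketch below resolves the setting and builds a sub-agent environment. The precedence order (CLI flag, then `FACTORY_BG`, then `config.toml`), the accepted truthy strings, the top-level `bg` key, and the helper names `resolve_bg` and `child_env` are assumptions for illustration, not taken from this PR's code.

```python
from typing import Dict, Mapping

# Assumed set of strings treated as "on" for FACTORY_BG (hypothetical).
_TRUTHY = frozenset({"1", "true", "yes", "on"})


def resolve_bg(
    cli_flag: bool,
    env: Mapping[str, str],
    config: Mapping[str, object],
) -> bool:
    """Decide whether background dispatch is requested.

    Assumed precedence: --bg flag, then FACTORY_BG, then `bg` in the
    already-parsed config.toml mapping.
    """
    if cli_flag:
        return True
    raw = env.get("FACTORY_BG")
    if raw is not None:
        # An explicit env value wins over the config file, even when it is "0".
        return raw.strip().lower() in _TRUTHY
    return config.get("bg") is True


def child_env(parent_env: Mapping[str, str], bg: bool) -> Dict[str, str]:
    """Copy the parent environment, adding FACTORY_BG=1 when bg mode is on."""
    env = dict(parent_env)
    if bg:
        env["FACTORY_BG"] = "1"
    return env
```

The config mapping is passed in already parsed so the helper stays easy to unit test; in practice it could come from loading `~/.factory/config.toml` with a TOML parser.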
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #424 +/- ##
==========================================
- Coverage 87.03% 86.54% -0.49%
==========================================
Files 62 62
Lines 9643 9739 +96
==========================================
+ Hits 8393 8429 +36
- Misses 1250 1310 +60
Context for the review below: I've been testing the Claude Fable model for triaging the open PR backlog. It flagged the following on this PR, and this CEO review session independently verified or refuted each claim against the current branch head. Process context: #517.
osilkin98
left a comment
❌ Factory Review: REVERT
Verdict: REVERT
Reason: Timeout is not forwarded to run_in_background, so bg agents silently use a 24-hour default instead of the configured timeout, and a crashed bg session blocks the factory for the full duration with no early detection.
Precheck Gate
5 claims adjudicated: 5 confirmed, 0 refuted. 0 prior verdicts (first review).
Code Review Notes
- Claims and provenance are in my comment above. Each note below is this review's independent adjudication of one claim.
- CLAIM 1 CONFIRMED: .factory.bak is a broken symlink to /Users/akash/cursor-projects/remote-factory/.factory committed in the PR diff. Evidence: git diff main..HEAD -- .factory.bak shows new 120000-mode file; ls -la confirms broken link. Why it matters: Commits a developer's local absolute path into the repo. Broken on all other machines.
- CLAIM 2 CONFIRMED: If a bg session dies without writing state.json, the poll loop spins for the full timeout with no early detection. Evidence: _tmux_persist.py:264-277. _read_session_state (line 196) returns None when state.json is missing; the loop only breaks on a recognized terminal state. Why it matters: Combined with the 24-hour default timeout (Claim 4), a crashed agent blocks the factory for a full day with no signal.
- CLAIM 3 CONFIRMED: _BG_TERMINAL_STATES is hardcoded at _tmux_persist.py:179 with no version note or fallback for unrecognized states. Evidence: line 269 checks state.get("state") in _BG_TERMINAL_STATES. Why it matters: If Claude CLI adds a new terminal state (e.g. "cancelled"), the poll loop never recognizes it and spins until the timeout expires. Should log unrecognized states and document the expected CLI version.
- CLAIM 4 CONFIRMED: timeout is not forwarded from ClaudeRunner.headless() to run_in_background(). Evidence: claude.py:73-80 passes model and dangerously_skip_permissions but omits timeout. run_in_background defaults to _DEFAULT_TMUX_TIMEOUT = 86400.0 (24 hours) at _tmux_persist.py:55,213. Why it matters: This is a functional bug. Agents configured for 600s timeout actually get 24 hours in bg mode.
- CLAIM 5 CONFIRMED: Zero unit tests for _parse_bg_session_id, _read_session_state, or run_in_background. Evidence: grep -rn across tests/ returns no matches. Why it matters: These pure functions are trivially testable. Format changes in Claude CLI output would be caught only at runtime.
- ADDITIONAL: PR description claims "All 2175 tests pass" but test_interactive_task_contains_idea_text fails. Evidence: pytest on both main and PR branch shows the failure at tests/test_cli.py:207. Pre-existing on main, not introduced by this PR. Why it matters: Inaccurate claim in PR description; not a blocker for this PR.
- ADDITIONAL: subprocess.run for "claude stop" at _tmux_persist.py:280 has no timeout. Evidence: the call passes only capture_output=True. Why it matters: If "claude stop" hangs, the cleanup path blocks indefinitely, leaving orphaned sessions.
- ADDITIONAL: FACTORY_BG=1 propagation to sub-agent env (_tmux_persist.py:234-235) means sub-agents also run in bg mode. This is the stated intent, but ceo_completion.py:416-421 bypasses the completion guard respawn loop for bg mode. Sub-agent failures in bg mode will not trigger respawning.
- Acceptance path to flip to KEEP: (1) Remove .factory.bak from the PR. (2) Forward the timeout parameter from claude.py:76-80 to run_in_background. (3) Add a "state.json never appeared" fast-fail: if the session directory exists but state.json is absent after e.g. 60s, return early with an error. (4) Add unit tests for _parse_bg_session_id and _read_session_state. Items 1 and 2 are blocking. Items 3 and 4 can be tracked as follow-up issues if preferred.
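Items 2 and 3 of the acceptance path can be sketched as a poll loop that takes the timeout as a required argument and fails fast when state.json never appears. This is a minimal sketch under assumptions: `poll_job`, `read_state`, the `startup_grace` parameter, and the contents of `TERMINAL_STATES` are hypothetical names and values, and the real code in `_tmux_persist.py` is not reproduced here. For brevity it does not check that the session directory exists.

```python
import json
import logging
import time
from pathlib import Path
from typing import Callable, Optional

log = logging.getLogger(__name__)

# Hypothetical values; the review does not list the real _BG_TERMINAL_STATES.
TERMINAL_STATES = frozenset({"completed", "failed"})


def read_state(job_dir: Path) -> Optional[dict]:
    """Return the parsed state.json, or None if missing, unreadable, or not an object."""
    try:
        data = json.loads((job_dir / "state.json").read_text())
    except (OSError, ValueError):
        return None
    return data if isinstance(data, dict) else None


def poll_job(
    job_dir: Path,
    *,
    timeout: float,  # required: callers must forward their configured timeout
    startup_grace: float = 60.0,
    interval: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until a terminal state; fail fast if state.json never appears."""
    start = clock()
    seen = set()
    ever_had_state = False
    while True:
        elapsed = clock() - start
        state = read_state(job_dir)
        if state is not None:
            ever_had_state = True
            name = str(state.get("state"))
            if name in TERMINAL_STATES:
                return state
            if name not in seen:
                # Log each non-terminal state once so new CLI states are visible.
                seen.add(name)
                log.info("bg job %s reported state %r", job_dir.name, name)
        elif not ever_had_state and elapsed >= startup_grace:
            raise RuntimeError(
                f"state.json never appeared in {job_dir} after {startup_grace}s"
            )
        if elapsed >= timeout:
            raise TimeoutError(f"bg job {job_dir.name} exceeded {timeout}s")
        sleep(interval)
```

Making `timeout` keyword-only with no default means a caller cannot silently fall back to a 24-hour value; omitting it raises a `TypeError` at the call site. The injectable `clock` and `sleep` let unit tests drive the loop without real waiting.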
Posted by Factory CEO
Summary
- Adds `--bg` flag that dispatches agents as background sessions via `claude --bg`, visible in `claude agents`
- Polls `~/.claude/jobs/<id>/state.json` for completion and returns output, so the CEO orchestration loop works as before
- `--bg` implies headless for `factory ceo`, skips completion guard respawn loop (single dispatch)
- Propagates `FACTORY_BG=1` to sub-agent environment so the full fleet appears in agent view
- Configurable via `--bg` CLI flag, `FACTORY_BG` env var, or `bg = true` in `~/.factory/config.toml`
Test plan
- `factory agent researcher --task "list files" --project /tmp/test --bg`
- `factory ceo /path --bg --mode discover` (single session in `claude agents`)
🤖 Generated with Claude Code