fix: CEO reuses existing GitHub issues instead of creating duplicates#375
fix: CEO reuses existing GitHub issues instead of creating duplicates#375lukeinglis wants to merge 3 commits into
Conversation
Add pre-check in ceo.md step 2c (Improve) and R3a (Research) to parse hypothesis text for issue references and reuse existing open issues instead of always creating new ones. Fixes akashgit#372 Signed-off-by: Luke Inglis <lukeinglis21@yahoo.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #375 +/- ##
==========================================
- Coverage 87.31% 87.08% -0.23%
==========================================
Files 61 62 +1
Lines 9339 9682 +343
==========================================
+ Hits 8154 8432 +278
- Misses 1185 1250 +65 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
xukai92
left a comment
There was a problem hiding this comment.
Dedup intent is right and the change is contained, but as-is the Builder will silently follow a stale spec whenever an issue gets reused — that's a meaningful regression risk. Two smaller robustness concerns on the matching pattern and the gh invocation are noted as well. Once the reused issue gets refreshed with the new experiment context, happy to approve.
[P2 — blocking] Reused issue body becomes stale w.r.t. current experiment
factory/agents/prompts/ceo.md around the new pre-check (lines ~1146-1156 and ~2074-2085):
When the pre-check reuses an existing open issue, $ISSUE_NUM is set but the issue body still references whatever $EXP_ID, hypothesis text, acceptance criteria, and (for operational/mixed hypotheses) ## Execution Step / ## Execution Acceptance Criteria it had at original creation. The Builder in step 2d does gh issue view $ISSUE_NUM and is instructed to "implement exactly what the issue describes" — so it will follow the old issue body, not the current CEO-approved hypothesis.
This silently re-decouples the experiment from the work being done: wrong $EXP_ID recorded on the PR, stale acceptance criteria for operational/mixed hypotheses, possibly stale surface constraints in research mode.
Suggested fix — after deciding to reuse, post an update comment with the new experiment context before invoking the Builder:
3. If the issue is open, reuse it: set `$ISSUE_NUM` to that number, post an update comment with the current experiment context, and **skip issue creation**:
```bash
gh issue comment "$ISSUE_NUM" --body "**Updated for experiment $EXP_ID**
## Current hypothesis
<hypothesis text>
## What to build (this cycle)
<specific changes>
## Acceptance criteria (this cycle)
- [ ] <outcomes>
- [ ] Tests pass
- [ ] Eval score does not regress"
```
4. Only create a new issue if no existing open reference is found[P3] #[0-9]+ is too aggressive and has no tie-break for multiple matches
The patterns list narrow forms (Addresses: #NNN, issue #NNN), but step 1 collapses to bare #[0-9]+, which matches "PR #123", "task #4", section anchors, and version tags. There's also no guidance on what to do when multiple #NNN references appear — first match wins by default, which could be the wrong one (e.g., a referenced PR rather than the tracking issue).
Suggested ordering:
- Prefer explicit
Addresses:? #NNN,**Backlog item:** #NNN, orissue #NNNmatches first. - If exactly one bare
#NNNis present and no narrow match was found, fall back to it. - If multiple bare
#NNNreferences remain ambiguous, create a new issue and link the others in the body.
[P3] gh issue view is not scoped to the project repo
The instructions don't require the CEO to cd "$PROJECT_PATH" (or pass -R owner/repo) before the lookup. If the CEO subprocess runs from a different cwd or in a worktree without an upstream, gh issue view may resolve to the wrong repo or fail. The "if open" branch is also ambiguous about what to do on gh non-zero exit — silent fall-through to "no existing issue" is fine but should be stated explicitly.
Suggested fix:
(cd "$PROJECT_PATH" && gh issue view <number> --json state --jq '.state') 2>/dev/nullIf the command fails or returns anything other than OPEN, treat as no existing issue and proceed to create a new one.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Luke Inglis <lukeinglis21@yahoo.com>
|
Addressed all three review points. Changes applied to both pre-check blocks (Improve step 2c and Research R3a): [P2] Stale issue body: Step 3 now posts a gh issue comment with the current experiment ID, hypothesis, acceptance criteria, and what to build before the Builder is invoked. This ensures the Builder always works from current context even when reusing an old issue. [P3] Pattern matching precision: Replaced the aggressive bare #[0-9]+ collapse with priority-ordered matching:
[P3] Repo-scoped gh issue view: Now runs via (cd "$PROJECT_PATH" && gh issue view ...) 2>/dev/null with explicit failure handling: non-zero exit or non-OPEN state both fall through to new issue creation. |
osilkin98
left a comment
There was a problem hiding this comment.
PASS 1
- [INFO] Approach is reasonable — prompt-only change to parse
#NNNfrom hypothesis text and reuse open issues instead of creating duplicates. Correct layer for this fix. - [HIGH] DRY violation — the exact same ~30-line pre-check block is copy-pasted verbatim in Improve mode (step 2c) and Research mode (R3a). If the logic ever changes, both copies must stay in sync. Should be extracted to a named reference section.
- [LOW] Flow control between "reuse" and "create" paths isn't explicit — no "skip to step 2d" instruction after reuse. The LLM will probably figure it out but explicit flow would be safer.
PASS 2
- [HIGH] Ambiguous regex syntax —
Addresses:? #NNNuses regex notation (?= optional colon) but an LLM might read the?literally. Should be plain English: "Addresses #NNN (with or without a colon)". - [MEDIUM] Fragile bare-
#NNNfallback (rule 1b) — hypothesis text routinely has#NNNrefs to PRs, experiment IDs, Markdown headers. Falling back to a bare#NNNcould hijack unrelated issues. - [MEDIUM] No check that the referenced issue belongs to the current repo —
gh issue viewoperates on whatever repo context exists in$PROJECT_PATH. - [LOW] Missing interaction with
--focusmode — when--focus 42is used, the CEO already targets a specific issue. The pre-check doesn't mention this and could conflict. - [LOW]
2>/dev/nullongh issue viewsuppresses all errors silently — makes debugging reuse failures difficult.
PASS 3
- [HIGH] Poisoned issue references — a strategist/backlog item could include
Addresses #999where issue 999 is unrelated. CEO would reuse it, post a confusing comment, and the Builder implements the wrong thing. No semantic validation that the issue is related. - [MEDIUM] Parallel CEO race — two simultaneous cycles could both parse the same
#NNN, both verify it's open, and both spawn Builders against the same issue, creating conflicting PRs. - [MEDIUM] TOCTOU race — issue could be closed between
gh issue view(step 2) andgh issue comment(step 3). - [LOW]
#NNNinside code blocks or backticks could be false-positive matched — no instruction to ignore quoted references. - [LOW] Edge cases:
#0,#-1,#abc— instructions don't specify "NNN must be a positive integer".
FINAL VERDICT: REQUEST_CHANGES
3 HIGH, 4 MEDIUM, 5 LOW, 1 INFO
The three critical things to fix: (1) deduplicate the 30-line block instead of copy-pasting it, (2) rewrite Addresses:? in plain English instead of regex syntax, and (3) remove or heavily guard the bare #NNN fallback (rule 1b) — it's too aggressive and will cause the CEO to hijack unrelated issues since hypothesis text is full of #NNN references to PRs, experiments, and sections.
Extract the duplicated issue reuse pre-check block (Improve 2c and Research R3a) into a single Issue Reuse Protocol section. Both locations now reference the shared protocol. Replace regex syntax (Addresses:? #NNN) with plain English that an LLM can parse unambiguously. Add semantic validation step: after finding a candidate issue by number, verify the issue title shares keywords with the hypothesis before reusing, preventing hijack of unrelated issues from stray #NNN references. Closes akashgit#466 Signed-off-by: Luke Inglis <lukeinglis21@yahoo.com>
|
@osilkin98 Thanks for the thorough review. All three HIGHs are addressed: [HIGH] DRY violation: Extracted the pre-check block into a single ### Issue Reuse Protocol section. Both Improve 2c and Research R3a now reference it with a one-liner. Net -25 lines. [HIGH] Regex syntax: Addresses:? #NNN rewritten as Addresses #NNN (with or without a trailing colon). [HIGH] Poisoned references: Added a keyword validation step: after confirming the issue is open, check that at least one keyword from the hypothesis appears in the issue title. Scoped to bare #NNN fallbacks only (rule 1b). Explicit references like Addresses #42 skip the check since they signal clear authorial intent. For the MEDIUMs: repo scoping and bare #NNN fallback were already addressed in the earlier round (xukai92's review). Parallel CEO race and TOCTOU are unlikely edge cases for a prompt file and I'd prefer not to add more complexity here. |
|
@xukai92 Following up on your suggestion to solve this in code rather than the prompt. I traced the issue creation flow and found that the CEO agent is the sole creator of GitHub issues (via I put together a code-level alternative in PR #493. The approach:
The 37-line Issue Reuse Protocol in ceo.md would shrink to a 4-line shell snippet: ```bash 14 unit tests cover the parsing and resolution logic. Would appreciate your review on the approach in #493. If it looks right, this PR can be superseded by that one with a minimal prompt change to call the new command.' |
|
Triage: holding this pending review of #493 (the code-level alternative). Open question on #493: explicit #NNN reference resolution may not be enough for real dedup — the duplicate-issue problem usually needs similarity matching, not just reference parsing. Whichever approach survives, this one stays as fallback until then. |
Summary
ceo.mdstep 2c (Improve mode) and R3a (Research mode) to parse hypothesis text for issue references (#NNN)Fixes #372