Skip to content

fix: CEO reuses existing GitHub issues instead of creating duplicates#375

Open
lukeinglis wants to merge 3 commits into
akashgit:mainfrom
lukeinglis:fix/ceo-issue-dedup
Open

fix: CEO reuses existing GitHub issues instead of creating duplicates#375
lukeinglis wants to merge 3 commits into
akashgit:mainfrom
lukeinglis:fix/ceo-issue-dedup

Conversation

@lukeinglis

Copy link
Copy Markdown
Collaborator

Summary

  • Add pre-check in ceo.md step 2c (Improve mode) and R3a (Research mode) to parse hypothesis text for issue references (#NNN)
  • If an existing open issue is found, reuse it instead of creating a duplicate
  • Prompt-only change, no code modifications

Fixes #372

Add pre-check in ceo.md step 2c (Improve) and R3a (Research) to parse
hypothesis text for issue references and reuse existing open issues
instead of always creating new ones.

Fixes akashgit#372

Signed-off-by: Luke Inglis <lukeinglis21@yahoo.com>
@codecov

codecov Bot commented May 25, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.08%. Comparing base (8097461) to head (3ca8dd7).
⚠️ Report is 83 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #375      +/-   ##
==========================================
- Coverage   87.31%   87.08%   -0.23%     
==========================================
  Files          61       62       +1     
  Lines        9339     9682     +343     
==========================================
+ Hits         8154     8432     +278     
- Misses       1185     1250      +65     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@xukai92 xukai92 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dedup intent is right and the change is contained, but as-is the Builder will silently follow a stale spec whenever an issue gets reused — that's a meaningful regression risk. Two smaller robustness concerns on the matching pattern and the gh invocation are noted as well. Once the reused issue gets refreshed with the new experiment context, happy to approve.

[P2 — blocking] Reused issue body becomes stale w.r.t. current experiment

factory/agents/prompts/ceo.md around the new pre-check (lines ~1146-1156 and ~2074-2085):

When the pre-check reuses an existing open issue, $ISSUE_NUM is set but the issue body still references whatever $EXP_ID, hypothesis text, acceptance criteria, and (for operational/mixed hypotheses) ## Execution Step / ## Execution Acceptance Criteria it had at original creation. The Builder in step 2d does gh issue view $ISSUE_NUM and is instructed to "implement exactly what the issue describes" — so it will follow the old issue body, not the current CEO-approved hypothesis.

This silently re-decouples the experiment from the work being done: wrong $EXP_ID recorded on the PR, stale acceptance criteria for operational/mixed hypotheses, possibly stale surface constraints in research mode.

Suggested fix — after deciding to reuse, post an update comment with the new experiment context before invoking the Builder:

3. If the issue is open, reuse it: set `$ISSUE_NUM` to that number, post an update comment with the current experiment context, and **skip issue creation**:
   ```bash
   gh issue comment "$ISSUE_NUM" --body "**Updated for experiment $EXP_ID**

   ## Current hypothesis
   <hypothesis text>

   ## What to build (this cycle)
   <specific changes>

   ## Acceptance criteria (this cycle)
   - [ ] <outcomes>
   - [ ] Tests pass
   - [ ] Eval score does not regress"
   ```
4. Only create a new issue if no existing open reference is found

[P3] #[0-9]+ is too aggressive and has no tie-break for multiple matches

The patterns list narrow forms (Addresses: #NNN, issue #NNN), but step 1 collapses to bare #[0-9]+, which matches "PR #123", "task #4", section anchors, and version tags. There's also no guidance on what to do when multiple #NNN references appear — first match wins by default, which could be the wrong one (e.g., a referenced PR rather than the tracking issue).

Suggested ordering:

  1. Prefer explicit Addresses:? #NNN, **Backlog item:** #NNN, or issue #NNN matches first.
  2. If exactly one bare #NNN is present and no narrow match was found, fall back to it.
  3. If multiple bare #NNN references remain ambiguous, create a new issue and link the others in the body.

[P3] gh issue view is not scoped to the project repo

The instructions don't require the CEO to cd "$PROJECT_PATH" (or pass -R owner/repo) before the lookup. If the CEO subprocess runs from a different cwd or in a worktree without an upstream, gh issue view may resolve to the wrong repo or fail. The "if open" branch is also ambiguous about what to do on gh non-zero exit — silent fall-through to "no existing issue" is fine but should be stated explicitly.

Suggested fix:

(cd "$PROJECT_PATH" && gh issue view <number> --json state --jq '.state') 2>/dev/null

If the command fails or returns anything other than OPEN, treat as no existing issue and proceed to create a new one.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Luke Inglis <lukeinglis21@yahoo.com>
@lukeinglis

Copy link
Copy Markdown
Collaborator Author

Addressed all three review points. Changes applied to both pre-check blocks (Improve step 2c and Research R3a):

[P2] Stale issue body: Step 3 now posts a gh issue comment with the current experiment ID, hypothesis, acceptance criteria, and what to build before the Builder is invoked. This ensures the Builder always works from current context even when reusing an old issue.

[P3] Pattern matching precision: Replaced the aggressive bare #[0-9]+ collapse with priority-ordered matching:

  1. Explicit references first: Addresses:? #NNN, Backlog item: #NNN, issue #NNN
  2. Fallback to bare #NNN only if exactly one match and no explicit match found
  3. Multiple ambiguous bare refs: skip reuse, create new issue and link the others

[P3] Repo-scoped gh issue view: Now runs via (cd "$PROJECT_PATH" && gh issue view ...) 2>/dev/null with explicit failure handling: non-zero exit or non-OPEN state both fall through to new issue creation.

@akashgit akashgit added the bug Something isn't working label May 30, 2026

@osilkin98 osilkin98 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PASS 1

  1. [INFO] Approach is reasonable — prompt-only change to parse #NNN from hypothesis text and reuse open issues instead of creating duplicates. Correct layer for this fix.
  2. [HIGH] DRY violation — the exact same ~30-line pre-check block is copy-pasted verbatim in Improve mode (step 2c) and Research mode (R3a). If the logic ever changes, both copies must stay in sync. Should be extracted to a named reference section.
  3. [LOW] Flow control between "reuse" and "create" paths isn't explicit — no "skip to step 2d" instruction after reuse. The LLM will probably figure it out but explicit flow would be safer.

PASS 2

  1. [HIGH] Ambiguous regex syntaxAddresses:? #NNN uses regex notation (? = optional colon) but an LLM might read the ? literally. Should be plain English: "Addresses #NNN (with or without a colon)".
  2. [MEDIUM] Fragile bare-#NNN fallback (rule 1b) — hypothesis text routinely has #NNN refs to PRs, experiment IDs, Markdown headers. Falling back to a bare #NNN could hijack unrelated issues.
  3. [MEDIUM] No check that the referenced issue belongs to the current repo — gh issue view operates on whatever repo context exists in $PROJECT_PATH.
  4. [LOW] Missing interaction with --focus mode — when --focus 42 is used, the CEO already targets a specific issue. The pre-check doesn't mention this and could conflict.
  5. [LOW] 2>/dev/null on gh issue view suppresses all errors silently — makes debugging reuse failures difficult.

PASS 3

  1. [HIGH] Poisoned issue references — a strategist/backlog item could include Addresses #999 where issue 999 is unrelated. CEO would reuse it, post a confusing comment, and the Builder implements the wrong thing. No semantic validation that the issue is related.
  2. [MEDIUM] Parallel CEO race — two simultaneous cycles could both parse the same #NNN, both verify it's open, and both spawn Builders against the same issue, creating conflicting PRs.
  3. [MEDIUM] TOCTOU race — issue could be closed between gh issue view (step 2) and gh issue comment (step 3).
  4. [LOW] #NNN inside code blocks or backticks could be false-positive matched — no instruction to ignore quoted references.
  5. [LOW] Edge cases: #0, #-1, #abc — instructions don't specify "NNN must be a positive integer".

FINAL VERDICT: REQUEST_CHANGES

3 HIGH, 4 MEDIUM, 5 LOW, 1 INFO

The three critical things to fix: (1) deduplicate the 30-line block instead of copy-pasting it, (2) rewrite Addresses:? in plain English instead of regex syntax, and (3) remove or heavily guard the bare #NNN fallback (rule 1b) — it's too aggressive and will cause the CEO to hijack unrelated issues since hypothesis text is full of #NNN references to PRs, experiments, and sections.

Extract the duplicated issue reuse pre-check block (Improve 2c and
Research R3a) into a single Issue Reuse Protocol section. Both
locations now reference the shared protocol.

Replace regex syntax (Addresses:? #NNN) with plain English that an
LLM can parse unambiguously.

Add semantic validation step: after finding a candidate issue by
number, verify the issue title shares keywords with the hypothesis
before reusing, preventing hijack of unrelated issues from stray
#NNN references.

Closes akashgit#466

Signed-off-by: Luke Inglis <lukeinglis21@yahoo.com>
@lukeinglis

Copy link
Copy Markdown
Collaborator Author

@osilkin98 Thanks for the thorough review. All three HIGHs are addressed:

[HIGH] DRY violation: Extracted the pre-check block into a single ### Issue Reuse Protocol section. Both Improve 2c and Research R3a now reference it with a one-liner. Net -25 lines.

[HIGH] Regex syntax: Addresses:? #NNN rewritten as Addresses #NNN (with or without a trailing colon).

[HIGH] Poisoned references: Added a keyword validation step: after confirming the issue is open, check that at least one keyword from the hypothesis appears in the issue title. Scoped to bare #NNN fallbacks only (rule 1b). Explicit references like Addresses #42 skip the check since they signal clear authorial intent.

For the MEDIUMs: repo scoping and bare #NNN fallback were already addressed in the earlier round (xukai92's review). Parallel CEO race and TOCTOU are unlikely edge cases for a prompt file and I'd prefer not to add more complexity here.

@lukeinglis

Copy link
Copy Markdown
Collaborator Author

@xukai92 Following up on your suggestion to solve this in code rather than the prompt. I traced the issue creation flow and found that the CEO agent is the sole creator of GitHub issues (via gh issue create in step 2c). No other agent or Python code creates issues.

I put together a code-level alternative in PR #493. The approach:

  1. extract_issue_refs() in factory/issue.py: parses hypothesis text for issue references using the same priority ordering from this PR (explicit Addresses #N > **Backlog item:** #N> bare #N), but in tested Python regex instead of prompt instructions
  2. resolve_reusable_issue(): verifies the issue is open and relevant (keyword match against title, same poisoned-reference guard from review round 3)
  3. factory issue-resolve CLI command: the CEO calls this before gh issue create. Prints the reusable issue number or nothing.

The 37-line Issue Reuse Protocol in ceo.md would shrink to a 4-line shell snippet: ```bash
EXISTING_ISSUE=$(factory issue-resolve "$PROJECT_PATH" --hypothesis "") if [ -n "$EXISTING_ISSUE" ]; then
ISSUE_NUM=$EXISTING_ISSUE else ISSUE_NUM=$(gh issue create ...)
fi

14 unit tests cover the parsing and resolution logic. Would appreciate your review on the approach in #493. If it looks right, this PR can be superseded by that one with a minimal prompt change to call the new command.'

@osilkin98

Copy link
Copy Markdown
Collaborator

Triage: holding this pending review of #493 (the code-level alternative). Open question on #493: explicit #NNN reference resolution may not be enough for real dedup — the duplicate-issue problem usually needs similarity matching, not just reference parsing. Whichever approach survives, this one stays as fallback until then.

@osilkin98 osilkin98 added competing Another open PR solves the same problem kind:fix Something was wrong stage:judgment Deciding what is true/kept: eval, gates, review, state detection labels Jun 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working competing Another open PR solves the same problem kind:fix Something was wrong stage:judgment Deciding what is true/kept: eval, gates, review, state detection

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CEO should reuse existing GitHub issues instead of creating duplicates

4 participants