fix: CEO reuses existing GitHub issues instead of creating duplicates by lukeinglis · Pull Request #375 · akashgit/remote-factory

lukeinglis · 2026-05-25T20:15:53Z

Summary

Add pre-check in ceo.md step 2c (Improve mode) and R3a (Research mode) to parse hypothesis text for issue references (#NNN)
If an existing open issue is found, reuse it instead of creating a duplicate
Prompt-only change, no code modifications

Fixes #372

Add pre-check in ceo.md step 2c (Improve) and R3a (Research) to parse hypothesis text for issue references and reuse existing open issues instead of always creating new ones. Fixes akashgit#372 Signed-off-by: Luke Inglis <lukeinglis21@yahoo.com>

codecov · 2026-05-25T20:17:21Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.08%. Comparing base (8097461) to head (3ca8dd7).
⚠️ Report is 83 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #375      +/-   ##
==========================================
- Coverage   87.31%   87.08%   -0.23%     
==========================================
  Files          61       62       +1     
  Lines        9339     9682     +343     
==========================================
+ Hits         8154     8432     +278     
- Misses       1185     1250      +65

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

xukai92

Dedup intent is right and the change is contained, but as-is the Builder will silently follow a stale spec whenever an issue gets reused — that's a meaningful regression risk. Two smaller robustness concerns on the matching pattern and the gh invocation are noted as well. Once the reused issue gets refreshed with the new experiment context, happy to approve.

[P2 — blocking] Reused issue body becomes stale w.r.t. current experiment

factory/agents/prompts/ceo.md around the new pre-check (lines ~1146-1156 and ~2074-2085):

When the pre-check reuses an existing open issue, $ISSUE_NUM is set but the issue body still references whatever $EXP_ID, hypothesis text, acceptance criteria, and (for operational/mixed hypotheses) ## Execution Step / ## Execution Acceptance Criteria it had at original creation. The Builder in step 2d does gh issue view $ISSUE_NUM and is instructed to "implement exactly what the issue describes" — so it will follow the old issue body, not the current CEO-approved hypothesis.

This silently re-decouples the experiment from the work being done: wrong $EXP_ID recorded on the PR, stale acceptance criteria for operational/mixed hypotheses, possibly stale surface constraints in research mode.

Suggested fix — after deciding to reuse, post an update comment with the new experiment context before invoking the Builder:

3. If the issue is open, reuse it: set `$ISSUE_NUM` to that number, post an update comment with the current experiment context, and **skip issue creation**:
   ```bash
   gh issue comment "$ISSUE_NUM" --body "**Updated for experiment $EXP_ID**

   ## Current hypothesis
   <hypothesis text>

   ## What to build (this cycle)
   <specific changes>

   ## Acceptance criteria (this cycle)
   - [ ] <outcomes>
   - [ ] Tests pass
   - [ ] Eval score does not regress"
   ```
4. Only create a new issue if no existing open reference is found

[P3] `#[0-9]+` is too aggressive and has no tie-break for multiple matches

The patterns list narrow forms (Addresses: #NNN, issue #NNN), but step 1 collapses to bare #[0-9]+, which matches "PR #123", "task #4", section anchors, and version tags. There's also no guidance on what to do when multiple #NNN references appear — first match wins by default, which could be the wrong one (e.g., a referenced PR rather than the tracking issue).

Suggested ordering:

Prefer explicit Addresses:? #NNN, **Backlog item:** #NNN, or issue #NNN matches first.
If exactly one bare #NNN is present and no narrow match was found, fall back to it.
If multiple bare #NNN references remain ambiguous, create a new issue and link the others in the body.

[P3] `gh issue view` is not scoped to the project repo

The instructions don't require the CEO to cd "$PROJECT_PATH" (or pass -R owner/repo) before the lookup. If the CEO subprocess runs from a different cwd or in a worktree without an upstream, gh issue view may resolve to the wrong repo or fail. The "if open" branch is also ambiguous about what to do on gh non-zero exit — silent fall-through to "no existing issue" is fine but should be stated explicitly.

Suggested fix:

(cd "$PROJECT_PATH" && gh issue view <number> --json state --jq '.state') 2>/dev/null

If the command fails or returns anything other than OPEN, treat as no existing issue and proceed to create a new one.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Luke Inglis <lukeinglis21@yahoo.com>

lukeinglis · 2026-05-28T02:57:41Z

Addressed all three review points. Changes applied to both pre-check blocks (Improve step 2c and Research R3a):

[P2] Stale issue body: Step 3 now posts a gh issue comment with the current experiment ID, hypothesis, acceptance criteria, and what to build before the Builder is invoked. This ensures the Builder always works from current context even when reusing an old issue.

[P3] Pattern matching precision: Replaced the aggressive bare #[0-9]+ collapse with priority-ordered matching:

Explicit references first: Addresses:? #NNN, Backlog item: #NNN, issue #NNN
Fallback to bare #NNN only if exactly one match and no explicit match found
Multiple ambiguous bare refs: skip reuse, create new issue and link the others

[P3] Repo-scoped gh issue view: Now runs via (cd "$PROJECT_PATH" && gh issue view ...) 2>/dev/null with explicit failure handling: non-zero exit or non-OPEN state both fall through to new issue creation.

osilkin98

PASS 1

[INFO] Approach is reasonable — prompt-only change to parse #NNN from hypothesis text and reuse open issues instead of creating duplicates. Correct layer for this fix.
[HIGH] DRY violation — the exact same ~30-line pre-check block is copy-pasted verbatim in Improve mode (step 2c) and Research mode (R3a). If the logic ever changes, both copies must stay in sync. Should be extracted to a named reference section.
[LOW] Flow control between "reuse" and "create" paths isn't explicit — no "skip to step 2d" instruction after reuse. The LLM will probably figure it out but explicit flow would be safer.

PASS 2

[HIGH] Ambiguous regex syntax — Addresses:? #NNN uses regex notation (? = optional colon) but an LLM might read the ? literally. Should be plain English: "Addresses #NNN (with or without a colon)".
[MEDIUM] Fragile bare-#NNN fallback (rule 1b) — hypothesis text routinely has #NNN refs to PRs, experiment IDs, Markdown headers. Falling back to a bare #NNN could hijack unrelated issues.
[MEDIUM] No check that the referenced issue belongs to the current repo — gh issue view operates on whatever repo context exists in $PROJECT_PATH.
[LOW] Missing interaction with --focus mode — when --focus 42 is used, the CEO already targets a specific issue. The pre-check doesn't mention this and could conflict.
[LOW] 2>/dev/null on gh issue view suppresses all errors silently — makes debugging reuse failures difficult.

PASS 3

[HIGH] Poisoned issue references — a strategist/backlog item could include Addresses #999 where issue 999 is unrelated. CEO would reuse it, post a confusing comment, and the Builder implements the wrong thing. No semantic validation that the issue is related.
[MEDIUM] Parallel CEO race — two simultaneous cycles could both parse the same #NNN, both verify it's open, and both spawn Builders against the same issue, creating conflicting PRs.
[MEDIUM] TOCTOU race — issue could be closed between gh issue view (step 2) and gh issue comment (step 3).
[LOW] #NNN inside code blocks or backticks could be false-positive matched — no instruction to ignore quoted references.
[LOW] Edge cases: #0, #-1, #abc — instructions don't specify "NNN must be a positive integer".

FINAL VERDICT: REQUEST_CHANGES

3 HIGH, 4 MEDIUM, 5 LOW, 1 INFO

The three critical things to fix: (1) deduplicate the 30-line block instead of copy-pasting it, (2) rewrite Addresses:? in plain English instead of regex syntax, and (3) remove or heavily guard the bare #NNN fallback (rule 1b) — it's too aggressive and will cause the CEO to hijack unrelated issues since hypothesis text is full of #NNN references to PRs, experiments, and sections.

Extract the duplicated issue reuse pre-check block (Improve 2c and Research R3a) into a single Issue Reuse Protocol section. Both locations now reference the shared protocol. Replace regex syntax (Addresses:? #NNN) with plain English that an LLM can parse unambiguously. Add semantic validation step: after finding a candidate issue by number, verify the issue title shares keywords with the hypothesis before reusing, preventing hijack of unrelated issues from stray #NNN references. Closes akashgit#466 Signed-off-by: Luke Inglis <lukeinglis21@yahoo.com>

lukeinglis · 2026-06-04T19:06:13Z

@osilkin98 Thanks for the thorough review. All three HIGHs are addressed:

[HIGH] DRY violation: Extracted the pre-check block into a single ### Issue Reuse Protocol section. Both Improve 2c and Research R3a now reference it with a one-liner. Net -25 lines.

[HIGH] Regex syntax: Addresses:? #NNN rewritten as Addresses #NNN (with or without a trailing colon).

[HIGH] Poisoned references: Added a keyword validation step: after confirming the issue is open, check that at least one keyword from the hypothesis appears in the issue title. Scoped to bare #NNN fallbacks only (rule 1b). Explicit references like Addresses #42 skip the check since they signal clear authorial intent.

For the MEDIUMs: repo scoping and bare #NNN fallback were already addressed in the earlier round (xukai92's review). Parallel CEO race and TOCTOU are unlikely edge cases for a prompt file and I'd prefer not to add more complexity here.

lukeinglis · 2026-06-05T20:15:51Z

@xukai92 Following up on your suggestion to solve this in code rather than the prompt. I traced the issue creation flow and found that the CEO agent is the sole creator of GitHub issues (via gh issue create in step 2c). No other agent or Python code creates issues.

I put together a code-level alternative in PR #493. The approach:

extract_issue_refs() in factory/issue.py: parses hypothesis text for issue references using the same priority ordering from this PR (explicit Addresses #N > **Backlog item:** #N> bare #N), but in tested Python regex instead of prompt instructions
resolve_reusable_issue(): verifies the issue is open and relevant (keyword match against title, same poisoned-reference guard from review round 3)
factory issue-resolve CLI command: the CEO calls this before gh issue create. Prints the reusable issue number or nothing.

The 37-line Issue Reuse Protocol in ceo.md would shrink to a 4-line shell snippet: ```bash
EXISTING_ISSUE=$(factory issue-resolve "$PROJECT_PATH" --hypothesis "") if [ -n "$EXISTING_ISSUE" ]; then
ISSUE_NUM=$EXISTING_ISSUE else ISSUE_NUM=$(gh issue create ...)
fi

14 unit tests cover the parsing and resolution logic. Would appreciate your review on the approach in #493. If it looks right, this PR can be superseded by that one with a minimal prompt change to call the new command.'

osilkin98 · 2026-06-11T02:59:31Z

Triage: holding this pending review of #493 (the code-level alternative). Open question on #493: explicit #NNN reference resolution may not be enough for real dedup — the duplicate-issue problem usually needs similarity matching, not just reference parsing. Whichever approach survives, this one stays as fallback until then.

xukai92 requested changes May 28, 2026

View reviewed changes

fix: address review feedback on issue dedup pre-check

fc1a0a4

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Luke Inglis <lukeinglis21@yahoo.com>

akashgit added the bug Something isn't working label May 30, 2026

osilkin98 requested changes Jun 4, 2026

View reviewed changes

lukeinglis mentioned this pull request Jun 4, 2026

fix: address 3 HIGH review items on PR #375 (issue dedup) #466

Closed

4 tasks

lukeinglis mentioned this pull request Jun 5, 2026

feat: code-level issue dedup via factory issue-resolve #493

Open

4 tasks

osilkin98 added competing Another open PR solves the same problem kind:fix Something was wrong stage:judgment Deciding what is true/kept: eval, gates, review, state detection labels Jun 11, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: CEO reuses existing GitHub issues instead of creating duplicates#375

fix: CEO reuses existing GitHub issues instead of creating duplicates#375
lukeinglis wants to merge 3 commits into
akashgit:mainfrom
lukeinglis:fix/ceo-issue-dedup

lukeinglis commented May 25, 2026

Uh oh!

codecov Bot commented May 25, 2026 •

edited

Loading

Uh oh!

xukai92 left a comment

Uh oh!

lukeinglis commented May 28, 2026

Uh oh!

osilkin98 left a comment

Uh oh!

lukeinglis commented Jun 4, 2026

Uh oh!

lukeinglis commented Jun 5, 2026

Uh oh!

osilkin98 commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

lukeinglis commented May 25, 2026

Summary

Uh oh!

codecov Bot commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

xukai92 left a comment

Choose a reason for hiding this comment

[P2 — blocking] Reused issue body becomes stale w.r.t. current experiment

[P3] #[0-9]+ is too aggressive and has no tie-break for multiple matches

[P3] gh issue view is not scoped to the project repo

Uh oh!

lukeinglis commented May 28, 2026

Uh oh!

osilkin98 left a comment

Choose a reason for hiding this comment

PASS 1

PASS 2

PASS 3

FINAL VERDICT: REQUEST_CHANGES

Uh oh!

lukeinglis commented Jun 4, 2026

Uh oh!

lukeinglis commented Jun 5, 2026

Uh oh!

osilkin98 commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

codecov Bot commented May 25, 2026 •

edited

Loading

[P3] `#[0-9]+` is too aggressive and has no tie-break for multiple matches

[P3] `gh issue view` is not scoped to the project repo