[DO NOT MERGE] Investigate pocket vetoes (#51) #52
jeffreyksmithjr wants to merge 1 commit into `main`
Conversation
Two-pass analysis over the bot_detection DuckDB to see whether counting stale open PRs as rejections improves merge_rate signal quality against suspended-vs-active ground truth.

Pass 1 used age-since-created_at as the staleness proxy (forced by the DuckDB schema). Negative result: CV AUC flat, Cohen's d worse.

Pass 2 re-fetched updatedAt for every DB-OPEN PR (4,069 of 19,502 still currently open; the other 79% have been closed or merged since the snapshot and are by definition non-stale) and recomputed staleness as idle time. Much gentler reclassification — only 459 authors shift by >0.10 under idle_universal vs 1,836 under age_universal — but signal quality is still flat (CV AUC 0.5489 vs baseline 0.5494).

Conclusion: idle time is the right signal, but pocket vetoes don't catch a failure mode the current bad-egg model misses. No scoring change recommended on detection-accuracy grounds. The interpretability case for the change is separately defensible but out of scope here.
🥚 Diet Egg: HIGH · TrustScore: 83%
Code Review
This pull request investigates the "pocket veto" effect, analyzing whether treating stale open pull requests as rejections improves the signal for identifying suspended accounts. It adds scripts for fetching PR activity and performing statistical analysis, alongside the generated findings. Feedback suggests optimizing database interactions in the fetching script to avoid redundant connections and N+1 queries, and adding a check for empty data to prevent a potential KeyError during DataFrame processing.
```python
con = duckdb.connect(str(DB_PATH), read_only=True)
repo_counts = con.execute("""
    SELECT repo, COUNT(*) AS n
    FROM prs WHERE state='OPEN'
    GROUP BY repo ORDER BY n DESC
""").fetchall()
con.close()
```
```python
db_open_keys: dict[str, set[int]] = {}
con = duckdb.connect(str(DB_PATH), read_only=True)
for (repo,) in con.execute(
    "SELECT DISTINCT repo FROM prs WHERE state='OPEN'"
).fetchall():
    numbers = con.execute(
        "SELECT number FROM prs WHERE state='OPEN' AND repo=?", [repo]
    ).fetchall()
    db_open_keys[repo] = {n[0] for n in numbers}
con.close()
```
The current implementation opens and closes the database connection twice and uses an N+1 query pattern to populate db_open_keys. This can be optimized by using a single connection and a single query to fetch all open PR numbers, then grouping them in memory. This reduces overhead and improves script performance.
```python
con = duckdb.connect(str(DB_PATH), read_only=True)
repo_counts = con.execute("""
    SELECT repo, COUNT(*) AS n
    FROM prs WHERE state='OPEN'
    GROUP BY repo ORDER BY n DESC
""").fetchall()

db_open_keys: dict[str, set[int]] = {}
all_open = con.execute("SELECT repo, number FROM prs WHERE state='OPEN'").fetchall()
for repo, num in all_open:
    db_open_keys.setdefault(repo, set()).add(num)
con.close()
```

```python
df = pd.DataFrame(rows)
df["updated_at"] = pd.to_datetime(df["updated_at"])
df["fetch_now"] = pd.to_datetime(df["fetch_now"])
```
If no relevant PR activity is found (i.e., the rows list is empty), pd.DataFrame(rows) will create a DataFrame without the expected columns. This will cause the subsequent pd.to_datetime calls to raise a KeyError. Adding a guard clause to handle the empty case will make the script more robust.
Suggested change:

```diff
-df = pd.DataFrame(rows)
-df["updated_at"] = pd.to_datetime(df["updated_at"])
-df["fetch_now"] = pd.to_datetime(df["fetch_now"])
+if not rows:
+    print("No relevant PR activity found.")
+    return
+df = pd.DataFrame(rows)
+df["updated_at"] = pd.to_datetime(df["updated_at"])
+df["fetch_now"] = pd.to_datetime(df["fetch_now"])
```
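The failure mode the reviewer describes is easy to reproduce: `pd.DataFrame` built from an empty list has no columns at all, so column access raises `KeyError` rather than returning an empty Series. A minimal demonstration (standalone, not code from the PR):

```python
import pandas as pd

# A DataFrame built from an empty rows list has no columns...
df = pd.DataFrame([])
print(list(df.columns))  # []

# ...so selecting an expected column raises KeyError before
# pd.to_datetime is ever reached.
try:
    df["updated_at"]
except KeyError as exc:
    print(f"KeyError: {exc}")
```

Guarding on `if not rows:` before constructing the DataFrame, as the suggestion does, avoids this entirely.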
Summary
Investigation for #51: does counting stale open PRs as implicit rejections improve the `merge_rate` signal? Ran two passes over the bot_detection DuckDB. Short answer: no, not for detection accuracy. Not merging this; it is an evidence trail for closing the issue.

What was done

- Pass 1 used age since `created_at` as the staleness proxy (forced by the DuckDB schema, which has no `updatedAt`). Computed three merge-rate definitions over all 31,296 authors: the v3 baseline, `age_universal` (90d), and `age_per_repo` (using the DB's per-repo `stale_threshold_days`). Ran 5-fold CV on a 2-feature logistic (merge_rate + log1p(median_additions)) against the 675-suspended / 30,554-active labeled subset.
- Pass 2 re-fetched `updatedAt` for every DB-OPEN PR via GraphQL (91 repos, one paginated call each, cached to `open_pr_activity.parquet`) and added two idle-time variants. This addressed the main objection to Pass 1: under age-based staleness, a legitimate contributor like `harupy` (2,771 merged PRs, active account) gets punished for open PRs that are in active discussion. Idle-time correctly spares them.

Results
The refinement worked mechanically. Idle-time reclassifies 4x fewer authors than age-based, exactly the gentler behavior we wanted. But none of the variants beat the v3 baseline on CV AUC or Cohen's d, and differences are within noise.
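The two proxies differ only in which timestamp idleness is measured from, which is why idle-time reclassifies so many fewer authors. A minimal sketch of the distinction (column names `created_at`/`updated_at` are assumed from the writeup, not taken from the scripts):

```python
import pandas as pd

def stale_flags(df: pd.DataFrame, now: pd.Timestamp, threshold_days: int = 90):
    """Two staleness proxies for an OPEN PR.

    Pass 1 could only use age since creation (the schema had no updatedAt);
    Pass 2 uses idle time since last activity.
    """
    age_stale = (now - df["created_at"]).dt.days > threshold_days
    idle_stale = (now - df["updated_at"]).dt.days > threshold_days
    return age_stale, idle_stale

# An old PR that is still in active discussion: age-stale but not idle-stale.
prs = pd.DataFrame({
    "created_at": pd.to_datetime(["2024-01-01"]),
    "updated_at": pd.to_datetime(["2024-11-01"]),
})
age, idle = stale_flags(prs, pd.Timestamp("2024-12-01"))
```

The `harupy` case above is exactly the row in this sketch: age-based staleness flags it, idle-based staleness does not.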
Why it doesn't work
Suspended high-volume accounts in this dataset tend to either (a) be bots that v3 already scores low, or (b) have no open PRs at all (e.g. `amd-jmacaran`: 105 PRs, merge_rate=1.000 under every definition). Pocket vetoes aren't catching a failure mode the current model misses. They're just redistributing mass among accounts the model already classifies adequately.

A striking side-finding from Pass 2: 79% of DB-OPEN PRs have since been closed or merged (15,433 of 19,502). The age-based proxy treated all 19,502 as stale; idle-time correctly treats those 15,433 as non-stale. The age-based result was partly an artifact of the crude proxy, and the "is idle-time better" question was worth answering even though the final verdict didn't change.
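The redistribution point can be made concrete. The exact formula here is my reading of the writeup, not code from the PR: the pocket-veto variants can only add stale open PRs to the denominator, so an author with no open PRs scores identically under every definition.

```python
def merge_rate(merged: int, closed: int, stale_open: int = 0) -> float:
    """Merged fraction of 'decided' PRs. The pocket-veto variants count
    stale open PRs as implicit rejections, i.e. extra denominator mass.
    (Hypothetical reconstruction of the definitions in the writeup.)"""
    decided = merged + closed + stale_open
    return merged / decided if decided else 0.0

# An account like the one above (105 PRs, all merged, none open) is
# untouched by any staleness definition: there is nothing to reclassify.
baseline = merge_rate(105, 0)
with_veto = merge_rate(105, 0, stale_open=0)
```

Both calls return 1.0, which is why the variants only move scores for authors who actually have stale open PRs.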
Caveat worth recording for the future
For the narrow detection task, this is settled; don't ship. For `merge_rate` as a human-facing interpretable signal, `idle_per_repo` is still the most defensible definition. It correctly handles the drive-by-with-ignored-PR case (currently scored as merge_rate=1.000, which reads as "always accepted" but isn't true). That's an interpretability case, not an ML case, and a separate conversation.

Files
- `experiments/bot_detection/scripts/pocket_veto_analysis.py`: analysis script
- `experiments/bot_detection/scripts/fetch_open_pr_activity.py`: one-shot GraphQL refetch for `updatedAt`
- `experiments/bot_detection/data/open_pr_activity.parquet`: cached refetch output (58KB)
- `experiments/bot_detection/data/results/pocket_veto_analysis.json`: machine-readable results
- `experiments/bot_detection/pocket_veto_findings.md`: human-readable writeup

No changes to `src/good_egg/`.
- `uv run python experiments/bot_detection/scripts/pocket_veto_analysis.py` reproduces the reported numbers
- `uv run ruff check experiments/bot_detection/scripts/` passes