feat: add weekly release notes generator script #21
Conversation
Scans merged PRs across all HydraDB repos (cortex-application, cortex-ingestion, cortex-dashboard, hydradb-on-prem-infra, hydradb-cli, mintlify-docs, and others) for a configurable time window.

Features:
- Automatic PR categorization (features, fixes, perf, security, etc.)
- Contributor stats
- Optional AI summarization via OpenAI (`--dry-run` to skip)
- Outputs markdown to `reports/release-notes-YYYY-MM-DD.md`

Usage: `python generate_release_notes.py --days 7`
Greptile Summary

This PR adds a weekly release notes generator script. Two correctness defects need attention before it produces reliable output:
Confidence Score: 4/5

Safe to merge with caveats — the script is additive and not in any production path, but two correctness bugs mean the output will have wrong contributor counts and may silently omit PRs on busy repos. Two P1 findings: (1) bot detection always returns False, causing bot PRs to inflate human contributor stats and the bot summary line to never render; (2) the hardcoded `--limit 100` without a server-side date filter silently drops PRs on high-velocity repos. The script is standalone and write-only, so these bugs don't cascade, but they do produce incorrect release notes.

Important Files Changed: `generate_release_notes.py` — specifically the `fetch_merged_prs` function (limit/search filter) and the contributor stats block (bot detection field name).
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant Script as generate_release_notes.py
    participant GH as gh CLI / GitHub API
    participant OAI as OpenAI API
    participant FS as reports/ directory
    User->>Script: python generate_release_notes.py --days 7
    loop For each of 13 repos
        Script->>GH: gh pr list --state merged --limit 100 --json ...
        GH-->>Script: list of PRs (up to 100)
        Script->>Script: filter by merged_at >= since (client-side)
    end
    Script->>Script: categorize PRs by title keywords
    alt not --dry-run and OPENAI_API_KEY set
        Script->>OAI: chat.completions.create(gpt-4o-mini)
        OAI-->>Script: executive summary text
    end
    Script->>Script: build markdown (categories, contributors, bot count)
    Script->>FS: write release-notes-YYYY-MM-DD.md
    FS-->>User: file path printed to stdout
```
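The "categorize PRs by title keywords" step in the diagram can be sketched as a first-match keyword scan over PR titles. The category names and keyword lists below are illustrative assumptions, not the script's actual tables:

```python
# Hypothetical category -> keyword mapping; the real script's lists may differ.
CATEGORIES = {
    "features": ("feat", "add"),
    "fixes": ("fix", "bug"),
    "perf": ("perf", "speed"),
    "security": ("security", "cve"),
}

def categorize(title: str) -> str:
    """Return the first category whose keywords appear in the title."""
    lowered = title.lower()
    for category, keywords in CATEGORIES.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "other"  # anything unmatched lands in a catch-all bucket
```

A first-match scheme like this is order-sensitive, so more specific categories should precede broad ones in the mapping.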
Reviews (1): Last reviewed commit: "feat: add weekly release notes generator..."
```python
cmd = [
    "gh", "pr", "list",
    "--repo", f"{owner}/{name}",
    "--state", "merged",
    "--json", "number,title,author,mergedAt,url,body,labels",
    "--limit", "100",
]
```
--limit 100 silently drops PRs on active repos
gh pr list returns the 100 most recently merged PRs and date filtering happens client-side. For any repo that merged more than 100 PRs within the lookback window the script silently omits the oldest ones — there is no warning or error. For cortex-application or cortex-ingestion during a busy sprint this is a real risk.
Adding a server-side date filter via --search and raising the limit eliminates the gap:
```diff
 cmd = [
     "gh", "pr", "list",
     "--repo", f"{owner}/{name}",
     "--state", "merged",
     "--json", "number,title,author,mergedAt,url,body,labels",
-    "--limit", "100",
+    "--search", f"merged:>={since.strftime('%Y-%m-%d')}",
+    "--limit", "500",
 ]
```
This lets GitHub pre-filter by merge date and the higher ceiling keeps the client-side pass as a safety net only.
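The client-side safety-net pass amounts to a small filter over the `mergedAt` timestamps the gh CLI returns. A minimal sketch, assuming ISO-8601 UTC timestamps like `2026-04-15T12:00:00Z` (the helper name `filter_since` is hypothetical):

```python
from datetime import datetime

def filter_since(prs: list[dict], since: datetime) -> list[dict]:
    """Keep only PRs merged at or after `since` (timezone-aware)."""
    def merged_at(pr: dict) -> datetime:
        # fromisoformat in older Pythons rejects the trailing "Z",
        # so normalize it to an explicit UTC offset first.
        return datetime.fromisoformat(pr["mergedAt"].replace("Z", "+00:00"))
    return [pr for pr in prs if merged_at(pr) >= since]
```

With the server-side `merged:>=` filter in place, this pass only trims same-day PRs that were merged earlier than the exact `since` instant.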
```python
"gh", "pr", "list",
"--repo", f"{owner}/{name}",
"--state", "merged",
"--json", "number,title,author,mergedAt,url,body,labels",
```
Unused fields fetched from GitHub API
`body` and `labels` are included in the `--json` fields but are never referenced anywhere in the script. Removing them reduces payload size and keeps the API call minimal.
```diff
-"--json", "number,title,author,mergedAt,url,body,labels",
+"--json", "number,title,author,mergedAt,url",
```
```python
if not pr["author"].get("is_bot", False):
    contributors[author] += 1
for author, count in sorted(contributors.items(), key=lambda x: -x[1]):
    lines.append(f"- @{author} ({count} PRs)")
lines.append("")

# Bot contributions
bot_count = sum(1 for _, pr in all_prs if pr["author"].get("is_bot", False))
```
Bot detection always evaluates to False
gh pr list --json author does not include an is_bot field — the returned object only contains login (and sometimes name). As a result, pr["author"].get("is_bot", False) is always False, every bot PR gets counted in the human contributors list, and bot_count is permanently 0 so the "Automated (Vorflux bot)" line never appears.
The reliable fix is to check whether the login ends with `[bot]`, which is GitHub's naming convention for App and automation accounts (e.g. `renovate[bot]`, `dependabot[bot]`, `vorflux[bot]`). Replace `pr["author"].get("is_bot", False)` with `pr["author"].get("login", "").endswith("[bot]")` in both the contributors filter (line 243) and the `bot_count` sum (line 250).
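That check reads best factored into a small helper. A minimal sketch (the `is_bot` helper name is hypothetical; only the `[bot]` suffix convention comes from the review):

```python
def is_bot(author: dict) -> bool:
    """True if the PR author is a GitHub App / automation account.

    GitHub gives App accounts a "[bot]" login suffix, e.g.
    renovate[bot], dependabot[bot], vorflux[bot].
    """
    return author.get("login", "").endswith("[bot]")
```

Using `.get("login", "")` keeps the helper safe when the author object is missing or empty (e.g. a deleted account).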
Summary
Adds `generate_release_notes.py` -- a script that scans merged PRs across all HydraDB repositories for a configurable time window and generates categorized release notes in markdown.

Repos scanned
cortex-application, cortex-ingestion, cortex-dashboard, hydradb-on-prem-infra, hydradb-cli, hydradb-mcp, hydradb-claude-code, hydradb-bench, python-sdk, ts-sdk, mintlify-docs, docs, openclaw-hydradb
Features

- Automatic PR categorization (features, fixes, perf, security, etc.)
- Contributor stats
- Optional AI summarization via OpenAI (`--dry-run` to skip)
- Outputs markdown to `reports/release-notes-YYYY-MM-DD.md`

Usage

`python generate_release_notes.py --days 7`
Required env vars

- `GITHUB_TOKEN` -- repo read access
- `OPENAI_API_KEY` -- optional, for AI summarization

Testing
- Ran `python generate_release_notes.py --days 7 --dry-run` successfully
- Generated `reports/release-notes-2026-04-17.md` with 63 PRs across 6 active repos