feat(report-runner): integrity tracker + threshold abort + IntegrityBadge (GLOOK-13)#50
Conversation
Add an IntegrityTracker + skip classifier + threshold evaluator inside runReport(). Persist a run_metadata JSON column with every SKIP and non-fatal error classified as expected / auto-flagged / unknown. Threshold logic counts only unknown SKIPs: auto-abort at >=5 AND >=10% of org, degraded badge at >=3 OR >=5%. Expand withRetry() to cover transient 5xx + network errors; 404 stays non-retryable so the auth- loss signal that caused 5/28 still surfaces. Out of scope: GitHub-side investigation, Splunk alerting, mid-run interrupt, 401-fatal handling, 404 retries.
Adds the canonical schema + idempotent runtime ALTERs for both MySQL and SQLite. Seeds report_skip_allowlist with @oshpak (the persistent expected SKIP). Broadens the mysql-ready ordering regex to also accept INSERT IGNORE so the new seed call inside the migration IIFE still validates as a gated pre-SELECT statement.
Accumulates SKIPs + non-fatal errors during a report run; dedupes skips by login (last-write-wins); truncates messages to 500 chars.
loadSkipClassifier() runs 2 SELECTs at the top of each report and returns a hot closure (allowlist ⊃ auto-flagged ⊃ unknown). evaluateIntegrity() is pure: counts only unknown SKIPs against the abort (AND) and degraded (OR) thresholds defined in DEFAULT_THRESHOLDS.
Extends withRetry() to cover 5xx HTTP + transient network errors (ECONNRESET, ETIMEDOUT, EAI_AGAIN, ENOTFOUND) with shallow backoff (3 attempts x 1s/2s/4s). 404 stays non-retryable - that's the signal the report-integrity threshold depends on. 403/429 rate-limit behavior is unchanged.
L1/L2/L3 catches now record into IntegrityTracker. After the gather loop, evaluateIntegrity() decides ok/degraded/failed; on failed we write run_metadata + UPDATE reports.status='failed' and short-circuit the rest of the run. ok/degraded paths persist run_metadata alongside the existing 'completed' update.
Renders nothing for state='ok' (silent), an inline amber pill for 'degraded' (click-to-expand detail panel with skipped + errors), or a full red banner for 'failed'.
GET (entries + auto-flagged candidates), POST (add/upsert), DELETE. Admin-only via requireAdmin(). Auto-flagged candidates computed from the same last-5-runs query the report-runner classifier uses.
Lists current allowlist entries, inline add/remove form, and an 'auto-flagged candidates' section listing logins that SKIPped on 4+ of the last 5 reports with a one-click 'Promote' action.
mysql2's prepared-statement binary protocol rejects bound LIMIT params with 'Incorrect arguments to mysqld_stmt_execute', breaking every report run after T6. AUTO_FLAG_RECENT_RUNS is a module constant, not user input — inlining is safe.
msogin
left a comment
There was a problem hiding this comment.
Automated review — PR #50 (GLOOK-13)
Combined findings from a multi-persona review loop (backend + frontend correctness) and the Smartling local PR review. Specific findings are attached as inline comments; the cross-cutting items that don't map to a single line are below. Review-only pass — no code was modified.
🔴 In-scope silent-failure gaps in existing catches (not in this diff, so flagged here)
report-runner.ts:203—countReviewedPRscatch swallows errors. SetsreviewCounts = 0on any failure with norecordError; during a GitHub degradation this silently zeroes a score input (review weight) with no trace inrun_metadata— the exact failure class this PR targets. Addintegrity.recordError({ context: 'other', login: member.login, message }).report-runner.ts:~334— branch-compare catch not instrumented. Sibling of thesha-merge-checkcatch (321-323) that was instrumented in this PR; acompareBranchCommitsfailure currently drops in-flight commit data with no integrity trace. (The outer unmerged-flow catch at ~360 is a 115-line catch-all — instrumenting it too would be low-signal / risk double-counting, so not recommended.)
🟠 Scope / architecture (worth deciding before merge)
- The
auto-flaggedclassification tier appears inert.evaluateIntegritycounts onlyunknownskips;auto-flaggedis excluded exactly likeexpected(confirmed byskip-classifier.test.ts: 41 auto-flagged / 102 members →ok). So the third tier behaves identically toexpectedin the only logic that gates the incident, yet it carries a recent-5-reports history scan (duplicated in the GET route) plus a UI surface. Worth confirming it earns its weight in an incident-hardening PR. - The failed-report data gate is the root cause behind the org-page / teams-view / export / partial-rows findings (inline comments below). Integrity gating was applied to one of several data surfaces (team individuals view only). A single page-level "is this report's data presentable?" gate would close the whole class more reliably than per-view guards.
- Skip Allowlist CRUD scope. The incident fix needs the allowlist table + seed + classifier read; the POST/DELETE routes + settings tab + promotion UI are a management console that could be a separable follow-up (and the
added_bybug only exists because the write path ships in this PR).
Verdict
Not safe to merge as-is. Blocking: the SQLite upsert bug (default DB) and the failed-report-still-renders-data class. The added_by header bug and the two uninstrumented catches undercut the PR's own goal and should be fixed or explicitly accepted. The rest are suggestions / decisions.
| } | ||
|
|
||
| // Extract admin login from auth header if present; null otherwise. | ||
| const addedBy = req.headers.get('x-amzn-oidc-identity') || null; |
There was a problem hiding this comment.
🔴 Critical — added_by is always NULL (audit field dead on arrival). This reads x-amzn-oidc-identity, but the app derives identity from the OIDC JWT in x-amzn-oidc-data via extractUser() (auth.ts:14). x-amzn-oidc-identity is never set/consumed elsewhere, so added_by is null in real deployments. Use extractUser(req.headers)?.email ?? null. (Flagged independently by two reviewers.)
| const addedBy = req.headers.get('x-amzn-oidc-identity') || null; | ||
|
|
||
| await db.execute( | ||
| `INSERT INTO report_skip_allowlist (github_login, reason, added_by) VALUES (?, ?, ?) |
There was a problem hiding this comment.
🔴 Critical — SQLite upsert breaks on the default DB. ON DUPLICATE KEY UPDATE is translated by the SQLite shim to ON CONFLICT(<cols>), with conflict columns from a hardcoded conflictCols map in translateSQL() (db/sqlite.ts). report_skip_allowlist isn't in that map, so it falls back to ON CONFLICT(id) — but this table's PK is github_login and it has no id column. Re-adding/editing an existing login throws on SQLite (the default DB). Add report_skip_allowlist: 'github_login' to conflictCols.
| if (denied) return denied; | ||
|
|
||
| const { github_login, reason } = await req.json(); | ||
| if (!github_login || typeof github_login !== 'string') { |
There was a problem hiding this comment.
🟡 github_login is not normalized/validated. Inserted verbatim — leading/trailing whitespace or case (Alice vs alice) creates an entry that won't match the classifier's exact Set.has lookup in skip-classifier.ts, so a "promoted" user silently keeps counting toward the abort threshold. Consider .trim() at minimum and documenting case-sensitivity.
| ) as [any[], any]; | ||
|
|
||
| // Compute auto-flagged candidates from the last 5 completed reports. | ||
| const [recentRows] = await db.execute( |
There was a problem hiding this comment.
🔵 Duplicated "recent skip counts" logic. This last-5-reports parse-and-count loop (with inlined >= 4 / LIMIT 5) reimplements the auto-flag logic in loadSkipClassifier (skip-classifier.ts) rather than reusing AUTO_FLAG_RECENT_RUNS / AUTO_FLAG_THRESHOLD. The two can drift. Extract a shared helper.
| log?.(`Transient ${label} (attempt ${transientAttempt + 1}/${TRANSIENT_MAX_ATTEMPTS}). Retrying in ${waitMs}ms…`); | ||
| await sleep(waitMs); | ||
| transientAttempt++; | ||
| attempt--; // transient retries don't consume the rate-limit budget |
There was a problem hiding this comment.
🟡 Retry budget can exceed maxRetries under mixed errors. attempt-- keeps transient 5xx/network retries off the rate-limit budget, but a server alternating 5xx/429 can loop more than maxRetries + TRANSIENT_MAX_ATTEMPTS times. It still terminates (transientAttempt is monotonic), so it's bounded — but the effective max for a mixed sequence is higher than the maxRetries = 5 contract implies, and tests only cover homogeneous error streams. Confirm the intended worst-case bound, or track a single total-attempt counter as a hard cap.
| github_login TEXT NOT NULL PRIMARY KEY, | ||
| reason TEXT NOT NULL, | ||
| added_by TEXT, | ||
| added_at TEXT NOT NULL DEFAULT (datetime('now','localtime')) |
There was a problem hiding this comment.
🔵 added_at timezone differs by DB. SQLite default datetime('now','localtime') (local) vs MySQL CURRENT_TIMESTAMP (UTC); the ON DUPLICATE KEY UPDATE ... CURRENT_TIMESTAMP on update also won't be localtime on SQLite. Consistent with this file's existing pattern (not a regression), but the two DBs disagree on this column. Display-only, low impact — noting for consistency.
| ); | ||
|
|
||
| INSERT IGNORE INTO report_skip_allowlist (github_login, reason, added_by) VALUES | ||
| ('oshpak', 'Private GitHub profile; not in org-visible members for non-mutual permissions', 'seed'); |
There was a problem hiding this comment.
🟣 Question — hardcoded oshpak seed row. A specific real GitHub login (with a "private profile" reason) is seeded into every fresh DB across schema.sql, mysql.ts, and sqlite.ts. Is shipping this individual's username in the open-source schema intended, or should it be environment-specific seed data?
| <td className="px-4 py-2 font-mono">@{e.github_login}</td> | ||
| <td className="px-4 py-2">{e.reason}</td> | ||
| <td className="px-4 py-2 text-gray-500">{e.added_by ?? '—'}</td> | ||
| <td className="px-4 py-2 text-gray-500">{e.added_at}</td> |
There was a problem hiding this comment.
🔵 added_at rendered raw. {e.added_at} prints the unformatted DB string (and differs by backend) versus the localized dates used elsewhere in the app. Format with new Date(...).toLocaleString(...). Low priority, admin-only screen.
| useEffect(() => { load(); }, []); | ||
|
|
||
| async function add(loginToAdd: string, reasonToAdd: string) { | ||
| await fetch('/api/settings/skip-allowlist', { |
There was a problem hiding this comment.
🔵 No feedback on failed allowlist mutations. add / remove ignore the fetch response — a 400/403/500 silently looks like a no-op (the row just reloads unchanged). Surface an inline error/toast on non-ok responses.
| <span className="font-mono text-gray-200">@{c}</span> | ||
| <button | ||
| type="button" | ||
| onClick={() => add(c, 'auto-promoted after 4+ consecutive SKIPs')} |
There was a problem hiding this comment.
🔵 "Promote" reason wording contradicts the rule. The stored reason says auto-promoted after 4+ consecutive SKIPs, but the rule is "≥4 of the last 5" (not necessarily consecutive), as the copy above correctly states. Align the persisted wording.
- auth: read added_by from extractUser(headers).email (x-amzn-oidc-data JWT), not the non-existent x-amzn-oidc-identity header - sqlite: add report_skip_allowlist to translator's conflictCols map so ON DUPLICATE KEY UPDATE upserts correctly on the default DB - normalize github_login (.trim()) before insert; validate reason is non-empty after trim - gate all failed-report data surfaces (org page body, team Teams view, exports CSV/Sheet/PDF) — keep partial rows for forensics but never render them when state='failed' - drop the duplicate red banner — header badge now renders only for non-failed states; failed state surfaces a single banner in body - bound the retry total-attempt count at 12 across all error mixes Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…K-13) Two existing catches in report-runner silently dropped data without recording into IntegrityTracker — exactly the failure class this PR targets. Add recordError calls so they surface in run_metadata.errors. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- IntegrityBadge: drop dead 'developerCount' prop; surface 'N of M' using metadata.expectedCount in the detail panel - IntegrityTracker.snapshot(): drop redundant double cast - evaluateIntegrity accepts a snapshot directly (avoid double-take in the runner) + test adjustments - extract loadRecentSkipCounts() shared helper used by both the classifier and the settings GET endpoint (was duplicated) - Settings UI: format added_at via toLocaleString, surface mutation errors inline, align promote-reason wording with the actual rule Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Hardens the report-runner against silent GitHub-API failures (the 5/28 incident). Every SKIP and non-fatal error is now classified + captured in a new
reports.run_metadataJSON column. WidespreadunknownSKIPs auto-abort the run with a loudfailedstatus instead of shipping a half-empty report.expected(allowlist),auto-flagged(≥4 of last 5 reports), orunknown— onlyunknowncounts toward thresholds.<IntegrityBadge>component renders the failure mode on report pages (silent / amber pill / red banner).@oshpak(the persistent legitimate SKIP).withRetry()covers transient 5xx + network errors (3 attempts × 1s/2s/4s). 404 stays non-retryable — that's the deterministic signal we depend on.Closes GLOOK-13.
Architecture (data flow)
Thresholds (calibrated against the 5/28 incident)
failed(≥5 AND ≥10%) ✅ would have aborted@oshpak)ok✅ silentdegraded(count ≥3)degraded(count ≥3; abort pct not met)ANDon abort prevents tiny orgs from auto-failing on a single SKIP;ORon degraded keeps the warning sensitive.Test plan
npm test— 734 / 734 passing across 73 suites. Includes:IntegrityTracker(dedup, truncation, frozen snapshot)loadSkipClassifier+evaluateIntegrity(allowlist priority, auto-flag, malformed-row tolerance, AND/OR threshold combinations, 5/28 incident shape, divide-by-zero)withRetryexpansion (5xx retry, network retry, 404 + 401 non-retry, rate-limit preserved)tsc --noEmitclean@oshpakseed present, API surfacesrun_metadatafield, Settings UI tab renders, fresh report run completed without errors after the LIMIT-inline fix.One late fix found during smoke
fix(report-runner): inline LIMIT in classifier— mysql2's prepared-statement binary protocol rejects boundLIMIT ?params with "Incorrect arguments to mysqld_stmt_execute". TheAUTO_FLAG_RECENT_RUNS = 5constant is now inlined in the SQL. This was caught and fixed in8152c6c.Files touched
Backend
schema.sql,src/lib/db/mysql.ts,src/lib/db/sqlite.ts—reports.run_metadataJSON column +report_skip_allowlisttable +@oshpakseedsrc/lib/report-runner/types.ts(new) — sharedRunMetadata,SkippedMember,IntegrityError,SkipClassification,IntegrityThresholds,DEFAULT_THRESHOLDSsrc/lib/report-runner/integrity-tracker.ts(new) — accumulator class with frozen snapshotssrc/lib/report-runner/skip-classifier.ts(new) —loadSkipClassifier()+evaluateIntegrity()src/lib/report-runner.ts— wires tracker + classifier + threshold into the gather loopsrc/lib/github.ts—withRetryexpansion +isShaInMergedPRonErrorcallbacksrc/lib/report/service.ts— surfacesrun_metadataongetReportFrontend
src/components/IntegrityBadge.tsx(new) — silent / amber pill / red bannersrc/app/report/[id]/team/page.tsx,src/app/report/[id]/org/page.tsx— render the badge; team page hides the developer table whenstate='failed'src/app/settings/page.tsx— new Skip Allowlist tab + auto-flagged candidates promote actionAPI
src/app/api/settings/skip-allowlist/route.ts(new) — GET (entries + candidates) + POST (add/upsert)src/app/api/settings/skip-allowlist/[login]/route.ts(new) — DELETEOut of scope (deliberately, per spec)
🤖 Generated with Claude Code