Skip to content

feat(report-runner): integrity tracker + threshold abort + IntegrityBadge (GLOOK-13)#50

Merged
msogin merged 17 commits into
mainfrom
spec/glook-13-report-integrity
Jun 1, 2026
Merged

feat(report-runner): integrity tracker + threshold abort + IntegrityBadge (GLOOK-13)#50
msogin merged 17 commits into
mainfrom
spec/glook-13-report-integrity

Conversation

@msogin

@msogin msogin commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Summary

Hardens the report-runner against silent GitHub-API failures (the 5/28 incident). Every SKIP and non-fatal error is now classified + captured in a new reports.run_metadata JSON column. Widespread unknown SKIPs auto-abort the run with a loud failed status instead of shipping a half-empty report.

  • Classify every SKIP as expected (allowlist), auto-flagged (≥4 of last 5 reports), or unknown — only unknown counts toward thresholds.
  • Auto-abort at ≥5 AND ≥10% unknown SKIPs; degraded badge at ≥3 OR ≥5%; silent below. The 5/28 incident (41/102 ≈ 40%) would have aborted.
  • <IntegrityBadge> component renders the failure mode on report pages (silent / amber pill / red banner).
  • Skip Allowlist Settings tab for admins, seeded with @oshpak (the persistent legitimate SKIP).
  • withRetry() covers transient 5xx + network errors (3 attempts × 1s/2s/4s). 404 stays non-retryable — that's the deterministic signal we depend on.

Closes GLOOK-13.

Architecture (data flow)

runReport(reportId, org, days)
  │
  ├─► classifySkip = await loadSkipClassifier()    [2 SELECTs: allowlist + last 5 runs]
  ├─► tracker = new IntegrityTracker(expectedCount=members.length)
  │
  ├─► for each member:
  │     ├── try: gather GitHub data (unchanged)
  │     ├── catch L1: tracker.recordSkip(login, msg, classifySkip(login))
  │     ├── catch L2: tracker.recordError({ context: 'openPRs' | 'unmerged-commit-detail', ... })
  │     └── github.isShaInMergedPR(..., onError) → tracker.recordError({ context: 'sha-merge-check', ... })
  │
  ├─► state = evaluateIntegrity(tracker)
  │       state === 'failed'  → mark report failed, persist run_metadata with abortReason, return
  │       state === 'degraded' → ship report + run_metadata; UI shows amber pill
  │       state === 'ok'       → ship report + run_metadata; silent
  └─► (existing completion path)

Thresholds (calibrated against the 5/28 incident)

Scenario Unknown SKIPs Org members Pct State
5/28 incident 41 102 40.2% failed (≥5 AND ≥10%) ✅ would have aborted
Today's baseline (only @oshpak) 0 102 0% ok ✅ silent
4 unknown / 100 members 4 100 4% degraded (count ≥3)
7 unknown / 102 members 7 102 6.9% degraded (count ≥3; abort pct not met)

AND on abort prevents tiny orgs from auto-failing on a single SKIP; OR on degraded keeps the warning sensitive.

Test plan

  • npm test — 734 / 734 passing across 73 suites. Includes:
    • 6 unit tests for IntegrityTracker (dedup, truncation, frozen snapshot)
    • 16 unit tests for loadSkipClassifier + evaluateIntegrity (allowlist priority, auto-flag, malformed-row tolerance, AND/OR threshold combinations, 5/28 incident shape, divide-by-zero)
    • 7 unit tests for withRetry expansion (5xx retry, network retry, 404 + 401 non-retry, rate-limit preserved)
    • 5 unit tests for skip-allowlist API (GET with auto-flagged candidates, POST validation, DELETE)
  • tsc --noEmit clean
  • Local smoke (podman, real MySQL): schema migration ran, @oshpak seed present, API surfaces run_metadata field, Settings UI tab renders, fresh report run completed without errors after the LIMIT-inline fix.

One late fix found during smoke

fix(report-runner): inline LIMIT in classifier — mysql2's prepared-statement binary protocol rejects bound LIMIT ? params with "Incorrect arguments to mysqld_stmt_execute". The AUTO_FLAG_RECENT_RUNS = 5 constant is now inlined in the SQL. This was caught and fixed in 8152c6c.

Files touched

Backend

  • schema.sql, src/lib/db/mysql.ts, src/lib/db/sqlite.tsreports.run_metadata JSON column + report_skip_allowlist table + @oshpak seed
  • src/lib/report-runner/types.ts (new) — shared RunMetadata, SkippedMember, IntegrityError, SkipClassification, IntegrityThresholds, DEFAULT_THRESHOLDS
  • src/lib/report-runner/integrity-tracker.ts (new) — accumulator class with frozen snapshots
  • src/lib/report-runner/skip-classifier.ts (new)loadSkipClassifier() + evaluateIntegrity()
  • src/lib/report-runner.ts — wires tracker + classifier + threshold into the gather loop
  • src/lib/github.tswithRetry expansion + isShaInMergedPR onError callback
  • src/lib/report/service.ts — surfaces run_metadata on getReport

Frontend

  • src/components/IntegrityBadge.tsx (new) — silent / amber pill / red banner
  • src/app/report/[id]/team/page.tsx, src/app/report/[id]/org/page.tsx — render the badge; team page hides the developer table when state='failed'
  • src/app/settings/page.tsx — new Skip Allowlist tab + auto-flagged candidates promote action

API

  • src/app/api/settings/skip-allowlist/route.ts (new) — GET (entries + candidates) + POST (add/upsert)
  • src/app/api/settings/skip-allowlist/[login]/route.ts (new) — DELETE

Out of scope (deliberately, per spec)

  • GitHub-side investigation (Bump next from 15.1.3 to 15.5.10 #1 in the Jira proposal) — ops, not engineering
  • Splunk alerting (Fix Ollama response parsing and add 3-day period #3 in the Jira proposal) — infra
  • Mid-run interrupt — threshold evaluated once after the gather loop drains; simpler control flow
  • 401 → immediate fatal — orthogonal hardening; future ticket
  • Retrying 404s — explicitly NOT done; 404 is the signal that drives the threshold

🤖 Generated with Claude Code

msogin added 14 commits June 1, 2026 11:57
Add an IntegrityTracker + skip classifier + threshold evaluator inside
runReport(). Persist a run_metadata JSON column with every SKIP and
non-fatal error classified as expected / auto-flagged / unknown.
Threshold logic counts only unknown SKIPs: auto-abort at >=5 AND >=10%
of org, degraded badge at >=3 OR >=5%. Expand withRetry() to cover
transient 5xx + network errors; 404 stays non-retryable so the auth-
loss signal that caused 5/28 still surfaces.

Out of scope: GitHub-side investigation, Splunk alerting, mid-run
interrupt, 401-fatal handling, 404 retries.
Adds the canonical schema + idempotent runtime ALTERs for both
MySQL and SQLite. Seeds report_skip_allowlist with @oshpak (the
persistent expected SKIP).

Broadens the mysql-ready ordering regex to also accept INSERT IGNORE
so the new seed call inside the migration IIFE still validates as a
gated pre-SELECT statement.
Accumulates SKIPs + non-fatal errors during a report run; dedupes
skips by login (last-write-wins); truncates messages to 500 chars.
loadSkipClassifier() runs 2 SELECTs at the top of each report and
returns a hot closure (allowlist ⊃ auto-flagged ⊃ unknown).
evaluateIntegrity() is pure: counts only unknown SKIPs against the
abort (AND) and degraded (OR) thresholds defined in DEFAULT_THRESHOLDS.
Extends withRetry() to cover 5xx HTTP + transient network errors
(ECONNRESET, ETIMEDOUT, EAI_AGAIN, ENOTFOUND) with shallow backoff
(3 attempts x 1s/2s/4s). 404 stays non-retryable - that's the signal
the report-integrity threshold depends on. 403/429 rate-limit
behavior is unchanged.
L1/L2/L3 catches now record into IntegrityTracker. After the gather
loop, evaluateIntegrity() decides ok/degraded/failed; on failed we
write run_metadata + UPDATE reports.status='failed' and short-circuit
the rest of the run. ok/degraded paths persist run_metadata alongside
the existing 'completed' update.
Renders nothing for state='ok' (silent), an inline amber pill for
'degraded' (click-to-expand detail panel with skipped + errors),
or a full red banner for 'failed'.
GET (entries + auto-flagged candidates), POST (add/upsert), DELETE.
Admin-only via requireAdmin(). Auto-flagged candidates computed from
the same last-5-runs query the report-runner classifier uses.
Lists current allowlist entries, inline add/remove form, and an
'auto-flagged candidates' section listing logins that SKIPped on
4+ of the last 5 reports with a one-click 'Promote' action.
mysql2's prepared-statement binary protocol rejects bound LIMIT params
with 'Incorrect arguments to mysqld_stmt_execute', breaking every
report run after T6. AUTO_FLAG_RECENT_RUNS is a module constant, not
user input — inlining is safe.

@msogin msogin left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated review — PR #50 (GLOOK-13)

Combined findings from a multi-persona review loop (backend + frontend correctness) and the Smartling local PR review. Specific findings are attached as inline comments; the cross-cutting items that don't map to a single line are below. Review-only pass — no code was modified.

🔴 In-scope silent-failure gaps in existing catches (not in this diff, so flagged here)

  • report-runner.ts:203countReviewedPRs catch swallows errors. Sets reviewCounts = 0 on any failure with no recordError; during a GitHub degradation this silently zeroes a score input (review weight) with no trace in run_metadata — the exact failure class this PR targets. Add integrity.recordError({ context: 'other', login: member.login, message }).
  • report-runner.ts:~334 — branch-compare catch not instrumented. Sibling of the sha-merge-check catch (321-323) that was instrumented in this PR; a compareBranchCommits failure currently drops in-flight commit data with no integrity trace. (The outer unmerged-flow catch at ~360 is a 115-line catch-all — instrumenting it too would be low-signal / risk double-counting, so not recommended.)

🟠 Scope / architecture (worth deciding before merge)

  • The auto-flagged classification tier appears inert. evaluateIntegrity counts only unknown skips; auto-flagged is excluded exactly like expected (confirmed by skip-classifier.test.ts: 41 auto-flagged / 102 members → ok). So the third tier behaves identically to expected in the only logic that gates the incident, yet it carries a recent-5-reports history scan (duplicated in the GET route) plus a UI surface. Worth confirming it earns its weight in an incident-hardening PR.
  • The failed-report data gate is the root cause behind the org-page / teams-view / export / partial-rows findings (inline comments below). Integrity gating was applied to one of several data surfaces (team individuals view only). A single page-level "is this report's data presentable?" gate would close the whole class more reliably than per-view guards.
  • Skip Allowlist CRUD scope. The incident fix needs the allowlist table + seed + classifier read; the POST/DELETE routes + settings tab + promotion UI are a management console that could be a separable follow-up (and the added_by bug only exists because the write path ships in this PR).

Verdict

Not safe to merge as-is. Blocking: the SQLite upsert bug (default DB) and the failed-report-still-renders-data class. The added_by header bug and the two uninstrumented catches undercut the PR's own goal and should be fixed or explicitly accepted. The rest are suggestions / decisions.

}

// Extract admin login from auth header if present; null otherwise.
const addedBy = req.headers.get('x-amzn-oidc-identity') || null;

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Critical — added_by is always NULL (audit field dead on arrival). This reads x-amzn-oidc-identity, but the app derives identity from the OIDC JWT in x-amzn-oidc-data via extractUser() (auth.ts:14). x-amzn-oidc-identity is never set/consumed elsewhere, so added_by is null in real deployments. Use extractUser(req.headers)?.email ?? null. (Flagged independently by two reviewers.)

const addedBy = req.headers.get('x-amzn-oidc-identity') || null;

await db.execute(
`INSERT INTO report_skip_allowlist (github_login, reason, added_by) VALUES (?, ?, ?)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Critical — SQLite upsert breaks on the default DB. ON DUPLICATE KEY UPDATE is translated by the SQLite shim to ON CONFLICT(<cols>), with conflict columns from a hardcoded conflictCols map in translateSQL() (db/sqlite.ts). report_skip_allowlist isn't in that map, so it falls back to ON CONFLICT(id) — but this table's PK is github_login and it has no id column. Re-adding/editing an existing login throws on SQLite (the default DB). Add report_skip_allowlist: 'github_login' to conflictCols.

if (denied) return denied;

const { github_login, reason } = await req.json();
if (!github_login || typeof github_login !== 'string') {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 github_login is not normalized/validated. Inserted verbatim — leading/trailing whitespace or case (Alice vs alice) creates an entry that won't match the classifier's exact Set.has lookup in skip-classifier.ts, so a "promoted" user silently keeps counting toward the abort threshold. Consider .trim() at minimum and documenting case-sensitivity.

) as [any[], any];

// Compute auto-flagged candidates from the last 5 completed reports.
const [recentRows] = await db.execute(

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 Duplicated "recent skip counts" logic. This last-5-reports parse-and-count loop (with inlined >= 4 / LIMIT 5) reimplements the auto-flag logic in loadSkipClassifier (skip-classifier.ts) rather than reusing AUTO_FLAG_RECENT_RUNS / AUTO_FLAG_THRESHOLD. The two can drift. Extract a shared helper.

Comment thread src/lib/github.ts
log?.(`Transient ${label} (attempt ${transientAttempt + 1}/${TRANSIENT_MAX_ATTEMPTS}). Retrying in ${waitMs}ms…`);
await sleep(waitMs);
transientAttempt++;
attempt--; // transient retries don't consume the rate-limit budget

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Retry budget can exceed maxRetries under mixed errors. attempt-- keeps transient 5xx/network retries off the rate-limit budget, but a server alternating 5xx/429 can loop more than maxRetries + TRANSIENT_MAX_ATTEMPTS times. It still terminates (transientAttempt is monotonic), so it's bounded — but the effective max for a mixed sequence is higher than the maxRetries = 5 contract implies, and tests only cover homogeneous error streams. Confirm the intended worst-case bound, or track a single total-attempt counter as a hard cap.

Comment thread src/lib/db/sqlite.ts
github_login TEXT NOT NULL PRIMARY KEY,
reason TEXT NOT NULL,
added_by TEXT,
added_at TEXT NOT NULL DEFAULT (datetime('now','localtime'))

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 added_at timezone differs by DB. SQLite default datetime('now','localtime') (local) vs MySQL CURRENT_TIMESTAMP (UTC); the ON DUPLICATE KEY UPDATE ... CURRENT_TIMESTAMP on update also won't be localtime on SQLite. Consistent with this file's existing pattern (not a regression), but the two DBs disagree on this column. Display-only, low impact — noting for consistency.

Comment thread schema.sql
);

INSERT IGNORE INTO report_skip_allowlist (github_login, reason, added_by) VALUES
('oshpak', 'Private GitHub profile; not in org-visible members for non-mutual permissions', 'seed');

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟣 Question — hardcoded oshpak seed row. A specific real GitHub login (with a "private profile" reason) is seeded into every fresh DB across schema.sql, mysql.ts, and sqlite.ts. Is shipping this individual's username in the open-source schema intended, or should it be environment-specific seed data?

Comment thread src/app/settings/page.tsx Outdated
<td className="px-4 py-2 font-mono">@{e.github_login}</td>
<td className="px-4 py-2">{e.reason}</td>
<td className="px-4 py-2 text-gray-500">{e.added_by ?? '—'}</td>
<td className="px-4 py-2 text-gray-500">{e.added_at}</td>

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 added_at rendered raw. {e.added_at} prints the unformatted DB string (and differs by backend) versus the localized dates used elsewhere in the app. Format with new Date(...).toLocaleString(...). Low priority, admin-only screen.

Comment thread src/app/settings/page.tsx Outdated
useEffect(() => { load(); }, []);

async function add(loginToAdd: string, reasonToAdd: string) {
await fetch('/api/settings/skip-allowlist', {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 No feedback on failed allowlist mutations. add / remove ignore the fetch response — a 400/403/500 silently looks like a no-op (the row just reloads unchanged). Surface an inline error/toast on non-ok responses.

Comment thread src/app/settings/page.tsx Outdated
<span className="font-mono text-gray-200">@{c}</span>
<button
type="button"
onClick={() => add(c, 'auto-promoted after 4+ consecutive SKIPs')}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 "Promote" reason wording contradicts the rule. The stored reason says auto-promoted after 4+ consecutive SKIPs, but the rule is "≥4 of the last 5" (not necessarily consecutive), as the copy above correctly states. Align the persisted wording.

msogin and others added 3 commits June 1, 2026 17:17
- auth: read added_by from extractUser(headers).email (x-amzn-oidc-data
  JWT), not the non-existent x-amzn-oidc-identity header
- sqlite: add report_skip_allowlist to translator's conflictCols map
  so ON DUPLICATE KEY UPDATE upserts correctly on the default DB
- normalize github_login (.trim()) before insert; validate reason
  is non-empty after trim
- gate all failed-report data surfaces (org page body, team Teams
  view, exports CSV/Sheet/PDF) — keep partial rows for forensics
  but never render them when state='failed'
- drop the duplicate red banner — header badge now renders only for
  non-failed states; failed state surfaces a single banner in body
- bound the retry total-attempt count at 12 across all error mixes

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…K-13)

Two existing catches in report-runner silently dropped data without
recording into IntegrityTracker — exactly the failure class this PR
targets. Add recordError calls so they surface in run_metadata.errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- IntegrityBadge: drop dead 'developerCount' prop; surface
  'N of M' using metadata.expectedCount in the detail panel
- IntegrityTracker.snapshot(): drop redundant double cast
- evaluateIntegrity accepts a snapshot directly (avoid double-take
  in the runner) + test adjustments
- extract loadRecentSkipCounts() shared helper used by both the
  classifier and the settings GET endpoint (was duplicated)
- Settings UI: format added_at via toLocaleString, surface mutation
  errors inline, align promote-reason wording with the actual rule

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@msogin msogin merged commit 14a3e75 into main Jun 1, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant