feat(report-runner): integrity tracker + threshold abort + IntegrityBadge (GLOOK-13) by msogin · Pull Request #50 · Smartling/glooker

msogin · 2026-06-01T20:39:44Z

Summary

Hardens the report-runner against silent GitHub-API failures (the 5/28 incident). Every SKIP and non-fatal error is now classified + captured in a new reports.run_metadata JSON column. Widespread unknown SKIPs auto-abort the run with a loud failed status instead of shipping a half-empty report.

Classify every SKIP as expected (allowlist), auto-flagged (≥4 of last 5 reports), or unknown — only unknown counts toward thresholds.
Auto-abort at ≥5 AND ≥10% unknown SKIPs; degraded badge at ≥3 OR ≥5%; silent below. The 5/28 incident (41/102 ≈ 40%) would have aborted.
<IntegrityBadge> component renders the failure mode on report pages (silent / amber pill / red banner).
Skip Allowlist Settings tab for admins, seeded with @oshpak (the persistent legitimate SKIP).
withRetry() covers transient 5xx + network errors (3 attempts × 1s/2s/4s). 404 stays non-retryable — that's the deterministic signal we depend on.

Closes GLOOK-13.

Architecture (data flow)

runReport(reportId, org, days)
  │
  ├─► classifySkip = await loadSkipClassifier()    [2 SELECTs: allowlist + last 5 runs]
  ├─► tracker = new IntegrityTracker(expectedCount=members.length)
  │
  ├─► for each member:
  │     ├── try: gather GitHub data (unchanged)
  │     ├── catch L1: tracker.recordSkip(login, msg, classifySkip(login))
  │     ├── catch L2: tracker.recordError({ context: 'openPRs' | 'unmerged-commit-detail', ... })
  │     └── github.isShaInMergedPR(..., onError) → tracker.recordError({ context: 'sha-merge-check', ... })
  │
  ├─► state = evaluateIntegrity(tracker)
  │       state === 'failed'  → mark report failed, persist run_metadata with abortReason, return
  │       state === 'degraded' → ship report + run_metadata; UI shows amber pill
  │       state === 'ok'       → ship report + run_metadata; silent
  └─► (existing completion path)

Thresholds (calibrated against the 5/28 incident)

Scenario	Unknown SKIPs	Org members	Pct	State
5/28 incident	41	102	40.2%	`failed` (≥5 AND ≥10%) ✅ would have aborted
Today's baseline (only `@oshpak`)	0	102	0%	`ok` ✅ silent
4 unknown / 100 members	4	100	4%	`degraded` (count ≥3)
7 unknown / 102 members	7	102	6.9%	`degraded` (count ≥3; abort pct not met)

AND on abort prevents tiny orgs from auto-failing on a single SKIP; OR on degraded keeps the warning sensitive.

Test plan

npm test — 734 / 734 passing across 73 suites. Includes:
- 6 unit tests for IntegrityTracker (dedup, truncation, frozen snapshot)
- 16 unit tests for loadSkipClassifier + evaluateIntegrity (allowlist priority, auto-flag, malformed-row tolerance, AND/OR threshold combinations, 5/28 incident shape, divide-by-zero)
- 7 unit tests for withRetry expansion (5xx retry, network retry, 404 + 401 non-retry, rate-limit preserved)
- 5 unit tests for skip-allowlist API (GET with auto-flagged candidates, POST validation, DELETE)
tsc --noEmit clean
Local smoke (podman, real MySQL): schema migration ran, @oshpak seed present, API surfaces run_metadata field, Settings UI tab renders, fresh report run completed without errors after the LIMIT-inline fix.

One late fix found during smoke

fix(report-runner): inline LIMIT in classifier — mysql2's prepared-statement binary protocol rejects bound LIMIT ? params with "Incorrect arguments to mysqld_stmt_execute". The AUTO_FLAG_RECENT_RUNS = 5 constant is now inlined in the SQL. This was caught and fixed in 8152c6c.

Files touched

Backend

schema.sql, src/lib/db/mysql.ts, src/lib/db/sqlite.ts — reports.run_metadata JSON column + report_skip_allowlist table + @oshpak seed
src/lib/report-runner/types.ts (new) — shared RunMetadata, SkippedMember, IntegrityError, SkipClassification, IntegrityThresholds, DEFAULT_THRESHOLDS
src/lib/report-runner/integrity-tracker.ts (new) — accumulator class with frozen snapshots
src/lib/report-runner/skip-classifier.ts (new) — loadSkipClassifier() + evaluateIntegrity()
src/lib/report-runner.ts — wires tracker + classifier + threshold into the gather loop
src/lib/github.ts — withRetry expansion + isShaInMergedPR onError callback
src/lib/report/service.ts — surfaces run_metadata on getReport

Frontend

src/components/IntegrityBadge.tsx (new) — silent / amber pill / red banner
src/app/report/[id]/team/page.tsx, src/app/report/[id]/org/page.tsx — render the badge; team page hides the developer table when state='failed'
src/app/settings/page.tsx — new Skip Allowlist tab + auto-flagged candidates promote action

API

src/app/api/settings/skip-allowlist/route.ts (new) — GET (entries + candidates) + POST (add/upsert)
src/app/api/settings/skip-allowlist/[login]/route.ts (new) — DELETE

Out of scope (deliberately, per spec)

GitHub-side investigation (Bump next from 15.1.3 to 15.5.10 #1 in the Jira proposal) — ops, not engineering
Splunk alerting (Fix Ollama response parsing and add 3-day period #3 in the Jira proposal) — infra
Mid-run interrupt — threshold evaluated once after the gather loop drains; simpler control flow
401 → immediate fatal — orthogonal hardening; future ticket
Retrying 404s — explicitly NOT done; 404 is the signal that drives the threshold

🤖 Generated with Claude Code

Add an IntegrityTracker + skip classifier + threshold evaluator inside runReport(). Persist a run_metadata JSON column with every SKIP and non-fatal error classified as expected / auto-flagged / unknown. Threshold logic counts only unknown SKIPs: auto-abort at >=5 AND >=10% of org, degraded badge at >=3 OR >=5%. Expand withRetry() to cover transient 5xx + network errors; 404 stays non-retryable so the auth- loss signal that caused 5/28 still surfaces. Out of scope: GitHub-side investigation, Splunk alerting, mid-run interrupt, 401-fatal handling, 404 retries.

@oshpak

Adds the canonical schema + idempotent runtime ALTERs for both MySQL and SQLite. Seeds report_skip_allowlist with @oshpak (the persistent expected SKIP). Broadens the mysql-ready ordering regex to also accept INSERT IGNORE so the new seed call inside the migration IIFE still validates as a gated pre-SELECT statement.

Accumulates SKIPs + non-fatal errors during a report run; dedupes skips by login (last-write-wins); truncates messages to 500 chars.

loadSkipClassifier() runs 2 SELECTs at the top of each report and returns a hot closure (allowlist ⊃ auto-flagged ⊃ unknown). evaluateIntegrity() is pure: counts only unknown SKIPs against the abort (AND) and degraded (OR) thresholds defined in DEFAULT_THRESHOLDS.

Extends withRetry() to cover 5xx HTTP + transient network errors (ECONNRESET, ETIMEDOUT, EAI_AGAIN, ENOTFOUND) with shallow backoff (3 attempts x 1s/2s/4s). 404 stays non-retryable - that's the signal the report-integrity threshold depends on. 403/429 rate-limit behavior is unchanged.

L1/L2/L3 catches now record into IntegrityTracker. After the gather loop, evaluateIntegrity() decides ok/degraded/failed; on failed we write run_metadata + UPDATE reports.status='failed' and short-circuit the rest of the run. ok/degraded paths persist run_metadata alongside the existing 'completed' update.

Renders nothing for state='ok' (silent), an inline amber pill for 'degraded' (click-to-expand detail panel with skipped + errors), or a full red banner for 'failed'.

GET (entries + auto-flagged candidates), POST (add/upsert), DELETE. Admin-only via requireAdmin(). Auto-flagged candidates computed from the same last-5-runs query the report-runner classifier uses.

Lists current allowlist entries, inline add/remove form, and an 'auto-flagged candidates' section listing logins that SKIPped on 4+ of the last 5 reports with a one-click 'Promote' action.

mysql2's prepared-statement binary protocol rejects bound LIMIT params with 'Incorrect arguments to mysqld_stmt_execute', breaking every report run after T6. AUTO_FLAG_RECENT_RUNS is a module constant, not user input — inlining is safe.

msogin

Automated review — PR #50 (GLOOK-13)

Combined findings from a multi-persona review loop (backend + frontend correctness) and the Smartling local PR review. Specific findings are attached as inline comments; the cross-cutting items that don't map to a single line are below. Review-only pass — no code was modified.

🔴 In-scope silent-failure gaps in existing catches (not in this diff, so flagged here)

report-runner.ts:203 — countReviewedPRs catch swallows errors. Sets reviewCounts = 0 on any failure with no recordError; during a GitHub degradation this silently zeroes a score input (review weight) with no trace in run_metadata — the exact failure class this PR targets. Add integrity.recordError({ context: 'other', login: member.login, message }).
report-runner.ts:~334 — branch-compare catch not instrumented. Sibling of the sha-merge-check catch (321-323) that was instrumented in this PR; a compareBranchCommits failure currently drops in-flight commit data with no integrity trace. (The outer unmerged-flow catch at ~360 is a 115-line catch-all — instrumenting it too would be low-signal / risk double-counting, so not recommended.)

🟠 Scope / architecture (worth deciding before merge)

The auto-flagged classification tier appears inert. evaluateIntegrity counts only unknown skips; auto-flagged is excluded exactly like expected (confirmed by skip-classifier.test.ts: 41 auto-flagged / 102 members → ok). So the third tier behaves identically to expected in the only logic that gates the incident, yet it carries a recent-5-reports history scan (duplicated in the GET route) plus a UI surface. Worth confirming it earns its weight in an incident-hardening PR.
The failed-report data gate is the root cause behind the org-page / teams-view / export / partial-rows findings (inline comments below). Integrity gating was applied to one of several data surfaces (team individuals view only). A single page-level "is this report's data presentable?" gate would close the whole class more reliably than per-view guards.
Skip Allowlist CRUD scope. The incident fix needs the allowlist table + seed + classifier read; the POST/DELETE routes + settings tab + promotion UI are a management console that could be a separable follow-up (and the added_by bug only exists because the write path ships in this PR).

Verdict

Not safe to merge as-is. Blocking: the SQLite upsert bug (default DB) and the failed-report-still-renders-data class. The added_by header bug and the two uninstrumented catches undercut the PR's own goal and should be fixed or explicitly accepted. The rest are suggestions / decisions.

msogin · 2026-06-01T21:11:06Z

+  }
+
+  // Extract admin login from auth header if present; null otherwise.
+  const addedBy = req.headers.get('x-amzn-oidc-identity') || null;


🔴 Critical — added_by is always NULL (audit field dead on arrival). This reads x-amzn-oidc-identity, but the app derives identity from the OIDC JWT in x-amzn-oidc-data via extractUser() (auth.ts:14). x-amzn-oidc-identity is never set/consumed elsewhere, so added_by is null in real deployments. Use extractUser(req.headers)?.email ?? null. (Flagged independently by two reviewers.)

msogin · 2026-06-01T21:11:06Z

+  const addedBy = req.headers.get('x-amzn-oidc-identity') || null;
+
+  await db.execute(
+    `INSERT INTO report_skip_allowlist (github_login, reason, added_by) VALUES (?, ?, ?)


🔴 Critical — SQLite upsert breaks on the default DB. ON DUPLICATE KEY UPDATE is translated by the SQLite shim to ON CONFLICT(<cols>), with conflict columns from a hardcoded conflictCols map in translateSQL() (db/sqlite.ts). report_skip_allowlist isn't in that map, so it falls back to ON CONFLICT(id) — but this table's PK is github_login and it has no id column. Re-adding/editing an existing login throws on SQLite (the default DB). Add report_skip_allowlist: 'github_login' to conflictCols.

msogin · 2026-06-01T21:11:06Z

+  if (denied) return denied;
+
+  const { github_login, reason } = await req.json();
+  if (!github_login || typeof github_login !== 'string') {


🟡 github_login is not normalized/validated. Inserted verbatim — leading/trailing whitespace or case (Alice vs alice) creates an entry that won't match the classifier's exact Set.has lookup in skip-classifier.ts, so a "promoted" user silently keeps counting toward the abort threshold. Consider .trim() at minimum and documenting case-sensitivity.

msogin · 2026-06-01T21:11:06Z

+  ) as [any[], any];
+
+  // Compute auto-flagged candidates from the last 5 completed reports.
+  const [recentRows] = await db.execute(


🔵 Duplicated "recent skip counts" logic. This last-5-reports parse-and-count loop (with inlined >= 4 / LIMIT 5) reimplements the auto-flag logic in loadSkipClassifier (skip-classifier.ts) rather than reusing AUTO_FLAG_RECENT_RUNS / AUTO_FLAG_THRESHOLD. The two can drift. Extract a shared helper.

msogin · 2026-06-01T21:11:06Z

+        log?.(`Transient ${label} (attempt ${transientAttempt + 1}/${TRANSIENT_MAX_ATTEMPTS}). Retrying in ${waitMs}ms…`);
+        await sleep(waitMs);
+        transientAttempt++;
+        attempt--; // transient retries don't consume the rate-limit budget


🟡 Retry budget can exceed maxRetries under mixed errors. attempt-- keeps transient 5xx/network retries off the rate-limit budget, but a server alternating 5xx/429 can loop more than maxRetries + TRANSIENT_MAX_ATTEMPTS times. It still terminates (transientAttempt is monotonic), so it's bounded — but the effective max for a mixed sequence is higher than the maxRetries = 5 contract implies, and tests only cover homogeneous error streams. Confirm the intended worst-case bound, or track a single total-attempt counter as a hard cap.

msogin · 2026-06-01T21:11:07Z

+      github_login  TEXT NOT NULL PRIMARY KEY,
+      reason        TEXT NOT NULL,
+      added_by      TEXT,
+      added_at      TEXT NOT NULL DEFAULT (datetime('now','localtime'))


🔵 added_at timezone differs by DB. SQLite default datetime('now','localtime') (local) vs MySQL CURRENT_TIMESTAMP (UTC); the ON DUPLICATE KEY UPDATE ... CURRENT_TIMESTAMP on update also won't be localtime on SQLite. Consistent with this file's existing pattern (not a regression), but the two DBs disagree on this column. Display-only, low impact — noting for consistency.

msogin · 2026-06-01T21:11:07Z

+);
+
+INSERT IGNORE INTO report_skip_allowlist (github_login, reason, added_by) VALUES
+  ('oshpak', 'Private GitHub profile; not in org-visible members for non-mutual permissions', 'seed');


🟣 Question — hardcoded oshpak seed row. A specific real GitHub login (with a "private profile" reason) is seeded into every fresh DB across schema.sql, mysql.ts, and sqlite.ts. Is shipping this individual's username in the open-source schema intended, or should it be environment-specific seed data?

msogin · 2026-06-01T21:11:07Z

+                <td className="px-4 py-2 font-mono">@{e.github_login}</td>
+                <td className="px-4 py-2">{e.reason}</td>
+                <td className="px-4 py-2 text-gray-500">{e.added_by ?? '—'}</td>
+                <td className="px-4 py-2 text-gray-500">{e.added_at}</td>


🔵 added_at rendered raw. {e.added_at} prints the unformatted DB string (and differs by backend) versus the localized dates used elsewhere in the app. Format with new Date(...).toLocaleString(...). Low priority, admin-only screen.

msogin · 2026-06-01T21:11:07Z

+  useEffect(() => { load(); }, []);
+
+  async function add(loginToAdd: string, reasonToAdd: string) {
+    await fetch('/api/settings/skip-allowlist', {


🔵 No feedback on failed allowlist mutations. add / remove ignore the fetch response — a 400/403/500 silently looks like a no-op (the row just reloads unchanged). Surface an inline error/toast on non-ok responses.

msogin · 2026-06-01T21:11:07Z

+                <span className="font-mono text-gray-200">@{c}</span>
+                <button
+                  type="button"
+                  onClick={() => add(c, 'auto-promoted after 4+ consecutive SKIPs')}


🔵 "Promote" reason wording contradicts the rule. The stored reason says auto-promoted after 4+ consecutive SKIPs, but the rule is "≥4 of the last 5" (not necessarily consecutive), as the copy above correctly states. Align the persisted wording.

- auth: read added_by from extractUser(headers).email (x-amzn-oidc-data JWT), not the non-existent x-amzn-oidc-identity header - sqlite: add report_skip_allowlist to translator's conflictCols map so ON DUPLICATE KEY UPDATE upserts correctly on the default DB - normalize github_login (.trim()) before insert; validate reason is non-empty after trim - gate all failed-report data surfaces (org page body, team Teams view, exports CSV/Sheet/PDF) — keep partial rows for forensics but never render them when state='failed' - drop the duplicate red banner — header badge now renders only for non-failed states; failed state surfaces a single banner in body - bound the retry total-attempt count at 12 across all error mixes Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…K-13) Two existing catches in report-runner silently dropped data without recording into IntegrityTracker — exactly the failure class this PR targets. Add recordError calls so they surface in run_metadata.errors. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- IntegrityBadge: drop dead 'developerCount' prop; surface 'N of M' using metadata.expectedCount in the detail panel - IntegrityTracker.snapshot(): drop redundant double cast - evaluateIntegrity accepts a snapshot directly (avoid double-take in the runner) + test adjustments - extract loadRecentSkipCounts() shared helper used by both the classifier and the settings GET endpoint (was duplicated) - Settings UI: format added_at via toLocaleString, surface mutation errors inline, align promote-reason wording with the actual rule Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

msogin added 14 commits June 1, 2026 11:57

docs(glook-13): implementation plan for report-integrity hardening

8a6952d

feat(report-runner): integrity types (GLOOK-13)

e3e7b21

feat(report-runner): IntegrityTracker (GLOOK-13)

d3013b3

Accumulates SKIPs + non-fatal errors during a report run; dedupes skips by login (last-write-wins); truncates messages to 500 chars.

feat(report): surface run_metadata on getReport (GLOOK-13)

4069641

feat(ui): IntegrityBadge component (GLOOK-13)

700e39e

Renders nothing for state='ok' (silent), an inline amber pill for 'degraded' (click-to-expand detail panel with skipped + errors), or a full red banner for 'failed'.

feat(report-page): wire IntegrityBadge into team + org pages (GLOOK-13)

44ead7a

feat(api): skip-allowlist CRUD (GLOOK-13)

53ea878

GET (entries + auto-flagged candidates), POST (add/upsert), DELETE. Admin-only via requireAdmin(). Auto-flagged candidates computed from the same last-5-runs query the report-runner classifier uses.

feat(settings): Skip Allowlist tab (GLOOK-13)

ce80466

Lists current allowlist entries, inline add/remove form, and an 'auto-flagged candidates' section listing logins that SKIPped on 4+ of the last 5 reports with a one-click 'Promote' action.

msogin commented Jun 1, 2026

View reviewed changes

msogin and others added 3 commits June 1, 2026 17:17

msogin merged commit 14a3e75 into main Jun 1, 2026
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(report-runner): integrity tracker + threshold abort + IntegrityBadge (GLOOK-13)#50

feat(report-runner): integrity tracker + threshold abort + IntegrityBadge (GLOOK-13)#50
msogin merged 17 commits into
mainfrom
spec/glook-13-report-integrity

msogin commented Jun 1, 2026

Uh oh!

msogin left a comment

Uh oh!

msogin Jun 1, 2026

Uh oh!

msogin Jun 1, 2026

Uh oh!

msogin Jun 1, 2026

Uh oh!

msogin Jun 1, 2026

Uh oh!

msogin Jun 1, 2026

Uh oh!

msogin Jun 1, 2026

Uh oh!

msogin Jun 1, 2026

Uh oh!

msogin Jun 1, 2026

Uh oh!

msogin Jun 1, 2026

Uh oh!

msogin Jun 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

msogin commented Jun 1, 2026

Summary

Architecture (data flow)

Thresholds (calibrated against the 5/28 incident)

Test plan

One late fix found during smoke

Files touched

Out of scope (deliberately, per spec)

Uh oh!

msogin left a comment

Choose a reason for hiding this comment

Automated review — PR #50 (GLOOK-13)

🔴 In-scope silent-failure gaps in existing catches (not in this diff, so flagged here)

🟠 Scope / architecture (worth deciding before merge)

Verdict

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant