feat(research/systemic_risk): governance gates + scope-explicit aliases + temporal-panel boundary #564
…es + temporal-panel boundary

Closes the post-merge canonical R&D review on PR #562:

PR #562 TITLE RENAMED (post-merge edit via gh API)
  Old: "production-grade v2 ..."
  New: "R&D hypothesis instrument v2 — directed coupling, MLE BA fit, bootstrap-CI falsification"
  Rationale: per § 2 of the canonical checklist, "production-grade" is forbidden language until a real-data run + replication exist.

GOVERNANCE GATES (new module governance.py)
- assert_claim_tier(claimed, evidence): refuses promotion
- build_validation_readiness_report(...): explicit per-axis flags
- run_premerge_science_gate(docs_root, readiness): composite verdict
- FORBIDDEN_OVERCLAIM_TERMS: regex tuple
Plus a real-module test that asserts research/systemic_risk/ itself passes the overclaim grep at HYPOTHESIS / INSTRUMENTED tier. Any future commit that introduces forbidden language fails CI.

SCOPE-EXPLICIT FALSIFICATION ALIASES
- run_score_level_falsification: alias of run_falsification with a name that makes the scope auditable in caller code.
- run_end_to_end_falsification: NotImplementedError stub. Fails closed until the empirical-data ingest and the composed null-audit orchestrator land. No partial pipeline can be misread as end-to-end evidence.

TEMPORAL-PANEL BOUNDARY (new module temporal_panel.py)
validate_temporal_exposure_panel(panels, node_labels): fail-closed contract for the eventual end-to-end ingest. Enforces:
- non-empty panel
- strictly-increasing date keys (no duplicates)
- per-snapshot squareness, finiteness, non-negativity
- same node universe across snapshots (no silent entry/exit)
- label-side contract identical to from_exposure_matrix

VALIDATION-MODE BA FIT
fit_barabasi_albert_validation_from_topology(topology): strict wrapper enforcing both n_tail ≥ 50 AND σ_α/α ≤ 0.10 with no escape hatches.

DOCS: score-level vs end-to-end boundary made explicit
- README.md gains a top-level boundary block stating the executable falsification operates at score-series level and the full pipeline is not yet end-to-end executable.
- PROTOCOL.md status string updated to "HYPOTHESIS / SCORE-LEVEL INSTRUMENTATION COMPLETE; END-TO-END VALIDATION PENDING".

TESTS: 169 passing (+24 new)
- test_governance.py: readiness profile derivation, claim-tier enforcement, overclaim grep on synthetic + real module trees, canonical forbidden-terms list.
- test_temporal_panel.py: empty/duplicate/whitespace/None labels, size-mismatch, non-square, NaN, negative; every fail path.
- test_falsification.py: scope-alias parity (run_score_level_* matches run_falsification on the same seed), end-to-end stub fails closed.
- test_network_fitting.py: BA validation wrapper rejects small topology, passes on n=3000 BA(m=3, seed=42) with auto-selected k_min ≈ 25, n_tail ≈ 56, σ_α/α ≈ 0.086.

Quality
- mypy --strict / ruff / black: clean on every new/modified file.
- 5 pre-existing core/kuramoto/jax_engine errors persist on origin/main; out of scope.

Tier: C-SYSRISK-PHASE remains HYPOTHESIS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
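The per-snapshot checks named in the commit message (squareness, finiteness, non-negativity) could look roughly like the sketch below. It is not the code in `temporal_panel.py`: the helper name `check_snapshot`, the nested-sequence matrix representation, and the `ValueError` base class of the error are all assumptions made for illustration.

```python
from __future__ import annotations

import math
from collections.abc import Sequence


class InvalidTemporalPanelError(ValueError):
    """Stand-in for the module's typed error (the base class is an assumption)."""


def check_snapshot(matrix: Sequence[Sequence[float]], n_nodes: int) -> None:
    """Raise unless `matrix` is an n_nodes x n_nodes grid of finite, non-negative values."""
    # Row count must match the node universe implied by node_labels.
    if len(matrix) != n_nodes:
        raise InvalidTemporalPanelError(
            f"expected {n_nodes} rows, got {len(matrix)}"
        )
    for i, row in enumerate(matrix):
        # Every row must have n_nodes entries, otherwise the snapshot is not square.
        if len(row) != n_nodes:
            raise InvalidTemporalPanelError(
                f"row {i} has {len(row)} entries; snapshot must be square"
            )
        for j, value in enumerate(row):
            # math.isfinite rejects both NaN and +/- infinity.
            if not math.isfinite(value):
                raise InvalidTemporalPanelError(f"non-finite exposure at ({i}, {j})")
            if value < 0:
                raise InvalidTemporalPanelError(f"negative exposure at ({i}, {j})")
```

Raising one typed error on the first violation keeps the boundary fail-closed: a caller never receives a partially validated panel.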
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ce9efc726f
    if out_deg.size < MIN_TAIL_SIZE_VALIDATION:
        raise ValueError(
            f"validation-mode BA fit requires "
            f"n_observations >= {MIN_TAIL_SIZE_VALIDATION}; "
            f"got n={out_deg.size}. "
Enforce tail-size floor on fitted BA tail, not node count
The validation wrapper claims to require n_tail >= 50, but this guard only checks out_deg.size before fitting. Because fit_barabasi_albert auto-selects k_min, the accepted tail can be much smaller even when N >= 50 (e.g., barabasi_albert_null(n_nodes=60, m=3, seed=56) passes with fit.n_tail=35), so underpowered fits are incorrectly certified as validation-mode.
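One way to address this is to apply the floor to the fitted tail after `k_min` has been selected, rather than to the raw degree array. The sketch below illustrates that idea only. `FitSummary` and `require_validation_grade` are hypothetical names, and the real return type of `fit_barabasi_albert` may differ; the source confirms only that the fit exposes `n_tail` and that the thresholds are 50 and 0.10.

```python
from __future__ import annotations

from dataclasses import dataclass

MIN_TAIL_SIZE_VALIDATION = 50  # tail-size floor stated in the PR description
MAX_RELATIVE_SIGMA = 0.10      # ceiling on sigma_alpha / alpha stated in the PR description


@dataclass(frozen=True)
class FitSummary:
    """Hypothetical stand-in for the object returned by the BA fit."""

    alpha: float
    sigma_alpha: float
    n_tail: int  # number of observations at or above the auto-selected k_min


def require_validation_grade(fit: FitSummary) -> FitSummary:
    """Return `fit` unchanged, or raise if it is too weak for validation mode."""
    # Check the tail that was actually fitted, not the total node count.
    if fit.n_tail < MIN_TAIL_SIZE_VALIDATION:
        raise ValueError(
            f"validation-mode BA fit requires n_tail >= {MIN_TAIL_SIZE_VALIDATION}; "
            f"got n_tail={fit.n_tail}"
        )
    if fit.alpha <= 0:
        raise ValueError(f"alpha must be positive; got {fit.alpha}")
    relative_sigma = fit.sigma_alpha / fit.alpha
    if relative_sigma > MAX_RELATIVE_SIGMA:
        raise ValueError(
            f"validation-mode BA fit requires sigma_alpha/alpha <= {MAX_RELATIVE_SIGMA}; "
            f"got {relative_sigma:.3f}"
        )
    return fit
```

With this ordering, a fit like the `n_nodes=60` example in the comment (`n_tail=35`) is rejected even though the node count clears 50. A pre-fit check on the node count can stay as a cheap early exit, but it should not be the only guard.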
    n = len(node_labels)
    if n == 0:
        raise InvalidNodeLabelsError("node_labels must be non-empty")
    sorted_keys = sorted(panels.keys())
Validate panel key types before sorting
sorted(panels.keys()) runs before key-type validation, so a panel containing any non-date key can raise a raw TypeError from Python comparison ('<' not supported...) instead of the documented InvalidTemporalPanelError. This breaks the module’s typed fail-closed boundary contract for malformed temporal panels.
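A minimal sketch of the suggested ordering, validating key types before any comparison happens, is shown below. It assumes panel keys are plain `datetime.date` objects, which the source does not confirm. The helper name `sorted_date_keys` and the `ValueError` base class of the error are likewise illustrative.

```python
from __future__ import annotations

import datetime as dt
from collections.abc import Mapping
from typing import Any


class InvalidTemporalPanelError(ValueError):
    """Stand-in for the module's typed error (the base class is an assumption)."""


def _is_plain_date(key: object) -> bool:
    # datetime is a subclass of date, and comparing the two raises TypeError,
    # so datetime instances are rejected here as well.
    return isinstance(key, dt.date) and not isinstance(key, dt.datetime)


def sorted_date_keys(panels: Mapping[Any, Any]) -> list[dt.date]:
    """Return the panel keys in ascending order, or raise the typed error."""
    bad_keys = [key for key in panels if not _is_plain_date(key)]
    if bad_keys:
        raise InvalidTemporalPanelError(
            f"panel keys must be datetime.date; got {type(bad_keys[0]).__name__}"
        )
    # Safe: every key is a plain date, so the comparisons cannot raise TypeError.
    return sorted(panels)
```

Because the type check runs first, a malformed key surfaces as the documented error instead of a raw `TypeError` from the sort.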
## Summary

Closes the post-merge canonical R&D review on PR #562. Three additions plus four contract-tightening fixes.

## What changed

### Governance gates (new module `governance.py`)

- `assert_claim_tier(claimed, evidence)`: refuses promotion beyond available evidence.
- `build_validation_readiness_report(...)`: derives the per-axis readiness profile.
- `run_premerge_science_gate(docs_root, readiness)`: composite docs-honesty + readiness-consistency verdict.
- `FORBIDDEN_OVERCLAIM_TERMS`: regex tuple for the docs-overclaim grep.
- `test_real_module_passes_overclaim_grep` asserts that the live `research/systemic_risk/` tree passes the grep at `HYPOTHESIS / INSTRUMENTED` tier; any future commit that introduces forbidden language fails CI.

### Scope-explicit falsification aliases

- `run_score_level_falsification`: alias of `run_falsification` with a name that makes scope auditable.
- `run_end_to_end_falsification`: `NotImplementedError` stub. Fails closed until empirical-data ingest and the composed null-audit orchestrator land.

### Temporal-panel boundary (new module `temporal_panel.py`)

- `validate_temporal_exposure_panel(panels, node_labels)`: fail-closed contract for the eventual end-to-end ingest. Enforces a non-empty panel, strictly-increasing date keys, per-snapshot squareness/finiteness/non-negativity, a stable node universe, and the same label-side contract as `from_exposure_matrix`.

### Validation-mode BA fit

- `fit_barabasi_albert_validation_from_topology(topology)`: strict wrapper enforcing both `n_tail ≥ 50` AND `σ_α/α ≤ 0.10`, no escape hatches.

### Docs: score-level vs end-to-end boundary

- `README.md` gains a top-level boundary block: the executable falsification operates at score-series level, and the full pipeline (exposure panel → topology → coupling → Kuramoto → r(t) → score → verdict) is not yet end-to-end executable.
- `PROTOCOL.md` status string updated to `HYPOTHESIS / SCORE-LEVEL INSTRUMENTATION COMPLETE; END-TO-END VALIDATION PENDING`.

### PR #562 title renamed (post-merge)

Old: `production-grade v2 ...` → New: `R&D hypothesis instrument v2 — directed coupling, MLE BA fit, bootstrap-CI falsification`. Per canonical R&D checklist § 2, `production-grade` is forbidden language until a real-data run and a replication exist.

## Test plan

- `pytest tests/research/systemic_risk/`: 169 passed (+24 new).
- `mypy --strict` clean on all new/modified files.
- `ruff` + `black` clean.
- End-to-end stub raises `NotImplementedError`.

## Tier

`C-SYSRISK-PHASE` remains `HYPOTHESIS`. This PR delivers governance instrumentation; it does not advance any claim.

🤖 Generated with Claude Code
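For readers unfamiliar with the docs-overclaim grep, a rough sketch of the idea follows. Only the `production-grade` term is taken from this PR. The regex itself, the single-entry tuple, and the helper `find_overclaims` are assumptions for illustration and may not match `governance.py`.

```python
from __future__ import annotations

import re

# Illustrative contents: the PR confirms `production-grade` is forbidden,
# but the real tuple and its exact patterns are not shown in the source.
FORBIDDEN_OVERCLAIM_TERMS: tuple[re.Pattern[str], ...] = (
    re.compile(r"\bproduction[-\s]grade\b", re.IGNORECASE),
)


def find_overclaims(text: str) -> list[str]:
    """Return one 'line N: match' entry per forbidden term found in `text`."""
    hits: list[str] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern in FORBIDDEN_OVERCLAIM_TERMS:
            match = pattern.search(line)
            if match:
                hits.append(f"line {lineno}: {match.group(0)}")
    return hits
```

A CI test in this style would read each documentation file under the module tree and assert that `find_overclaims` returns an empty list.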