128 changes: 128 additions & 0 deletions .claude/commit_acceptors/systemic-risk-governance-gates.yaml
@@ -0,0 +1,128 @@
# Diff-bound commit acceptor for the systemic-risk governance gates.
#
# Closes the post-merge review on PR #562 (canonical R&D checklist):
#
# 1. PR #562 title was renamed via gh API to remove "production-grade"
# (post-merge edit; the merge commit message is preserved as
# history but the public-facing title now reads "R&D hypothesis
# instrument v2 — directed coupling, MLE BA fit, bootstrap-CI
# falsification").
# 2. README + PROTOCOL gain explicit score-level scope boundary —
# the executable falsification operates on a pre-computed score
# series; the full exposure → verdict pipeline is not yet
# end-to-end executable.
# 3. New module governance.py exposes assert_claim_tier,
# build_validation_readiness_report, run_premerge_science_gate.
# The grep gate scans md/py for FORBIDDEN_OVERCLAIM_TERMS and
# fails closed on any hit. A test asserts the real
# research/systemic_risk/ tree passes the grep at HYPOTHESIS /
# INSTRUMENTED tier.
# 4. New module temporal_panel.py with validate_temporal_exposure_panel
# fail-closed boundary contract for the eventual end-to-end ingest.
# 5. falsification.py exposes scope-explicit aliases:
# run_score_level_falsification — alias of run_falsification
# run_end_to_end_falsification — NotImplementedError stub
# 6. network_fitting.py adds fit_barabasi_albert_validation_from_topology
# — strict wrapper enforcing both n_tail ≥ 50 and σ_α/α ≤ 0.10.
# 7. Tests cover every new path (24 new tests; 169 passing in total).
#
# Locally verified:
# * pytest tests/research/systemic_risk/: 169/169 pass
# * mypy --strict on every new file: clean
# * ruff + black: clean

id: systemic-risk-governance-gates
status: ACTIVE
claim_type: governance
promise: >-
After this PR lands, the systemic-risk module ships
machine-checked governance gates — assert_claim_tier,
build_validation_readiness_report, run_premerge_science_gate —
that prevent any future commit from overclaiming beyond the
available evidence, plus a fail-closed temporal-exposure-panel
boundary contract and explicit score-level vs end-to-end
falsification scope tags. C-SYSRISK-PHASE remains HYPOTHESIS
per CLAIMS.md.
diff_scope:
changed_files:
- path: ".claude/commit_acceptors/systemic-risk-governance-gates.yaml"
- path: "research/systemic_risk/PROTOCOL.md"
- path: "research/systemic_risk/README.md"
- path: "research/systemic_risk/__init__.py"
- path: "research/systemic_risk/falsification.py"
- path: "research/systemic_risk/governance.py"
- path: "research/systemic_risk/network_fitting.py"
- path: "research/systemic_risk/temporal_panel.py"
- path: "tests/research/systemic_risk/test_falsification.py"
- path: "tests/research/systemic_risk/test_governance.py"
- path: "tests/research/systemic_risk/test_network_fitting.py"
- path: "tests/research/systemic_risk/test_temporal_panel.py"
forbidden_paths:
- "trading/"
- "execution/"
- "forecast/"
- "policy/"
- "core/physics/"
- "core/kuramoto/"
- "application/governance/claim_ledger.py"
- "application/governance/commit_acceptor.py"
required_python_symbols:
- "research/systemic_risk/governance.py::assert_claim_tier"
- "research/systemic_risk/governance.py::build_validation_readiness_report"
- "research/systemic_risk/governance.py::run_premerge_science_gate"
- "research/systemic_risk/governance.py::FORBIDDEN_OVERCLAIM_TERMS"
- "research/systemic_risk/temporal_panel.py::validate_temporal_exposure_panel"
- "research/systemic_risk/falsification.py::run_score_level_falsification"
- "research/systemic_risk/falsification.py::run_end_to_end_falsification"
- "research/systemic_risk/network_fitting.py::fit_barabasi_albert_validation_from_topology"
expected_signal: >-
`pytest tests/research/systemic_risk/` reports "169 passed";
`mypy --strict research/systemic_risk/ tests/research/systemic_risk/`
is clean (the 5 pre-existing core/kuramoto/jax_engine errors
persist on origin/main and are out of scope); `ruff check` and
`black --check` both pass on the diff;
`run_premerge_science_gate(docs_root=research/systemic_risk/)`
returns passed=True with overclaim_hits=().
measurement_command: >-
bash -c '
mypy --strict research/systemic_risk/ tests/research/systemic_risk/
&& ruff check research/systemic_risk/ tests/research/systemic_risk/
&& black --check research/systemic_risk/ tests/research/systemic_risk/
&& python -m pytest tests/research/systemic_risk/ -q
'
signal_artifact: "tmp/systemic_risk_governance_gates.log"
falsifier:
command: >-
bash -c '
python -m pytest
tests/research/systemic_risk/test_governance.py::TestRunPremergeScienceGate::test_real_module_passes_overclaim_grep
tests/research/systemic_risk/test_falsification.py::TestScopeExplicitAliases::test_end_to_end_falsification_fails_closed
-q >/tmp/_governance_rails.log 2>&1;
! grep -q "2 passed" /tmp/_governance_rails.log
'
description: >-
Probes the two load-bearing rails of the governance layer: the
overclaim grep against the real module tree, and the end-to-end
fail-closed stub. The falsifier inverts: it succeeds when the two
rail tests did NOT both pass, which would mean either the
overclaim grep is leaking forbidden language or the end-to-end
stub is silently running a partial pipeline.
rollback_command: >-
bash -c 'git checkout HEAD~1 --
research/systemic_risk/PROTOCOL.md
research/systemic_risk/README.md
research/systemic_risk/__init__.py
research/systemic_risk/falsification.py
research/systemic_risk/network_fitting.py
&& rm -f
research/systemic_risk/governance.py
research/systemic_risk/temporal_panel.py
tests/research/systemic_risk/test_governance.py
tests/research/systemic_risk/test_temporal_panel.py
.claude/commit_acceptors/systemic-risk-governance-gates.yaml'
rollback_verification_command: >-
bash -c '! test -f research/systemic_risk/governance.py'
memory_update_type: append
ledger_path: ".claude/commit_acceptors/systemic-risk-governance-gates.yaml"
report_path: "tmp/systemic_risk_governance_gates.log"
evidence: []
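
Review note: the acceptor above leans on a grep gate that scans `.md` and `.py` files for forbidden terms and reports `passed=True` with `overclaim_hits=()` when nothing is found. The PR's `governance.py` is not shown in this diff, so the following is only a minimal sketch of that idea. `scan_for_overclaims`, `GateReport`, and the single sample term are hypothetical names and values chosen for illustration; they are not the PR's API or its actual term list.

```python
from dataclasses import dataclass
from pathlib import Path

# Hypothetical sample list; the real FORBIDDEN_OVERCLAIM_TERMS is not visible here.
SKETCH_FORBIDDEN_TERMS = ("production-grade",)


@dataclass(frozen=True)
class GateReport:
    passed: bool
    # Each hit is (file path, 1-based line number, matched term).
    overclaim_hits: tuple


def scan_for_overclaims(root: Path, terms=SKETCH_FORBIDDEN_TERMS) -> GateReport:
    """Case-insensitive substring scan of .md/.py files under root."""
    if not root.is_dir():
        # Fail closed: a missing tree must not be reported as a clean pass.
        raise FileNotFoundError(f"docs root does not exist: {root}")
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in {".md", ".py"}:
            continue
        text = path.read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), start=1):
            lowered = line.lower()
            for term in terms:
                if term.lower() in lowered:
                    hits.append((str(path), lineno, term))
    return GateReport(passed=not hits, overclaim_hits=tuple(hits))
```

One design point worth checking in the real module: a gate that scans its own source tree has to exempt the file that defines the term list, otherwise the definition itself registers as a hit. How the PR handles that is not visible in this diff.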
11 changes: 10 additions & 1 deletion research/systemic_risk/PROTOCOL.md
@@ -97,4 +97,13 @@ HYPOTHESIS
└─▶ VALIDATED (peer-reviewed)
```

-Current status: **HYPOTHESIS / INSTRUMENTATION COMPLETE**.
+Current status: **HYPOTHESIS / SCORE-LEVEL INSTRUMENTATION COMPLETE; END-TO-END VALIDATION PENDING**.

The pre-registered falsification battery operates on a *score
series*. The full pipeline — temporal exposure panel → topology →
coupling → Kuramoto dynamics → r(t) → early-warning score → verdict
— is not yet end-to-end executable. The composed null-audit
orchestrator (`null_models.run_null_audit`) is documented as
deferred until empirical temporal-exposure ingest lands; promotion
gates beyond `INSTRUMENTED + TESTED_ON_SYNTHETIC` therefore cannot
fire from the current main.
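
Review note: the `r(t)` stage in the pipeline above is the Kuramoto order parameter, the magnitude of the mean unit phasor across oscillators at each time step. The package exports `kuramoto_order_parameter`, but its signature is not shown in this diff, so the sketch below uses a separately named function and an assumed `(T, N)` phase layout purely for illustration.

```python
import numpy as np


def order_parameter_sketch(phases: np.ndarray) -> np.ndarray:
    """Return r(t) = |mean_j exp(i * theta_j(t))| for phases of shape (T, N)."""
    phases = np.asarray(phases, dtype=np.float64)
    if phases.ndim != 2 or phases.shape[1] == 0:
        raise ValueError("phases must have shape (T, N) with N >= 1")
    # Average the unit phasors over oscillators (axis 1), then take the modulus.
    return np.abs(np.exp(1j * phases).mean(axis=1))
```

Fully synchronised phases give `r = 1`; two oscillators in exact anti-phase give `r` close to `0`.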
22 changes: 22 additions & 0 deletions research/systemic_risk/README.md
@@ -3,6 +3,28 @@
> **Tier (per `CLAIMS.md`):** `HYPOTHESIS` until the v2 falsification
> battery returns `HARD_PASS` on ≥ 2 independent crises with real
> interbank exposure data and the bootstrap-CI lower bound clears 0.70.
>
> **Scope of the current executable falsification — score-level only.**
>
> The instrument tests:
>
> ```
> score(t) → crisis-window statistical evaluation
> ```
>
> It does **not** yet validate the full end-to-end pipeline:
>
> ```
> temporal exposure panel → topology → coupling → Kuramoto dynamics
> → r(t) → early-warning score → crisis verdict
> ```
>
> End-to-end validation requires empirical temporal exposure ingest,
> locked score construction, executable null-audit orchestration,
> reproducibility manifest, and real-data runs. None of those have
> happened yet — see `LIMITATIONS.md` § "Domain limitations" and
> `governance.run_premerge_science_gate` for the machine-checked
> readiness profile.
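
Review note: the tier rule quoted above combines a per-crisis AUC, a bootstrap confidence interval, and a count of passing crises. The sketch below shows one way those pieces could fit together. It assumes a percentile bootstrap and a strict `>` comparison against 0.70, and it ignores any multiple-comparison correction the real battery applies. The function names are hypothetical stand-ins, not the signatures of `auc_mann_whitney` or `auc_bootstrap_ci`.

```python
import numpy as np


def auc_sketch(pos, neg) -> float:
    """Mann-Whitney AUC: P(pos > neg) + 0.5 * P(pos == neg)."""
    pos = np.asarray(pos, dtype=np.float64)
    neg = np.asarray(neg, dtype=np.float64)
    diff = pos[:, None] - neg[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


def bootstrap_lower_bound(pos, neg, n_boot=2000, alpha=0.05, seed=0) -> float:
    """Lower end of a percentile bootstrap CI for the AUC (assumed method)."""
    pos = np.asarray(pos, dtype=np.float64)
    neg = np.asarray(neg, dtype=np.float64)
    rng = np.random.default_rng(seed)
    aucs = np.empty(n_boot)
    for b in range(n_boot):
        # Resample each class independently, with replacement.
        p = rng.choice(pos, size=pos.size, replace=True)
        n = rng.choice(neg, size=neg.size, replace=True)
        aucs[b] = auc_sketch(p, n)
    return float(np.quantile(aucs, alpha / 2))


def verdict_sketch(lower_bounds, threshold=0.70, min_crises=2) -> str:
    """HARD_PASS only if enough crises clear the threshold; else UNDECIDED."""
    passing = [lb for lb in lower_bounds if lb > threshold]
    return "HARD_PASS" if len(passing) >= min_crises else "UNDECIDED"
```

Under these assumptions, a single crisis with a strong lower bound is not enough: `verdict_sketch([0.8, 0.6])` stays `UNDECIDED`.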

## What this does

34 changes: 34 additions & 0 deletions research/systemic_risk/__init__.py
@@ -26,6 +26,12 @@
    compute_early_warning,
    kuramoto_order_parameter,
)
from .errors import (
    InvalidExposureMatrixError,
    InvalidNodeLabelsError,
    InvalidTemporalPanelError,
    SystemicRiskInputError,
)
from .event_ledger import (
    DEFAULT_LEDGER,
    BankingCrisisEvent,
@@ -38,7 +44,17 @@
    auc_bootstrap_ci,
    auc_mann_whitney,
    bonferroni_correction,
    run_end_to_end_falsification,
    run_falsification,
    run_score_level_falsification,
)
from .governance import (
    FORBIDDEN_OVERCLAIM_TERMS,
    PremergeGateReport,
    ValidationReadinessReport,
    assert_claim_tier,
    build_validation_readiness_report,
    run_premerge_science_gate,
)
from .network_fitting import (
    MIN_RELATIVE_SE_VALIDATION,
@@ -49,6 +65,7 @@
    compare_power_law_vs_exponential,
    fit_barabasi_albert,
    fit_barabasi_albert_from_topology,
    fit_barabasi_albert_validation_from_topology,
    fit_exponential,
    fit_power_law,
    fit_power_law_validation,
@@ -70,6 +87,9 @@
    RunManifest,
    build_run_manifest,
)
from .temporal_panel import (
    validate_temporal_exposure_panel,
)
from .topology import (
    InterbankTopology,
    barabasi_albert_null,
@@ -84,27 +104,37 @@
    "EarlyWarningConfig",
    "EarlyWarningResult",
    "ExponentialFit",
    "FORBIDDEN_OVERCLAIM_TERMS",
    "FalsificationConfig",
    "FalsificationReport",
    "INTERBANK_DEFAULT_BAND",
    "InterbankTopology",
    "InvalidExposureMatrixError",
    "InvalidNodeLabelsError",
    "InvalidTemporalPanelError",
    "MIN_RELATIVE_SE_VALIDATION",
    "MIN_TAIL_SIZE_VALIDATION",
    "ModelComparison",
    "NullSurrogate",
    "PowerLawFit",
    "PremergeGateReport",
    "RunManifest",
    "SystemicRiskInputError",
    "ValidationReadinessReport",
    "assert_claim_tier",
    "auc_bootstrap_ci",
    "auc_mann_whitney",
    "barabasi_albert_null",
    "bonferroni_correction",
    "build_run_manifest",
    "build_validation_readiness_report",
    "compare_power_law_vs_exponential",
    "compute_early_warning",
    "coupling_from_exposures",
    "degree_preserving_randomization",
    "fit_barabasi_albert",
    "fit_barabasi_albert_from_topology",
    "fit_barabasi_albert_validation_from_topology",
    "fit_exponential",
    "fit_power_law",
    "fit_power_law_validation",
@@ -115,8 +145,12 @@
    "omega_from_volatility",
    "permuted_crisis_dates",
    "random_exposure_weights",
    "run_end_to_end_falsification",
    "run_falsification",
    "run_premerge_science_gate",
    "run_score_level_falsification",
    "sakaguchi_alpha_zero",
    "shuffled_time_labels",
    "static_topology_baseline",
    "validate_temporal_exposure_panel",
]
53 changes: 53 additions & 0 deletions research/systemic_risk/falsification.py
@@ -53,6 +53,8 @@
    "auc_bootstrap_ci",
    "bonferroni_correction",
    "run_falsification",
    "run_score_level_falsification",
    "run_end_to_end_falsification",
]


@@ -470,3 +472,54 @@ def run_falsification(
    verdict = "HARD_PASS" if len(passing) >= 2 else "UNDECIDED"

    return FalsificationReport(outcomes=finalised, verdict=verdict, config=cfg)


# ---------------------------------------------------------------------------
# Scope-explicit aliases — make the validation boundary auditable
# ---------------------------------------------------------------------------


def run_score_level_falsification(
    score: NDArray[np.float64],
    dates: tuple[date, ...],
    ledger: BankingCrisisLedger,
    *,
    config: FalsificationConfig | None = None,
    country_filter: str | None = None,
) -> FalsificationReport:
    """Score-level alias of :func:`run_falsification` — explicit scope tag.

    Identical behaviour to :func:`run_falsification`. The dedicated
    name makes the *scope* of the test auditable in caller code:
    this function evaluates a pre-computed score series; it does
    NOT validate the upstream pipeline that produced the score.
    For end-to-end (exposure → verdict) validation see
    :func:`run_end_to_end_falsification`.
    """
    return run_falsification(score, dates, ledger, config=config, country_filter=country_filter)


def run_end_to_end_falsification(
    *args: object,
    **kwargs: object,
) -> FalsificationReport:
    """End-to-end falsification — NOT YET IMPLEMENTED.

    The full pipeline — temporal exposure panel → topology →
    coupling → Kuramoto dynamics → r(t) → early-warning score →
    crisis verdict — requires real-data ingest and an executable
    null-audit orchestrator, neither of which has landed on
    ``main`` (see ``LIMITATIONS.md`` § "Domain limitations" and
    ``null_models.py`` module docstring).

    Calling this function fails closed via
    :class:`NotImplementedError` rather than running a partial
    pipeline that could be misread as end-to-end evidence.
    """
    raise NotImplementedError(
        "End-to-end falsification (exposure panel → verdict) is not "
        "yet implemented on main. The composed null-audit orchestrator "
        "and temporal-exposure ingest are both deferred — see "
        "research/systemic_risk/LIMITATIONS.md and PROTOCOL.md § 4. "
        "For score-level evaluation use run_score_level_falsification."
    )
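
Review note: the stub above is only useful if callers and tests treat `NotImplementedError` as the expected outcome, and if score-level results carry their scope with them. The sketch below illustrates both habits using self-contained stand-ins. `end_to_end_stub`, `score_level_stub`, `fails_closed`, and `tagged_result` are hypothetical names, and the mean is a placeholder for the real evaluation; none of this is the PR's test code.

```python
from typing import Callable


def end_to_end_stub(*args: object, **kwargs: object) -> float:
    """Hypothetical stand-in for a stub that refuses to run a partial pipeline."""
    raise NotImplementedError("end-to-end pipeline is not implemented")


def score_level_stub(score: list) -> float:
    """Hypothetical stand-in for a score-level evaluation (here: the mean)."""
    return sum(score) / len(score)


def fails_closed(fn: Callable[..., object], *args: object, **kwargs: object) -> bool:
    """Return True only if calling fn raises NotImplementedError."""
    try:
        fn(*args, **kwargs)
    except NotImplementedError:
        return True
    return False


def tagged_result(score: list) -> dict:
    """Attach the scope so the value cannot be read as end-to-end evidence."""
    return {"scope": "score_level", "value": score_level_stub(score)}
```

Accepting `*args` and `**kwargs` in the stub means it raises regardless of how it is called, so a caller cannot get past it by guessing a future signature.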