Merged
149 changes: 149 additions & 0 deletions .claude/commit_acceptors/research-systemic-risk-rewrite-v2.yaml
@@ -0,0 +1,149 @@
# Diff-bound commit acceptor for the systemic-risk module v2 rewrite.
#
# Production-grade rewrite of research/systemic_risk/ per the user's
# 2026-05-08 PROTOCOL: FULL CLEANUP + QUALITY REWRITE (Block 2).
#
# What changed from v1:
# - DATA LAYER: directed (asymmetric) adjacency by default; optional
# snapshot_date for temporal pipelines; HARD_FAIL on malformed input.
# - NETWORK LAYER: new network_fitting.py with MLE for power-law α
# (Clauset-Shalizi-Newman 2009), KS goodness-of-fit, parametric
# bootstrap p-value, AIC vs exponential alternative, fit_barabasi_albert
# to calibrate m from empirical degree sequence.
# - COUPLING LAYER: new coupling.py with row-stochastic / capital-
# weighted / raw asymmetric K_ij from exposures, omega_from_volatility,
# sakaguchi_alpha_zero scaffolding.
# - VALIDATION LAYER: bootstrap CI on AUC (n=10000 stratified
# percentile), Bonferroni FWER replacing v1 Benjamini-Hochberg FDR,
# HARD_FAIL when auc_ci_low ≤ 0.5 + tol or auc ≤ 0.55, HARD_PASS
# when auc_ci_low ≥ 0.70 AND p_BONF ≤ 0.01 on ≥ 2 crises.
# - CLAIMS.md row C-SYSRISK-PHASE updated to reflect v2 protocol.
#
# Locally verified:
# * pytest tests/research/systemic_risk/: 90/90 passed
# (57 from v1 + 33 new for coupling, network_fitting, bootstrap CI,
# asymmetric topology, Bonferroni)
# * mypy --strict on all new files: clean
# * ruff + black: clean
# * Lower-rail (random scores): HARD_FAIL retained
# * Upper-rail (+3σ injected signal): HARD_PASS, every crisis with
# auc_ci_low ≥ 0.70 (verified by test_injected_signal_passes)

id: research-systemic-risk-rewrite-v2
status: ACTIVE
# claim_type=documentation: v2 is dominated by canonical R&D-validation
# documentation (PROTOCOL / VALIDATION / LIMITATIONS / data_schema /
# README rewrite) that justifies + boundary-protects every constant
# and every API contract. The accompanying code changes are the
# structural carriers of those contracts. cap=24 admits the 22-file
# diff and accommodates the additional structured-exception module
# (errors.py + test_errors.py) added in response to the external
# adversarial audit on PR #562.
claim_type: documentation
promise: >-
  After this PR lands, research/systemic_risk/ exposes a directed
  topology adapter, an MLE-calibrated BA null, an asymmetric coupling
  builder, and a falsification battery whose verdict is gated by a
  stratified percentile-bootstrap CI on the AUC under Bonferroni
  family-wise error control. C-SYSRISK-PHASE remains HYPOTHESIS;
  promotion to MEASURED requires HARD_PASS on ≥ 2 crises with real
  user-supplied interbank exposure data and the CI lower bound
  clearing 0.70.
diff_scope:
changed_files:
- path: ".claude/commit_acceptors/research-systemic-risk-rewrite-v2.yaml"
- path: "CLAIMS.md"
- path: "research/systemic_risk/LIMITATIONS.md"
- path: "research/systemic_risk/PROTOCOL.md"
- path: "research/systemic_risk/README.md"
- path: "research/systemic_risk/VALIDATION.md"
- path: "research/systemic_risk/__init__.py"
- path: "research/systemic_risk/coupling.py"
- path: "research/systemic_risk/data_schema.md"
- path: "research/systemic_risk/errors.py"
- path: "research/systemic_risk/falsification.py"
- path: "research/systemic_risk/network_fitting.py"
- path: "research/systemic_risk/null_models.py"
- path: "research/systemic_risk/replication.py"
- path: "research/systemic_risk/topology.py"
- path: "tests/research/systemic_risk/test_coupling.py"
- path: "tests/research/systemic_risk/test_errors.py"
- path: "tests/research/systemic_risk/test_falsification.py"
- path: "tests/research/systemic_risk/test_network_fitting.py"
- path: "tests/research/systemic_risk/test_null_models.py"
- path: "tests/research/systemic_risk/test_replication.py"
- path: "tests/research/systemic_risk/test_topology.py"
forbidden_paths:
- "trading/"
- "execution/"
- "forecast/"
- "policy/"
- "core/physics/"
- "core/kuramoto/"
- "application/governance/claim_ledger.py"
- "application/governance/commit_acceptor.py"
required_python_symbols:
- "research/systemic_risk/topology.py::InterbankTopology"
- "research/systemic_risk/topology.py::from_exposure_matrix"
- "research/systemic_risk/null_models.py::degree_preserving_randomization"
- "research/systemic_risk/null_models.py::shuffled_time_labels"
- "research/systemic_risk/null_models.py::random_exposure_weights"
- "research/systemic_risk/null_models.py::static_topology_baseline"
- "research/systemic_risk/null_models.py::linear_correlation_surrogate"
- "research/systemic_risk/null_models.py::permuted_crisis_dates"
- "research/systemic_risk/replication.py::RunManifest"
- "research/systemic_risk/replication.py::build_run_manifest"
- "research/systemic_risk/network_fitting.py::fit_power_law"
- "research/systemic_risk/network_fitting.py::fit_barabasi_albert"
- "research/systemic_risk/network_fitting.py::compare_power_law_vs_exponential"
- "research/systemic_risk/coupling.py::coupling_from_exposures"
- "research/systemic_risk/coupling.py::omega_from_volatility"
- "research/systemic_risk/falsification.py::auc_bootstrap_ci"
- "research/systemic_risk/falsification.py::bonferroni_correction"
- "research/systemic_risk/falsification.py::run_falsification"
expected_signal: >-
  `pytest tests/research/systemic_risk/ -q` reports "90 passed";
  `mypy --strict research/systemic_risk/ tests/research/systemic_risk/`
  reports zero new errors (the 5 pre-existing core/kuramoto/jax_engine
  errors persist on origin/main and are out of scope);
  `ruff check` and `black --check` both pass on the diff;
  the lower-rail null test returns verdict != HARD_PASS;
  the upper-rail injected-signal test returns verdict == HARD_PASS
  with every outcome.auc_ci_low >= 0.70.
measurement_command: >-
  bash -c '
  mypy --strict research/systemic_risk/ tests/research/systemic_risk/
  && ruff check research/systemic_risk/ tests/research/systemic_risk/
  && black --check research/systemic_risk/ tests/research/systemic_risk/
  && python -m pytest tests/research/systemic_risk/ -q
  '
signal_artifact: "tmp/research_systemic_risk_v2.log"
falsifier:
  command: >-
    bash -c '
    python -m pytest
    tests/research/systemic_risk/test_falsification.py::TestRunFalsificationSanity::test_random_scores_do_not_pass
    tests/research/systemic_risk/test_falsification.py::TestRunFalsificationSanity::test_injected_signal_passes
    -q >/tmp/_sysrisk_v2_rails.log 2>&1;
    ! grep -q "2 passed" /tmp/_sysrisk_v2_rails.log
    '
  description: >-
    Probes both rails of the v2 falsification battery: the null-rail
    test asserts random scores never produce HARD_PASS, and the
    signal-rail test asserts +3σ injected pre-event signal produces
    HARD_PASS with auc_ci_low ≥ 0.70 on every crisis. The falsifier
    inverts: it succeeds (exit 0) only when the two rail tests did NOT
    both pass, which would mean either the null-rail is leaking AUC or
    the signal-rail bootstrap CI is missing the threshold. The pytest
    and grep steps are joined with `;` rather than `&&` so that a
    failing pytest run still reaches the grep check instead of
    short-circuiting to a non-zero exit.
rollback_command: >-
  bash -c 'git checkout HEAD~1 --
  CLAIMS.md
  research/systemic_risk/
  tests/research/systemic_risk/
  .claude/commit_acceptors/research-systemic-risk-rewrite-v2.yaml'
rollback_verification_command: >-
  bash -c '! test -f research/systemic_risk/coupling.py'
memory_update_type: append
ledger_path: ".claude/commit_acceptors/research-systemic-risk-rewrite-v2.yaml"
report_path: "tmp/research_systemic_risk_v2.log"
evidence: []
2 changes: 1 addition & 1 deletion CLAIMS.md
@@ -28,7 +28,7 @@
| C-INV-COUNT | "87 invariants in `.claude/physics/INVARIANTS.yaml`" | `FACT` | `python scripts/count_invariants.py` | 2026-04-30 |
| C-PHYS-KERNEL | "Physics-inspired research platform with partially machine-checkable invariant layer" | `MEASURED` | `physics-kernel-gate.yml`, `BASELINE.md` | 2026-04-30 |
| C-TLA-PROOF | "Four-barrier admission gate model-checked in TLA⁺ with 3 invariants" | `FACT` | `formal/tla/AdmissionGate.tla`, `formal-verification.yml` | 2026-04-30 |
| C-SYSRISK-PHASE | "Interbank phase-locking precedes banking-crisis events" | `HYPOTHESIS` | `research/systemic_risk/falsification.py` (pre-registered AUC + permutation p + BH FDR battery, `HARD_PASS` requires AUC≥0.70 + p_BH≤0.01 on ≥2 crises); `research/systemic_risk/README.md` | 2026-05-08 |
| C-SYSRISK-PHASE | "Interbank phase-locking precedes banking-crisis events" | `HYPOTHESIS` | `research/systemic_risk/falsification.py` v2: pre-registered Mann-Whitney AUC with stratified percentile-bootstrap CI (n=10000) + one-sided permutation p + Bonferroni FWER. `HARD_PASS` requires `auc_ci_low` ≥ 0.70 AND `p_BONF` ≤ 0.01 on ≥ 2 crises; `HARD_FAIL` if any AUC ≤ 0.55 OR any `auc_ci_low` ≤ 0.5. Asymmetric directed coupling via `coupling_from_exposures`; BA null calibrated by MLE per `fit_barabasi_albert` (Clauset-Shalizi-Newman 2009, *SIAM Rev.* 51: 661). `research/systemic_risk/README.md` | 2026-05-08 |

## Retired claims (pending re-validation under tier rules)

85 changes: 85 additions & 0 deletions research/systemic_risk/LIMITATIONS.md
@@ -0,0 +1,85 @@
# LIMITATIONS — research/systemic_risk

> What the instrument does NOT claim, in deliberate detail.

## 1. Domain limitations

* **No real-data run yet.** Every `HARD_PASS` and `HARD_FAIL` so far
is on synthetic stand-ins. `C-SYSRISK-PHASE` remains
`HYPOTHESIS`-tier per `CLAIMS.md`.
* **No mechanism claim.** Phase-locking *correlation* with crisis
onset would be associational. A causal claim requires:
– a pre-registered intervention experiment on a sandbox
interbank simulator (Battiston et al. 2012-style), AND
– a replicated detection on at least one out-of-sample crisis
not in the training set.
* **Network coverage.** e-MID is Italy-only; BIS LBS is
jurisdiction-aggregated, not bank-level. Any `MEASURED` claim
must explicitly state which fraction of the global interbank
graph the dataset covers and how stress in the un-observed
fraction would invalidate the score.

## 2. Statistical limitations

* **Small N of crises.** Bonferroni at k=3 crises gives
α_per_crisis ≤ 0.0033 for a family-wise α=0.01. Below 3 valid
pre-event windows the verdict is structurally `UNDECIDED`.
* **Bootstrap CI undercoverage.** Percentile bootstrap is known to
under-cover at small n_pos / n_neg (Efron-Tibshirani 1993,
ch. 14). Real coverage at n_pre_event ≈ 60 may sit at 0.92–0.93.
The protocol's binomial-derived acceptance bound accounts for
this — but consumers should not over-interpret a CI that
*just barely* clears 0.70.
* **No walk-forward yet.** Out-of-sample validation requires
fixing the score's hyperparameters on a strict training subset
before any test-set crisis is touched. Until that infrastructure
lands, every fit is in-sample by construction.
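
The stratified percentile bootstrap discussed above can be sketched as follows. This is a minimal illustration, not the module's `auc_bootstrap_ci`: the function names, signatures, and the pairwise Mann-Whitney AUC are assumptions made for the example. "Stratified" here means that pre-event and null windows are resampled separately, so every replicate keeps the original n_pos and n_neg.

```python
import numpy as np


def mann_whitney_auc(pos: np.ndarray, neg: np.ndarray) -> float:
    """AUC as P(pos > neg) + 0.5 * P(pos == neg) over all pairs."""
    diff = pos[:, None] - neg[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


def auc_percentile_ci(
    pos: np.ndarray,
    neg: np.ndarray,
    n_bootstrap: int = 10_000,
    confidence: float = 0.95,
    seed: int = 0,
) -> tuple[float, float, float]:
    """Return (auc, ci_low, ci_high) from a stratified percentile bootstrap.

    Hypothetical helper: resamples each class with replacement,
    independently, and takes percentile bounds of the replicate AUCs.
    """
    pos = np.asarray(pos, dtype=float)
    neg = np.asarray(neg, dtype=float)
    rng = np.random.default_rng(seed)
    replicates = np.empty(n_bootstrap)
    for b in range(n_bootstrap):
        boot_pos = rng.choice(pos, size=pos.size, replace=True)
        boot_neg = rng.choice(neg, size=neg.size, replace=True)
        replicates[b] = mann_whitney_auc(boot_pos, boot_neg)
    tail = (1.0 - confidence) / 2.0
    ci_low, ci_high = np.quantile(replicates, [tail, 1.0 - tail])
    return mann_whitney_auc(pos, neg), float(ci_low), float(ci_high)
```

Because the percentile interval can under-cover at small class sizes, a `ci_low` that sits only marginally above 0.70 in such a sketch should be read with the same caution as the bullet above recommends.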

## 3. Modelling limitations

* **Sakaguchi α frozen at zero.** Per-pair phase lag matrices are
scaffolded (`coupling.sakaguchi_alpha_zero`) but not estimated.
Joint estimation lives in `core.kuramoto.frustration` and is
expensive — engaging it on real data is a separate experiment.
* **First-order ω estimator.** `coupling.omega_from_volatility`
uses sample-σ × 2π·fs as a stand-in for the dominant
spectral-power frequency. The proper estimator (Lomb-Scargle on
rolling-vol time series) is in `core.kuramoto.natural_frequency`
and is substantially more expensive; switching is a flag-day
decision that requires re-running the full battery.
* **Static ledger.** The default banking-crisis ledger is
Laeven-Valencia 2018 + two post-2020 designations. Country
coverage is Western + 2023 anchors only; emerging-market
crises (e.g. 2018 Turkey, 2018 Argentina) are deliberately
out of scope until a separate pre-registration covers them.
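
For orientation, the first-order ω estimator named above (sample-σ × 2π·fs) might look like the sketch below. The function name, the `(T, N)` input layout, and the use of the unbiased sample standard deviation (`ddof=1`) are assumptions for illustration; the actual signature of `coupling.omega_from_volatility` is not shown in this diff.

```python
import numpy as np


def omega_first_order(returns: np.ndarray, fs: float) -> np.ndarray:
    """Per-node natural-frequency proxy: sample sigma * 2*pi*fs.

    `returns` is assumed to have shape (T, N): T observations for N
    institutions. `fs` is the sampling frequency of the series.
    This is a crude stand-in, not a spectral estimate.
    """
    r = np.asarray(returns, dtype=float)
    if r.ndim != 2 or r.shape[0] < 2:
        raise ValueError("returns must have shape (T, N) with T >= 2")
    if fs <= 0:
        raise ValueError("fs must be positive")
    sigma = r.std(axis=0, ddof=1)  # sample standard deviation per node
    return 2.0 * np.pi * fs * sigma
```

The estimator is linear in σ, so it ranks nodes by volatility rather than locating a spectral peak, which is why the Lomb-Scargle route is described as the proper (and more expensive) alternative.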

## 4. Engineering limitations

* **Editable-install drift.** The `scripts/export_governance_schemas.py`
helper in this repository is sensitive to the local Python
`sys.path` ordering when an editable `geosync` package is
installed elsewhere. The CI environment is clean and
unaffected, but local developers running `--check` should
invoke the script via `python -m` to bypass the issue.
* **JAX engine import.** `core/kuramoto/jax_engine.py` carries 5
pre-existing `mypy --strict` errors on `origin/main`; they are
out of scope for this module's own quality gate.

## 5. Causal claims requiring further evidence

A future `VALIDATED` claim must additionally provide:

1. A counterfactual experiment on a closed-form sandbox network
(e.g. cascade-of-failures simulator) showing that *removing*
the directed-coupling structure removes the detection.
2. A second-detector cross-check using a non-Kuramoto proxy
(the linear-correlation surrogate is the obvious A/B
counterpart) to rule out coherence-only explanations.
3. A pre-registered prospective experiment: lock the detector,
wait for the *next* major banking-crisis designation, score
the pre-event window blindly. Result tagged before the
designation is announced.

Until points 1-3 are in evidence, no claim stronger than
"associative pre-event signal" is permitted in any external
artefact.
100 changes: 100 additions & 0 deletions research/systemic_risk/PROTOCOL.md
@@ -0,0 +1,100 @@
# PROTOCOL — research/systemic_risk

> **Pre-registered falsification protocol for `C-SYSRISK-PHASE`.**
> Frozen at the timestamp on the manifest produced by every run.
> Every parameter below is *load-bearing*; changes require a new
> branch + new pre-registration + re-run from scratch.

## 1. Hypothesis

> The early-warning score derived from the rolling Kuramoto order
> parameter on the directed interbank phase-coupling graph is
> *elevated* in pre-event windows preceding banking-crisis dates
> compared to null windows drawn from the same series at safe
> distance from any event.

## 2. Frozen decision rule

| Verdict | Condition |
|---------|-----------|
| `HARD_FAIL` | ∃ crisis with point AUC ≤ `fail_auc=0.55` OR `auc_ci_low ≤ 0.5 + ci_floor_tol=0.0` |
| `HARD_PASS` | ≥ 2 crises with `auc_ci_low ≥ pass_auc_ci_low=0.70` AND `p_BONF ≤ pass_alpha=0.01` |
| `UNDECIDED` | otherwise |

Bonferroni FWER replaces FDR per the user's strict-control directive.
The whole 95 % bootstrap CI must clear the bar — point estimate alone
is insufficient.
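
The decision table can be read as the small function below. This is a sketch under stated assumptions: `CrisisOutcome` and `verdict` are hypothetical names, and evaluating `HARD_FAIL` before `HARD_PASS` is an assumption taken from the row order of the table rather than from the `run_falsification` source.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CrisisOutcome:
    """Per-crisis statistics (hypothetical container for this sketch)."""

    auc: float
    auc_ci_low: float
    p_bonf: float


def verdict(
    outcomes: Sequence[CrisisOutcome],
    fail_auc: float = 0.55,
    ci_floor_tol: float = 0.0,
    pass_auc_ci_low: float = 0.70,
    pass_alpha: float = 0.01,
    min_pass_crises: int = 2,
) -> str:
    """Apply the frozen rule: any failing crisis vetoes, then count passes."""
    # HARD_FAIL: a single crisis at or below either floor is enough.
    if any(
        o.auc <= fail_auc or o.auc_ci_low <= 0.5 + ci_floor_tol
        for o in outcomes
    ):
        return "HARD_FAIL"
    # HARD_PASS: both the CI lower bound and the corrected p must clear.
    n_pass = sum(
        o.auc_ci_low >= pass_auc_ci_low and o.p_bonf <= pass_alpha
        for o in outcomes
    )
    return "HARD_PASS" if n_pass >= min_pass_crises else "UNDECIDED"
```

Under this reading, one crisis with a weak AUC blocks a pass even if two other crises clear every threshold.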

## 3. Frozen pre-registration constants

| Name | Value | Derivation |
|------|-------|------------|
| `pre_event_window_days` | 60 | Brunetti et al. 2019, *J. Banking Finance* 100: 175 — liquidity-stress band ≈ 1/90d–1/5d. |
| `null_window_count` | 30 per crisis | Power: at AUC=0.7 vs 0.5 with α=0.05, n=30 yields ≈ 0.85 power per Hanley-McNeil. |
| `min_distance_from_event_days` | 365 | One full annual cycle — strongest practical separation given quarterly data cadence. |
| `n_permutations` | 5 000 | With the Davison-Hinkley +1 continuity correction the smallest attainable one-sided p is 1/5001 ≈ 2e-4, fine enough to resolve p ≤ 0.001. |
| `n_bootstrap` | 10 000 | Stratified percentile bootstrap stabilises the 95 % CI quantile to ±0.005 (Efron-Tibshirani 1993, ch. 13). |
| `confidence` | 0.95 | Industry-standard FWER-compatible level. |
| `fail_auc` | 0.55 | Coin-flip + half-σ at n=60 — anything below is rejection of the signal at the noise floor. |
| `pass_auc_ci_low` | 0.70 | Two-σ separation from chance at n=60: σ_AUC ≈ √(0.05/60) ≈ 0.029, 2σ ≈ 0.058 above 0.55 fail floor. |
| `pass_alpha` | 0.01 | Bonferroni at 3 crises: 0.05/3 ≈ 0.017, tightened to 0.01 for headroom. |

Every entry is also recorded verbatim in the per-run `RunManifest`
emitted by `replication.build_run_manifest`.
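
To make the `n_permutations` and `pass_alpha` rows concrete, here is a minimal sketch of a one-sided permutation p-value with the +1 continuity correction, followed by a Bonferroni adjustment. The function names are hypothetical and the pooled-label shuffle is an assumed permutation scheme; the module's own `bonferroni_correction` may differ in signature and detail.

```python
import numpy as np


def pairwise_auc(pos: np.ndarray, neg: np.ndarray) -> float:
    """Mann-Whitney AUC over all (pos, neg) pairs, ties counted as 0.5."""
    diff = pos[:, None] - neg[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


def permutation_p_one_sided(
    pos: np.ndarray, neg: np.ndarray, n_permutations: int = 5_000, seed: int = 0
) -> float:
    """P(AUC under shuffled labels >= observed AUC), with +1 correction.

    The smallest value this can return is 1 / (n_permutations + 1).
    """
    pos = np.asarray(pos, dtype=float)
    neg = np.asarray(neg, dtype=float)
    rng = np.random.default_rng(seed)
    observed = pairwise_auc(pos, neg)
    pooled = np.concatenate([pos, neg])
    hits = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(pooled)
        if pairwise_auc(shuffled[: pos.size], shuffled[pos.size:]) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


def bonferroni_adjust(p_values: np.ndarray) -> np.ndarray:
    """Multiply each p by the family size and clip at 1."""
    p = np.asarray(p_values, dtype=float)
    return np.minimum(1.0, p * p.size)
```

With three crises, a raw p must be at most 0.01/3 ≈ 0.0033 for the adjusted value to reach `pass_alpha`, which is well above the 1/5001 floor of the permutation test.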

## 4. Mandatory null baselines (§ 8 of the official protocol)

A claimed positive must survive **all six** baselines below.
Implementation: `research.systemic_risk.null_models`.

1. `degree_preserving_randomization` — Maslov-Sneppen on the directed graph.
2. `shuffled_time_labels` — destroys temporal ordering of the score.
3. `random_exposure_weights` — preserves binary support, resamples weights.
4. `static_topology_baseline` — strips temporal evolution of the graph.
5. `linear_correlation_surrogate` — non-Kuramoto coherence baseline.
6. `permuted_crisis_dates` — permutes event labels in time.

The detection AUC under each baseline must drop below `fail_auc=0.55`
for the positive claim to stand.
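
As an illustration of the first baseline, a Maslov-Sneppen style rewiring on a directed binary graph can be sketched as below. The function name, the dense-adjacency representation, and the retry cap are assumptions for the example; `null_models.degree_preserving_randomization` may be implemented differently. Each accepted swap turns edges u→v and x→y into u→y and x→v, which leaves every node's in-degree and out-degree unchanged.

```python
import numpy as np


def directed_edge_swap(
    adj: np.ndarray, n_swaps: int, seed: int = 0, max_tries_factor: int = 20
) -> np.ndarray:
    """Rewire a directed 0/1 adjacency matrix, preserving in/out degrees.

    Assumes no self-loops in the input. Proposals that would create a
    self-loop or a duplicate edge are rejected and retried.
    """
    a = (np.asarray(adj) != 0).astype(np.int8)  # astype returns a copy
    if np.any(np.diag(a)):
        raise ValueError("self-loops are not supported in this sketch")
    rng = np.random.default_rng(seed)
    edges = [(int(u), int(v)) for u, v in np.argwhere(a)]
    if len(edges) < 2:
        return a
    done, tries = 0, 0
    while done < n_swaps and tries < max_tries_factor * n_swaps:
        tries += 1
        i, j = rng.choice(len(edges), size=2, replace=False)
        (u, v), (x, y) = edges[i], edges[j]
        # Reject self-loops and edges that already exist.
        if u == y or x == v or a[u, y] or a[x, v]:
            continue
        a[u, v] = a[x, y] = 0
        a[u, y] = a[x, v] = 1
        edges[i], edges[j] = (u, y), (x, v)
        done += 1
    return a
```

Since only the wiring changes, any detection that survives this null cannot be attributed to the degree sequence alone.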

## 5. Replication contract (§ 13)

Every run emits a `RunManifest` JSON capturing:

* commit SHA + git-dirty flag
* root RNG seed
* deterministic config hash (`sort_keys=True` SHA-256)
* Python + platform info
* runtime-relevant package versions
* full caller config dict
* free-form `extra` (dataset id, data SHA-256, …)

`MEASURED` tier requires a clean (non-dirty) git tree at run time.
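
The deterministic config hash in the list above can be sketched with the standard library alone. The helper name and the compact `separators` setting are assumptions for illustration; only `sort_keys=True` and SHA-256 are stated in the protocol, and `build_run_manifest` may serialise differently.

```python
import hashlib
import json
from typing import Any, Mapping


def config_hash(config: Mapping[str, Any]) -> str:
    """SHA-256 of a canonical JSON dump, independent of key order.

    Hypothetical helper: values must be JSON-serialisable.
    """
    canonical = json.dumps(
        dict(config),
        sort_keys=True,          # canonical key order
        separators=(",", ":"),   # assumed: no whitespace variation
        ensure_ascii=True,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Sorting keys makes the hash stable across dict construction order, so two runs with the same settings yield the same digest regardless of how the caller assembled the config.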

## 6. Failure conditions (§ 12)

Any of the below archives the hypothesis as a negative result:

* signal does not lead the crisis;
* signal appears only after the crisis;
* any baseline matches or exceeds the detector;
* result unstable to small parameter changes (sensitivity sweep);
* CI lower bound crosses chance;
* Bonferroni correction kills significance;
* false-positive rate above operational ceiling;
* result hinges on a single dataset;
* second run with the same seed differs.

## 7. Post-detection promotion path

```
HYPOTHESIS
  └─▶ INSTRUMENTED (this PR)
      └─▶ TESTED_ON_SYNTHETIC (this PR, both rails verified)
          └─▶ TESTED_ON_REAL_DATA (next PR, blocked on user e-MID/BIS dump)
              └─▶ MEASURED (after real-data HARD_PASS on ≥2 crises)
                  └─▶ REPLICATED (independent re-run)
                      └─▶ VALIDATED (peer-reviewed)
```

Current status: **HYPOTHESIS / INSTRUMENTATION COMPLETE**.