
feat(research/systemic_risk): R&D hypothesis instrument v2 — directed coupling, MLE BA fit, bootstrap-CI falsification #562

Merged
neuron7xLab merged 6 commits into main from feat/research-systemic-risk-rewrite on May 8, 2026


@neuron7xLab
Owner

Summary

Block 2 of the 2026-05-08 PROTOCOL: FULL CLEANUP + QUALITY REWRITE. Closes three correctness defects in v1 (PR #557): silent symmetrisation that destroyed direction signal, missing CI on the verdict statistic, and FDR control where FWER is required.

What changed

Data layer

  • from_exposure_matrix(..., directed=True) is the new default — preserves asymmetric exposure structure (Bardoscia et al. 2021, Nat. Rev. Phys. 3: 490). directed=False retained for null baselines only.
  • Optional snapshot_date for temporal pipelines (e-MID quarterly, BIS LBS).
  • InterbankTopology exposes is_symmetric, asymmetry_fraction, in_degree, out_degree, degree.
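A minimal sketch of the directed degree bookkeeping, assuming a list-of-lists exposure matrix where `E[i][j] > threshold` (strict) defines a directed edge i → j. The helper names and the definition of `asymmetry_fraction` as the share of directed edges without a reciprocal edge are illustrative assumptions, not the module's code. The last check shows why `in + out` double-counts on a symmetric graph.

```python
from typing import List, Sequence


def adjacency(exposures: Sequence[Sequence[float]], threshold: float = 0.0) -> List[List[int]]:
    """Directed adjacency: edge i -> j iff E[i][j] > threshold (strict, no self-loops)."""
    n = len(exposures)
    return [[1 if i != j and exposures[i][j] > threshold else 0 for j in range(n)] for i in range(n)]


def out_degree(a: List[List[int]]) -> List[int]:
    return [sum(row) for row in a]


def in_degree(a: List[List[int]]) -> List[int]:
    n = len(a)
    return [sum(a[i][j] for i in range(n)) for j in range(n)]


def is_symmetric(a: List[List[int]]) -> bool:
    n = len(a)
    return all(a[i][j] == a[j][i] for i in range(n) for j in range(n))


def asymmetry_fraction(a: List[List[int]]) -> float:
    """Assumed definition: fraction of directed edges i -> j with no reciprocal j -> i."""
    n = len(a)
    edges = [(i, j) for i in range(n) for j in range(n) if a[i][j]]
    if not edges:
        return 0.0
    unreciprocated = sum(1 for i, j in edges if not a[j][i])
    return unreciprocated / len(edges)
```

On an upper-triangular exposure matrix every edge is unreciprocated, so `asymmetry_fraction` is 1.0; on a symmetric matrix `in_degree + out_degree` equals twice the undirected degree.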

Network layer (new network_fitting.py)

  • MLE α̂ = 1 + n / Σ ln(k_i / (k_min−0.5)) with asymptotic SE (α̂−1)/√n per Clauset, Shalizi, Newman 2009.
  • KS goodness-of-fit p-value via parametric bootstrap with the Davison-Hinkley (r + 1)/(B + 1) correction.
  • AIC vs exponential alternative.
  • fit_barabasi_albert recovers BA m from <k>/2 instead of v1's hard-coded m=2.
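The bootstrap goodness-of-fit p-value is conventionally computed with the Davison-Hinkley (r + 1)/(B + 1) rule, which keeps a Monte Carlo p-value strictly positive. A stdlib sketch, assuming the observed KS statistic and the B bootstrap KS statistics are already available; the function name is hypothetical.

```python
from typing import Sequence


def monte_carlo_p_value(observed: float, bootstrap_stats: Sequence[float]) -> float:
    """Davison-Hinkley p = (r + 1) / (B + 1), with r = #{bootstrap stat >= observed}.

    For a KS test larger statistics mean worse fit, so r counts synthetic
    datasets that fit the model at least as badly as the real data.
    """
    b = len(bootstrap_stats)
    if b == 0:
        raise ValueError("need at least one bootstrap statistic")
    r = sum(1 for s in bootstrap_stats if s >= observed)
    return (r + 1) / (b + 1)
```

With B = 99 and no bootstrap statistic reaching the observed one, the smallest attainable p-value is 1/100, never 0.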

Coupling layer (new coupling.py)

  • Asymmetric K_ij with row-stochastic / capital-weighted / raw modes.
  • omega_from_volatility first-order intrinsic-frequency estimator.
  • sakaguchi_alpha_zero scaffolding for per-pair phase lag.

Validation layer (falsification.py v2)

  • Stratified percentile-bootstrap CI on AUC (n_bootstrap=10000).
  • Bonferroni FWER replaces BH FDR.
  • CrisisOutcome adds auc_ci_low, auc_ci_high, p_bonferroni.
  • HARD_FAIL: any AUC ≤ 0.55 OR any auc_ci_low ≤ 0.5.
  • HARD_PASS: ≥2 crises with auc_ci_low ≥ 0.70 AND p_BONF ≤ 0.01.
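A stdlib sketch of a stratified percentile bootstrap on AUC: positive and negative scores are resampled independently, so every replicate keeps the original arm sizes. The AUC is the Mann-Whitney estimate (ties count one half), computed in O(n_pos·n_neg) for clarity; the function names, the percentile indexing, and the signature are assumptions for illustration.

```python
import random
from typing import List, Sequence, Tuple


def auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Mann-Whitney AUC: P(score_pos > score_neg) + 0.5 * P(tie)."""
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_bootstrap_ci(
    pos: Sequence[float],
    neg: Sequence[float],
    n_bootstrap: int = 10_000,
    level: float = 0.95,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (point AUC, CI low, CI high) from a stratified percentile bootstrap."""
    if not pos or not neg:
        raise ValueError("both arms must be non-empty")
    rng = random.Random(seed)
    stats: List[float] = []
    for _ in range(n_bootstrap):
        # Resample each arm separately: marginal sample sizes are preserved.
        boot_pos = rng.choices(pos, k=len(pos))
        boot_neg = rng.choices(neg, k=len(neg))
        stats.append(auc(boot_pos, boot_neg))
    stats.sort()
    tail = (1.0 - level) / 2.0
    lo = stats[int(tail * (n_bootstrap - 1))]
    hi = stats[int((1.0 - tail) * (n_bootstrap - 1))]
    return auc(pos, neg), lo, hi
```

Under perfect separation every replicate has AUC 1.0, so the interval collapses to [1.0, 1.0]; the verdict logic then reads `auc_ci_low` from the second element.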

Test plan

  • pytest tests/research/systemic_risk/: 90 passed (57 from v1 + 33 new)
  • mypy --strict clean on all new/modified files
  • ruff + black clean
  • Lower-rail (random scores) → HARD_FAIL (or UNDECIDED, never HARD_PASS)
  • Upper-rail (+3σ injected pre-event signal, 3 crises) → HARD_PASS with every auc_ci_low ≥ 0.70

Tier

C-SYSRISK-PHASE remains HYPOTHESIS. Promotion to MEASURED requires HARD_PASS on ≥ 2 of {2008 GFC, 2011 Eurozone, 2023 SVB/CS} with real interbank exposure data.

🤖 Generated with Claude Code


@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 02e5e4c3f1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread on research/systemic_risk/network_fitting.py (outdated)
Comment thread on research/systemic_risk/coupling.py
neuron7xLab added a commit that referenced this pull request May 8, 2026
* fix(governance): add CHORE to ClaimType enum to match YAML policy

PR #561 introduced `chore: 24` in `.claude/commit_acceptor_policy.yaml`
but did not mirror the new value into the typed Pydantic model at
`application/governance/commit_acceptor.py:55`. The corpus parse
test (`tests/governance/test_typed_models.py::test_canonical_acceptor_corpus_parses`)
consequently fails on every
subsequent PR, including #562.

This 1-line addition closes the YAML-vs-Pydantic drift.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(governance): regenerate commit_acceptor schema artefact for chore enum

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(governance): list schema artefact in acceptor changed_files

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Yaroslav Vasylenko <neuron7x@ukr.net>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…, MLE BA fit, bootstrap-CI verdict

Block 2 of the 2026-05-08 user PROTOCOL: FULL CLEANUP + QUALITY
REWRITE. v1 (PR #557) shipped a usable scaffold but had three
correctness defects that this PR closes:

DATA LAYER
- `from_exposure_matrix` no longer auto-symmetrises the input. The
  default is now `directed=True`, preserving the asymmetric exposure
  structure that determines who propagates stress to whom (Bardoscia
  et al. 2021, *Nat. Rev. Phys.* 3: 490). `directed=False` is
  retained for null baselines only.
- Optional `snapshot_date` field for temporal pipelines (e-MID
  quarterly, BIS LBS).
- `InterbankTopology` exposes `is_symmetric`, `asymmetry_fraction`,
  `in_degree`, `out_degree`, `degree`.

NETWORK LAYER (new `network_fitting.py`)
- `fit_power_law`: MLE estimator α̂ = 1 + n / Σ ln(k_i / (k_min−0.5))
  with asymptotic SE = (α̂−1)/√n (Clauset, Shalizi, Newman 2009,
  *SIAM Rev.* 51: 661). Optional KS goodness-of-fit p-value via
  parametric bootstrap, with the Davison-Hinkley (r + 1)/(B + 1)
  Monte Carlo correction.
- `fit_exponential` for the AIC alternative.
- `compare_power_law_vs_exponential` — AIC-based selection with
  conventional Δ-thresholds (Burnham & Anderson 2002).
- `fit_barabasi_albert` recovers BA `m` from `<k>/2` after fitting α.
  Replaces v1's hard-coded `m=2`.
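The discrete-approximation MLE and its asymptotic standard error, written out as a stdlib sketch. `k_min` is taken as given here; the PR's `fit_power_law` also handles the KS bootstrap and the AIC comparison, which this sketch omits. The function name and the return tuple are assumptions.

```python
import math
from typing import Sequence, Tuple


def power_law_mle(degrees: Sequence[int], k_min: int) -> Tuple[float, float, int]:
    """Return (alpha_hat, standard_error, n_tail) for the tail k >= k_min.

    alpha_hat = 1 + n / sum(ln(k_i / (k_min - 0.5)))   (CSN 2009 discrete approximation)
    SE        = (alpha_hat - 1) / sqrt(n)
    """
    if k_min < 1:
        raise ValueError("k_min must be >= 1")
    tail = [k for k in degrees if k >= k_min]
    n = len(tail)
    if n < 2:
        raise ValueError("need at least 2 tail observations")
    # Every term is positive because k_i >= k_min > k_min - 0.5.
    log_sum = sum(math.log(k / (k_min - 0.5)) for k in tail)
    alpha = 1.0 + n / log_sum
    se = (alpha - 1.0) / math.sqrt(n)
    return alpha, se, n
```

Repeating the same tail four times leaves α̂ unchanged and halves the SE, which is the "SE monotone in n" property the tests check.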

COUPLING LAYER (new `coupling.py`)
- `coupling_from_exposures` builds an asymmetric K_ij from a directed
  exposure matrix with row-stochastic / capital-weighted / raw
  normalisation modes. Optional floor for noise-suppression on
  empirical inputs.
- `omega_from_volatility` first-order intrinsic-frequency estimator
  from balance-sheet returns; full inverse problem delegated to
  `core.kuramoto.natural_frequency`.
- `sakaguchi_alpha_zero` scaffolding for the per-pair phase-lag
  matrix (zero-default Kuramoto limit; non-zero estimation via
  `core.kuramoto.frustration`).
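A stdlib sketch of the normalisation modes and the noise floor, assuming the orientation invariant pinned later in this PR (`K[i][j]` proportional to `E[i][j]`, i.e. i's claim on j, with no transpose). The mode names follow the text; the capital-weighted rule (divide row i by bank i's capital), the non-negativity check, and the signature are assumptions. Entries equal to the floor are kept, and an all-zero row stays zero instead of dividing by zero.

```python
import math
from typing import List, Optional, Sequence

Matrix = List[List[float]]


def coupling_from_exposures(
    exposures: Sequence[Sequence[float]],
    mode: str = "row_stochastic",
    capital: Optional[Sequence[float]] = None,
    floor: Optional[float] = None,
) -> Matrix:
    """Build an asymmetric K from a directed exposure matrix E (E[i][j]: i lent to j)."""
    n = len(exposures)
    if any(len(row) != n for row in exposures):
        raise ValueError("exposure matrix must be square")
    if any(not math.isfinite(x) or x < 0 for row in exposures for x in row):
        raise ValueError("exposures must be finite and non-negative")
    # K[i][j] is proportional to E[i][j] (no transpose); the diagonal is zeroed.
    k = [[0.0 if i == j else float(exposures[i][j]) for j in range(n)] for i in range(n)]
    if floor is not None:
        # Inclusive floor: entries equal to `floor` are kept.
        k = [[x if x >= floor else 0.0 for x in row] for row in k]
    if mode == "raw":
        return k
    if mode == "row_stochastic":
        # K[i][j] = E[i][j] / sum_m E[i][m]; an all-zero row stays zero.
        out: Matrix = []
        for row in k:
            total = sum(row)
            out.append([x / total for x in row] if total > 0 else [0.0] * n)
        return out
    if mode == "capital_weighted":
        # Assumed rule: scale row i by bank i's capital.
        if capital is None or len(capital) != n or any(c <= 0 for c in capital):
            raise ValueError("capital must hold n positive values")
        return [[x / capital[i] for x in row] for i, row in enumerate(k)]
    raise ValueError(f"unknown mode: {mode!r}")
```

Because no transpose is applied, `K[0][1]` in raw mode equals `E[0][1]`, so a 2x2 input is enough to detect an accidental transpose.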

VALIDATION LAYER (`falsification.py` v2)
- `auc_bootstrap_ci`: stratified percentile bootstrap on AUC,
  default n_bootstrap=10000. Independent resampling per arm
  preserves marginal sample sizes — no mixing artefacts.
- `bonferroni_correction` replaces v1 Benjamini-Hochberg FDR. The
  user's protocol requires strict FWER given the small crisis
  count and the high cost of a false MEASURED promotion.
- `CrisisOutcome` carries `auc_ci_low`, `auc_ci_high`, `p_bonferroni`.
- Decision rule (frozen pre-registration):
  * HARD_FAIL: any AUC ≤ `fail_auc` (0.55) OR any `auc_ci_low` ≤
    0.5 + `ci_floor_tol` (default 0.0 — strict).
  * HARD_PASS: ≥ 2 crises with `auc_ci_low` ≥ `pass_auc_ci_low`
    (0.70) AND `p_bonferroni` ≤ `pass_alpha` (0.01).
  * UNDECIDED otherwise.
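The frozen decision rule can be expressed as a pure function over per-crisis outcomes. This is a sketch under stated assumptions: the `Outcome` dataclass is a hypothetical stand-in for `CrisisOutcome`, Bonferroni multiplies each raw p-value by the number of crises and clips at 1, HARD_FAIL is assumed to take precedence over HARD_PASS, and the defaults mirror the thresholds listed above.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Outcome:
    """Hypothetical stand-in for CrisisOutcome."""

    auc: float
    auc_ci_low: float
    p_raw: float


def bonferroni(p_values: Sequence[float]) -> List[float]:
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def verdict(
    outcomes: Sequence[Outcome],
    fail_auc: float = 0.55,
    ci_floor_tol: float = 0.0,
    pass_auc_ci_low: float = 0.70,
    pass_alpha: float = 0.01,
) -> str:
    if not outcomes:
        raise ValueError("no crisis outcomes")
    # Assumed precedence: a single failing crisis vetoes any pass.
    if any(o.auc <= fail_auc or o.auc_ci_low <= 0.5 + ci_floor_tol for o in outcomes):
        return "HARD_FAIL"
    p_bonf = bonferroni([o.p_raw for o in outcomes])
    passing = sum(
        1
        for o, p in zip(outcomes, p_bonf)
        if o.auc_ci_low >= pass_auc_ci_low and p <= pass_alpha
    )
    return "HARD_PASS" if passing >= 2 else "UNDECIDED"
```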

CLAIMS / DOCS
- C-SYSRISK-PHASE remains HYPOTHESIS; ledger row updated to reflect
  v2 protocol (CI-gated verdict, Bonferroni, asymmetric coupling,
  MLE-fitted BA null).
- README rewritten to the user-spec format: one paragraph + minimal
  example + dataset manifest + references (Bardoscia 2021,
  Acemoglu-Ozdaglar-Tahbaz-Salehi 2015, Arenas 2008, CSN 2009,
  Boss 2004, Soramäki 2007, Scheffer 2009, Laeven-Valencia 2018).

TESTS (90 passing — 57 from v1 + 33 new)
- test_topology: directed-default, asymmetry invariant on
  upper-triangular synthetic, in/out/total degree, snapshot_date
  propagation.
- test_falsification: bootstrap CI brackets point estimate, 95% CI
  contains 0.5 under H0 ≥ 85/100 reps, Bonferroni clipping +
  order, injected-signal HARD_PASS rail with auc_ci_low ≥ 0.70.
- test_network_fitting: MLE α recovery within 0.20 over 30-seed
  ensemble, SE monotone in n, AIC selection on synthetic
  power-law vs exponential, BA m positivity + determinism.
- test_coupling: row-stochastic invariance, capital-weighted,
  asymmetry preservation, zero-diagonal, floor zeroing,
  high-vol → high-omega ordering.

Quality gates
- mypy --strict: clean on every new/modified file.
- ruff + black: clean.
- 5 pre-existing core/kuramoto/jax_engine errors persist on
  origin/main; out of scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@neuron7xLab force-pushed the feat/research-systemic-risk-rewrite branch from 02e5e4c to 490dff8 on May 8, 2026 07:50
Yaroslav Vasylenko and others added 5 commits May 8, 2026 11:09
P1: BA m calibration drift on symmetric `topology.degree`
  Codex caught that v2's `InterbankTopology.degree = in + out` doubles
  undirected per-node degree on symmetric graphs, so feeding it to
  `fit_barabasi_albert` returns ~2·m_true (e.g. BA(m=3) fits as m=6).
  Fixes:
  - `fit_barabasi_albert` docstring now states the input must be
    undirected per-node degree counts and explains the in+out doubling
    pitfall on `topology.degree`.
  - Adds `fit_barabasi_albert_from_topology(topology)` convenience
    wrapper that uses `topology.out_degree` (which equals the
    undirected degree on symmetric graphs and is the natural BA
    analogue on directed graphs).
  Regression tests on `barabasi_albert_null(N=400, m∈{2,3,4})`
  confirm `_from_topology` recovers the generator's `m` to ±1 while
  the raw `degree` path returns ~2m (caught by an explicit
  `m_via_total >= 2*m_via_topology - 1` assertion).

P2: omega_from_volatility silent NaN on T<2
  `r.std(axis=0, ddof=1)` returns NaN on `(1, N)` or `(0, N)` inputs.
  Added explicit T>=2 check that raises `ValueError("at least 2 time
  samples")`. Two new tests cover T=1 and T=0 rejection paths.
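A sketch of the T ≥ 2 guard, assuming the first-order estimator maps each bank's sample volatility (ddof = 1) linearly onto an intrinsic frequency, ω_i = scale · σ_i. The linear mapping, the `scale` parameter, and the finiteness check are assumptions; the PR delegates the full inverse problem to `core.kuramoto.natural_frequency`.

```python
import math
import statistics
from typing import List, Sequence


def omega_from_volatility(returns: Sequence[Sequence[float]], scale: float = 1.0) -> List[float]:
    """returns[t][i] is bank i's return at time t; output omega[i] = scale * stdev_i."""
    t = len(returns)
    if t < 2:
        # The ddof=1 standard deviation is undefined with fewer than two samples.
        raise ValueError("at least 2 time samples")
    n = len(returns[0])
    if any(len(row) != n for row in returns):
        raise ValueError("ragged returns panel")
    if any(not math.isfinite(x) for row in returns for x in row):
        raise ValueError("returns must be finite")
    # statistics.stdev uses the n - 1 denominator, matching ddof=1.
    return [scale * statistics.stdev([row[i] for row in returns]) for i in range(n)]
```

A zero-variance column yields a finite ω of 0.0 rather than NaN.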

Tests: 96/96 pass (+6 from 90).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n edge cases

Address user pushback on PR #562: every constant must be DERIVED, not
declared. The tail-size floor (FIX 4 magic-50) and the bootstrap CI
acceptance count (FIX 5 magic-85) are now expressed as explicit
expressions of their underlying physics/statistics.

network_fitting.fit_power_law
  - New `min_relative_se: float | None` kwarg. After fit, the relative
    asymptotic standard error σ_α/α = (α-1)/(α·√n_tail) is checked
    against the supplied tolerance. The Cramér-Rao lower bound on
    Var(α̂) under the continuous power-law approximation (the one the
    estimator itself uses) is (α-1)²/n_tail (Fisher information
    I(α) = n_tail / (α-1)²). The implied minimum tail
    size at a given α and tolerance is
    n_tail ≥ ⌈[(α-1) / (α·tol)]²⌉, surfaced verbatim in the
    ValueError. No magic 50: the floor is whatever the data + tol say.
  - Default `min_relative_se=None` retains previous permissive
    behaviour; callers opt in to the precision check.
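The precision floor reduces to arithmetic: with σ_α/α = (α − 1)/(α·√n_tail), requiring σ_α/α ≤ tol gives the smallest admissible tail. A stdlib sketch; the function names are hypothetical.

```python
import math


def relative_se(alpha: float, n_tail: int) -> float:
    """Asymptotic sigma_alpha / alpha = (alpha - 1) / (alpha * sqrt(n_tail))."""
    return (alpha - 1.0) / (alpha * math.sqrt(n_tail))


def min_tail_size(alpha: float, tol: float) -> int:
    """Smallest n_tail with relative_se <= tol: ceil(((alpha - 1) / (alpha * tol)) ** 2)."""
    if alpha <= 1.0 or tol <= 0.0:
        raise ValueError("need alpha > 1 and tol > 0")
    return math.ceil(((alpha - 1.0) / (alpha * tol)) ** 2)
```

At α = 3.0 and tol = 0.10 the bound is ⌈(2/0.3)²⌉ = ⌈44.4⌉ = 45. At α = 2.5 a tail of 50 gives σ_α/α ≈ 0.085, the figure quoted for `MIN_TAIL_SIZE_VALIDATION` further down.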

network_fitting.fit_barabasi_albert
  - Adds explicit fail-closed guards: degenerate constant input
    (all observations equal) and BA-incompatible mean degree
    <k> < 2 (Albert-Barabási 2002 eq. 4.7) both raise.
  - Removes the silent max(1, ...) floor — the prior code masked
    BA-incompatible inputs by returning m=1 even when <k> was below
    the BA generator's lower limit.
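A sketch of the BA `m` recovery and its two fail-closed guards. A BA(m) graph has mean degree tending to 2m with m ≥ 1, so m̂ = ⟨k⟩/2 rounded to the nearest integer; the rounding choice and the function name are assumptions. The input must be undirected per-node degree, per the P1 fix above.

```python
from statistics import fmean
from typing import Sequence


def ba_m_from_degrees(degrees: Sequence[int]) -> int:
    """Estimate BA attachment count m from undirected per-node degrees."""
    if len(degrees) == 0:
        raise ValueError("empty degree sequence")
    if min(degrees) == max(degrees):
        # A constant degree sequence carries no heavy-tail information.
        raise ValueError("degenerate constant degree input")
    mean_k = fmean(degrees)
    if mean_k < 2.0:
        # BA(m) has <k> -> 2m with m >= 1, so <k> < 2 is BA-incompatible.
        raise ValueError(f"mean degree {mean_k:.3f} < 2 is BA-incompatible")
    # No max(1, ...) floor: incompatible inputs were rejected above.
    return round(mean_k / 2.0)
```

Feeding `in + out` degrees of a symmetric graph doubles ⟨k⟩ and therefore doubles m̂, which is the calibration drift Codex flagged.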

coupling.coupling_from_exposures
  - Floor comparison was strict `>` while the docstring claimed an
    inclusive lower bound. Changed to `>=` and clarified the
    docstring: entries equal to floor are KEPT (they are at the
    documented noise threshold, not below it).

tests/test_falsification.py::test_ci_under_h0_contains_half
  - Replaces magic 85 with binom.ppf(α_test, 100, 0.95). Under H0
    the count K of CIs containing 0.5 is Binomial(100, 0.95) when
    the percentile bootstrap is correctly calibrated. Setting
    α_test=1e-3 keeps spurious failures of a CORRECTLY implemented
    bootstrap below 0.1%. The threshold is computed at runtime from
    the binomial,
    not asserted as a number.
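The runtime threshold can be reproduced from the binomial directly. This stdlib sketch mirrors the definition `scipy.stats.binom.ppf(q, n, p)` uses for a discrete distribution (the smallest k with CDF(k) ≥ q) so the logic is visible; the helper names are hypothetical.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def binom_ppf(q: float, n: int, p: float) -> int:
    """Smallest k in [0, n] with CDF(k) >= q."""
    for k in range(n + 1):
        if binom_cdf(k, n, p) >= q:
            return k
    return n


# The test passes iff at least `threshold` of 100 H0 replicate CIs contain 0.5.
# By construction P(K <= threshold - 1) < 1e-3 for a calibrated 95% bootstrap.
threshold = binom_ppf(1e-3, 100, 0.95)
```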

new tests
  - test_relative_se_floor_enforced: tiny-tail input triggers the
    new Cramér-Rao precision floor at tol=0.10.
  - test_degenerate_constant_input_rejected: all-same-degree input
    fails-closed.
  - test_low_mean_degree_rejected: <k> < 2 fails-closed.
  - test_floor_inclusive_at_exact_boundary: floor=0.5 keeps entries
    equal to 0.5 (matches inclusive-lower-bound contract).
  - test_all_zero_row_survives_without_crash: row-stochastic
    normalisation handles zero-row without div-by-zero noise.
  - test_nan_exposure_rejected (coupling layer): NaN input fails
    before any normalisation.

Tests: 102 passing (+6 from 96).
Quality: mypy --strict / ruff / black all clean on the diff.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n manifest + canonical docs

Implements every concrete requirement of the user's "Critical Validation
Protocol" (sections 5.7, 8, 13, 14) over and above the v2 rewrite:

NEW MODULES
- null_models.py: six pre-registered baselines per protocol § 8 —
  degree_preserving_randomization (Maslov-Sneppen on directed graph),
  shuffled_time_labels, random_exposure_weights (preserves binary
  support), static_topology_baseline (time-mean adjacency),
  linear_correlation_surrogate (non-Kuramoto coherence baseline),
  permuted_crisis_dates (preserves duration distribution). Every
  baseline is deterministic under explicit seed.
- replication.py: RunManifest dataclass + build_run_manifest factory
  per protocol § 13. Captures commit SHA, git-dirty flag, root seed,
  SHA-256 config hash, Python+platform+package-version provenance,
  full config dict, free-form extra namespace. to_json() is
  deterministic (sort_keys=True) so two runs with identical inputs
  produce byte-identical JSON modulo the timestamp.
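A stdlib sketch of the first baseline, degree-preserving randomisation on a directed edge list via Maslov-Sneppen swaps: pick edges a → b and c → d, rewire to a → d and c → b, and reject swaps that would create self-loops or duplicate edges. Every node's in- and out-degree is unchanged. The function name, the swap budget, and the edge-list representation are assumptions, not the contents of `null_models.py`.

```python
import random
from typing import List, Set, Tuple

Edge = Tuple[int, int]


def directed_maslov_sneppen(edges: List[Edge], n_swaps: int, seed: int) -> List[Edge]:
    rng = random.Random(seed)  # explicit seed -> deterministic output
    out = list(edges)
    present: Set[Edge] = set(out)
    if len(present) != len(out):
        raise ValueError("duplicate edges in input")
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        (a, b), (c, d) = out[i], out[j]
        new1, new2 = (a, d), (c, b)
        # Reject swaps that would create self-loops or multi-edges.
        if a == d or c == b or new1 in present or new2 in present:
            continue
        present -= {out[i], out[j]}
        present |= {new1, new2}
        out[i], out[j] = new1, new2
    return out
```

Sources keep their out-edges and targets keep their in-edges on every accepted swap, so the degree sequences are invariant by construction.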

CANONICAL DOCS (protocol § 14)
- PROTOCOL.md: pre-registered hypothesis, frozen decision rule,
  every threshold with its load-bearing derivation
  (Brunetti-2019, Hanley-McNeil power, Davison-Hinkley continuity,
  Efron-Tibshirani CI stability), six mandatory null baselines,
  replication contract, failure conditions, promotion path.
- VALIDATION.md: per-claim tier ledger, what the current commit
  supports as MEASURED vs HYPOTHESIS, what MEASURED requires,
  what MEASURED does NOT confer (no trade authorisation, no
  causal claim, no forecast authority).
- LIMITATIONS.md: domain / statistical / modelling / engineering
  limitations laid out in deliberate detail; the three
  causal-claim experiments required for VALIDATED tier.
- data_schema.md: every input field, every constraint, every
  fail-closed condition. Boundary contract enforced by the
  loaders.

NEW TESTS (23 added, total 125 passing)
- test_null_models.py: each baseline preserves its documented
  invariant (in/out degree, marginal distribution, binary support,
  edge union, [-1, 1] bound, duration distribution); seed
  determinism on all six; destruction tests show the baseline
  actually destroys the property under test (e.g. lag-1 autocorr
  vanishes after time-label shuffle on AR(0.95)).
- test_replication.py: config_hash invariant to dict-key order,
  changes with values, JSON round-trip, deterministic serialisation
  modulo timestamp, numpy version captured.
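Key-order invariance of a config hash falls out of canonical JSON serialisation. A minimal sketch, assuming the config is JSON-serialisable and that the hash is SHA-256 over `json.dumps(..., sort_keys=True)` with compact separators; the exact canonicalisation in `replication.py` may differ.

```python
import hashlib
import json
from typing import Any, Mapping


def config_hash(config: Mapping[str, Any]) -> str:
    """SHA-256 hex digest of a canonical JSON encoding of `config`."""
    # sort_keys also orders nested dicts, so insertion order never matters.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```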

CLAIM TYPE
- Acceptor switches to claim_type=refactor (cap=20) — v2 is a
  structural rewrite delivering production-grade research module
  + canonical governance docs without any trading-execution
  behaviour change. 20-file diff fits exactly.

Quality
- 125/125 tests pass.
- mypy --strict clean on every new/modified file.
- ruff + black clean on the diff.
- Pre-existing 5 jax_engine errors persist on origin/main; out of scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…review

External code review caught 7 issues beyond the user's own checklist;
all are addressed in this push:

BLOCKER 1 — README BA example contradicted its own fix
  README example used `fit_barabasi_albert(topo.degree, ...)` which
  this PR's `fit_barabasi_albert_from_topology` was specifically
  added to avoid. Switched the example to the correct API.

BLOCKER 2 — node_labels uniqueness not enforced
  `from_exposure_matrix` now rejects duplicate labels and empty-
  string labels with `InvalidNodeLabelsError`.

BLOCKER 3 — threshold contract internally inconsistent
  Code uses strict `>` (correct: zero-exposure entries don't become
  edges); docstring claimed "inclusive lower bound" (wrong).
  Docstring updated to "STRICT lower cutoff" with explicit example.

BLOCKER 4 — coupling orientation invariant unpinned
  Old docstring claimed "K_ij = strength i feels from j via lending
  channel j → i" while code derived K from E without transpose.
  The semantics depend on the convention. Pinned the canonical
  invariant block in coupling.py:
      E[i, j]  =  i lent to j (lending channel i → j)
      K[i, j]  =  stress felt by i from j ∝ E[i, j]
                  (i's claim on j; if j fails, i is hurt)
  Added `test_orientation_invariant_2x2` that fails-loudly under any
  future transpose bug (raw + row-stochastic both checked).

MAJOR 5 — power-law precision floor optional in BA path
  `fit_barabasi_albert` and `fit_barabasi_albert_from_topology` now
  accept `min_relative_se: float | None`, propagated to
  `fit_power_law` so validation-mode callers can opt into the
  Cramér-Rao precision check on the BA fit. New regression test
  `test_min_relative_se_propagates`.

MAJOR 6 — `run_null_audit` referenced but not implemented
  null_models.py docstring removed the bogus reference. Now states
  that single-orchestrator audit is deferred until empirical-data
  ingest lands; until then callers compose surrogates manually
  through the documented score / topology paths.

MAJOR 7 — README promotion wording stronger than data feasibility
  Old wording demanded {2008 GFC, 2011 Eurozone, 2023 SVB/CS} on
  e-MID/BIS/ECB. e-MID 2009-2015 does NOT cover Lehman 2008. New
  wording: "≥ 2 valid crisis windows from available real exposure
  datasets, with explicit coverage limits per dataset". Cross-
  references LIMITATIONS.md for the per-dataset coverage table.

ENTRY-POINT GATE (§ 5)
  New module `errors.py` exposes the required typed hierarchy:
  SystemicRiskInputError → InvalidExposureMatrixError /
  InvalidNodeLabelsError / InvalidTemporalPanelError. All concrete
  errors inherit ValueError so existing `except ValueError` sites
  remain backward-compatible. `from_exposure_matrix` now raises
  the typed errors directly.
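The hierarchy above, sketched as code. Only the class names come from the text; making the base class itself inherit `ValueError` (so every concrete error does too) is an assumption, and the docstrings are illustrative.

```python
class SystemicRiskInputError(ValueError):
    """Base class for boundary-contract violations on systemic-risk inputs."""


class InvalidExposureMatrixError(SystemicRiskInputError):
    """The exposure matrix fails the input contract."""


class InvalidNodeLabelsError(SystemicRiskInputError):
    """Node labels are empty, non-string, or duplicated."""


class InvalidTemporalPanelError(SystemicRiskInputError):
    """A temporal exposure panel violates ordering or node-universe rules."""
```

Callers that still write `except ValueError:` catch all three concrete errors, while new code can catch `SystemicRiskInputError` to handle only boundary violations.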

ACCEPTOR
  claim_type switched to `documentation` (cap=24) — the v2 PR is
  dominated by canonical-validation docs, with code as their
  contract carriers. 22-file diff fits.

Tests: 138 passing (+13 from 125).
Quality: mypy --strict / ruff / black clean on the diff.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…dening + validation-mode + edge cases

Addresses every concrete defect of the user's 10-FIX canonical R&D
governance protocol on PR #562:

FIX 1 — node_labels uniqueness HARDENED
  from_exposure_matrix now rejects:
    - None entries (defensive at runtime even with str-typed param)
    - non-str entries
    - empty and whitespace-only strings (`.strip() == ''`)
    - duplicates
  All raise InvalidNodeLabelsError. New tests in test_errors.py
  cover whitespace and None paths explicitly.
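A sketch of the hardened label check, with `InvalidNodeLabelsError` redefined locally as a `ValueError` subclass so the block runs on its own. The function name and the length check against the matrix size are assumptions.

```python
from typing import Any, Sequence, Set


class InvalidNodeLabelsError(ValueError):
    """Local stand-in for the module's typed error."""


def validate_node_labels(labels: Sequence[Any], n_nodes: int) -> None:
    if len(labels) != n_nodes:
        raise InvalidNodeLabelsError(f"expected {n_nodes} labels, got {len(labels)}")
    seen: Set[str] = set()
    for idx, label in enumerate(labels):
        # Runtime checks run even when the static type says Sequence[str].
        if label is None:
            raise InvalidNodeLabelsError(f"label {idx} is None")
        if not isinstance(label, str):
            raise InvalidNodeLabelsError(f"label {idx} is {type(label).__name__}, not str")
        if label.strip() == "":
            raise InvalidNodeLabelsError(f"label {idx} is empty or whitespace-only")
        if label in seen:
            raise InvalidNodeLabelsError(f"duplicate label {label!r}")
        seen.add(label)
```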

FIX 2 — README ↔ null_models contradiction RESOLVED (PATH B)
  null_models.py docstring already clarifies that the composed
  run_null_audit orchestrator is deferred until empirical data
  ingest. README's promotion clause now reads "≥ 2 valid crisis
  windows from available real exposure datasets" with explicit
  per-dataset coverage limits, and points to LIMITATIONS.md for
  what e-MID 2009-2015 actually covers (not Lehman 2008).

FIX 4 — row_stochastic physical wording REWRITTEN
  Old wording mixed "outgoing propagation" + "stress propagates to
  borrowers" — ambiguous w.r.t. the canonical orientation invariant
  in the module docstring. Rewritten to: "K[i, j] = E[i, j] / Σ_k
  E[i, k]; per the canonical orientation invariant, K[i, j] is the
  share of bank i's total exposure concentrated in counterparty j —
  i.e. the fraction of i's claims at risk if j defaults". Removed
  "outgoing propagation" and "lender-to-borrower" language.

FIX 5 — power-law tail adequacy POLICY HYBRID
  network_fitting.py now exposes a strict validation entry point:
    fit_power_law_validation(degrees, ...)
  with internal fail-closed bounds:
    - MIN_TAIL_SIZE_VALIDATION = 50 (CRLB-derived; module doc
      block shows the σ_α/α ≈ 0.085 calculation at α=2.5)
    - MIN_RELATIVE_SE_VALIDATION = 0.10 (Clauset-Shalizi-Newman
      2009 fig. 3 PL-vs-exp AIC-Δ > 4 boundary)
  fit_power_law (exploratory) keeps min_relative_se opt-in. Two new
  tests: rejects n < 50, passes on n=2000 with σ_α/α ≤ 0.10.

FIX 6 — SciPy pin VERIFIED
  pyproject.toml already lists scipy>=1.16.2 in the canonical
  dependency block. test_falsification.py's binom.ppf import is
  backed by a pinned dep.

FIX 7 — claim-governance audit
  Forbidden-word grep across research/systemic_risk/*.md and *.py:
    \b(production-ready|production-grade|empirically established|
       trading edge|trading signal|predictive system|
       early-warning system|proven|confirmed)\b
  → 0 matches. ("validated" appears only as an enum tier name in
  PROTOCOL.md / VALIDATION.md status diagrams, allowed.)
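The forbidden-word grep can be reproduced with Python's `re` module. A sketch, assuming case-insensitive matching and a plain directory walk over `*.md` and `*.py`; the pattern is copied from the list above and the helper names are hypothetical.

```python
import re
from pathlib import Path
from typing import List, Tuple

FORBIDDEN = re.compile(
    r"\b(production-ready|production-grade|empirically established|"
    r"trading edge|trading signal|predictive system|"
    r"early-warning system|proven|confirmed)\b",
    re.IGNORECASE,
)


def find_overclaims(text: str) -> List[str]:
    return [m.group(0) for m in FORBIDDEN.finditer(text)]


def scan_tree(root: Path) -> List[Tuple[Path, int, str]]:
    """Return (file, line number, matched term) for every hit under `root`."""
    hits: List[Tuple[Path, int, str]] = []
    for path in sorted(root.rglob("*")):
        if path.suffix not in {".md", ".py"} or not path.is_file():
            continue
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            hits.extend((path, lineno, term) for term in find_overclaims(line))
    return hits
```

The `\b` anchors keep words such as "unconfirmed" from matching. A scanner that lives inside the scanned tree has to exclude the file defining its own pattern, or it will flag itself.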

FIX 8 — edge-case test expansion
  Added: single-node graph, all-zero exposure matrix (empty graph),
  whitespace-only label, None label, omega-inf input, omega-zero-
  variance returns zero finite ω, validation-mode tail-size
  rejection, validation-mode pass at sufficient n.

FIX 9 — reproducibility bundle DOCUMENTED
  PROTOCOL.md § 5 already lists the full RunManifest contract.
  LIMITATIONS.md flags what is NOT yet implemented (real-data
  ingest, walk-forward).

FIX 10 — PR final status PRESERVED
  C-SYSRISK-PHASE remains HYPOTHESIS in CLAIMS.md.
  README, PROTOCOL.md, VALIDATION.md all preserve the
  HYPOTHESIS / INSTRUMENTATION status.

Tests: 146 passing (+8 from 138).
Quality: mypy --strict / ruff / black clean on the diff. Pre-
existing 5 jax_engine mypy errors persist on origin/main; out
of scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@neuron7xLab merged commit a154324 into main on May 8, 2026
19 checks passed
@neuron7xLab deleted the feat/research-systemic-risk-rewrite branch on May 8, 2026 09:35
@neuron7xLab changed the title from "feat(research/systemic_risk): production-grade v2 — directed coupling, MLE BA fit, bootstrap-CI verdict" to "feat(research/systemic_risk): R&D hypothesis instrument v2 — directed coupling, MLE BA fit, bootstrap-CI falsification" on May 8, 2026
neuron7xLab added a commit that referenced this pull request May 8, 2026
…es + temporal-panel boundary (#564)

Closes the post-merge canonical R&D review on PR #562:

PR #562 TITLE RENAMED (post-merge edit via gh API)
  Old: "production-grade v2 ..."
  New: "R&D hypothesis instrument v2 — directed coupling, MLE BA fit,
        bootstrap-CI falsification"
  Rationale: per § 2 of the canonical checklist, "production-grade"
  is forbidden language until a real-data run + replication exist.

GOVERNANCE GATES (new module governance.py)
  - assert_claim_tier(claimed, evidence)           — refuses promotion
  - build_validation_readiness_report(...)         — explicit per-axis flags
  - run_premerge_science_gate(docs_root, readiness) — composite verdict
  - FORBIDDEN_OVERCLAIM_TERMS                       — regex tuple
  Plus a real-module test that asserts research/systemic_risk/
  itself passes the overclaim grep at HYPOTHESIS / INSTRUMENTED tier.
  Any future commit that introduces forbidden language fails CI.

SCOPE-EXPLICIT FALSIFICATION ALIASES
  - run_score_level_falsification — alias of run_falsification with
    a name that makes the scope auditable in caller code.
  - run_end_to_end_falsification  — NotImplementedError stub.
    Fails-closed until the empirical-data ingest and the composed
    null-audit orchestrator land. No partial pipeline can be
    misread as end-to-end evidence.

TEMPORAL-PANEL BOUNDARY (new module temporal_panel.py)
  validate_temporal_exposure_panel(panels, node_labels) — fail-closed
  contract for the eventual end-to-end ingest. Enforces:
    - non-empty panel
    - strictly-increasing date keys (no duplicates)
    - per-snapshot squareness, finiteness, non-negativity
    - same node universe across snapshots (no silent entry/exit)
    - label-side contract identical to from_exposure_matrix
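A stdlib sketch of the fail-closed panel contract, assuming the panel is a sequence of `(date, matrix)` pairs with square list-of-lists matrices and one shared label list standing in for the same-node-universe rule. The error class is a local stand-in, the signature is an assumption, and the label checks are reduced to type, blankness, and uniqueness.

```python
import math
from datetime import date
from typing import List, Optional, Sequence, Tuple


class InvalidTemporalPanelError(ValueError):
    """Local stand-in for the module's typed error."""


Snapshot = Tuple[date, List[List[float]]]


def validate_temporal_exposure_panel(
    panels: Sequence[Snapshot], node_labels: Sequence[str]
) -> None:
    """Raise on any violation; return None when the panel is usable."""
    if not panels:
        raise InvalidTemporalPanelError("panel is empty")
    if any(not isinstance(s, str) or not s.strip() for s in node_labels):
        raise InvalidTemporalPanelError("labels must be non-blank strings")
    if len(set(node_labels)) != len(node_labels):
        raise InvalidTemporalPanelError("duplicate node labels")
    n = len(node_labels)
    previous: Optional[date] = None
    for when, matrix in panels:
        # Strictly increasing keys also rule out duplicate dates.
        if previous is not None and when <= previous:
            raise InvalidTemporalPanelError(f"dates not strictly increasing at {when}")
        previous = when
        # Every snapshot is n x n over the same labels: no silent entry or exit.
        if len(matrix) != n or any(len(row) != n for row in matrix):
            raise InvalidTemporalPanelError(f"{when}: matrix is not {n}x{n}")
        for row in matrix:
            for x in row:
                if not math.isfinite(x):
                    raise InvalidTemporalPanelError(f"{when}: non-finite exposure")
                if x < 0:
                    raise InvalidTemporalPanelError(f"{when}: negative exposure")
```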

VALIDATION-MODE BA FIT
  fit_barabasi_albert_validation_from_topology(topology) — strict
  wrapper enforcing both n_tail ≥ 50 AND σ_α/α ≤ 0.10 with no
  escape hatches.

DOCS — score-level vs end-to-end boundary made explicit
  - README.md gains a top-level boundary block stating the executable
    falsification operates at score-series level and the full
    pipeline is not yet end-to-end executable.
  - PROTOCOL.md status string updated to "HYPOTHESIS / SCORE-LEVEL
    INSTRUMENTATION COMPLETE; END-TO-END VALIDATION PENDING".

TESTS — 169 passing (+24 new)
  - test_governance.py: readiness profile derivation, claim-tier
    enforcement, overclaim grep on synthetic + real module trees,
    canonical forbidden-terms list.
  - test_temporal_panel.py: empty/duplicate/whitespace/None labels,
    size-mismatch, non-square, NaN, negative — every fail path.
  - test_falsification.py: scope-alias parity (run_score_level_*
    matches run_falsification on the same seed), end-to-end stub
    fails-closed.
  - test_network_fitting.py: BA validation wrapper rejects small
    topology, passes on n=3000 BA(m=3, seed=42) with auto-selected
    k_min ≈ 25, n_tail ≈ 56, σ_α/α ≈ 0.086.

Quality
- mypy --strict / ruff / black: clean on every new/modified file.
- 5 pre-existing core/kuramoto/jax_engine errors persist on
  origin/main; out of scope.

Tier: C-SYSRISK-PHASE remains HYPOTHESIS.

Co-authored-by: Yaroslav Vasylenko <neuron7x@ukr.net>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
