
feat(research/systemic_risk): Protocol X-7 — CSD indicators, naive baselines, extended metrics#565

Merged
neuron7xLab merged 2 commits into main from feat/systemic-risk-x7-csd
May 8, 2026


@neuron7xLab
Owner

Summary

Score-level instrumentation extension. The status remains HYPOTHESIS / SCORE-LEVEL INSTRUMENTATION EXTENSION ONLY; end-to-end validation remains pending.

What changed

CSD indicators (critical_slowing_down.py)

  • CSDConfig pre-registers window, min_periods, ddof, lag, constant_policy.
  • compute_csd_indicators returns variance + lag-1 autocorr + skewness + valid_count.
  • No lookahead. Verified by test_no_lookahead_leakage: mutating a future segment leaves every past indicator value bit-identical.
  • Skewness implemented inline (no SciPy dependency).
  • Constant-segment defaults to NaN — propagating undefinedness honestly.
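
The trailing-window contract described above can be sketched as follows. This is a minimal illustration, not the code in critical_slowing_down.py: the function name `rolling_csd`, its defaults, and the use of the biased moment estimator for skewness are all assumptions made for the example.

```python
import numpy as np


def rolling_csd(x, window=20, min_periods=10, ddof=1, lag=1):
    """Trailing-window variance, lag-`lag` autocorrelation, skewness, valid_count.

    Hypothetical sketch; not the PR's compute_csd_indicators signature.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim != 1 or x.size == 0 or not np.isfinite(x).all():
        raise ValueError("x must be a non-empty, finite 1-D series")
    if not (1 <= lag < min_periods <= window and 0 <= ddof < min_periods):
        raise ValueError("inconsistent window / min_periods / ddof / lag")

    n = x.size
    var, ac, skew = (np.full(n, np.nan) for _ in range(3))
    valid = np.zeros(n, dtype=int)
    for t in range(n):
        seg = x[max(0, t - window + 1): t + 1]  # trailing only: nothing after t
        valid[t] = seg.size
        if seg.size < min_periods:
            continue  # insufficient prefix stays NaN
        var[t] = seg.var(ddof=ddof)
        if seg.max() == seg.min():
            continue  # constant segment: autocorr and skewness stay NaN
        d = seg - seg.mean()
        m2 = np.mean(d * d)
        ac[t] = np.sum(d[lag:] * d[:-lag]) / np.sum(d * d)
        skew[t] = np.mean(d ** 3) / m2 ** 1.5  # biased estimator (assumption)
    return var, ac, skew, valid


# No-lookahead check in the spirit of test_no_lookahead_leakage:
rng = np.random.default_rng(0)
x = rng.normal(size=60)
y = x.copy()
y[40:] += 5.0  # mutate only the future segment
for a, b in zip(rolling_csd(x), rolling_csd(y)):
    np.testing.assert_array_equal(a[:40], b[:40])  # past values unchanged
```

Because each output at index t reads only `x[:t + 1]`, the equality check holds exactly rather than to a tolerance.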

Naive baselines (baselines.py)

  • rolling_volatility_score — pure trailing σ. The "is it just volatility?" challenger.
  • edge_density_score — per-snapshot directed/undirected edge density. The "is it just topology densification?" challenger.
  • Both fail-closed on NaN/Inf, non-square / inconsistent N / negative inputs.
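
As an illustration of the "pure trailing σ" challenger with fail-closed input checks, a sketch might look like this. The name `rolling_volatility` and its parameters are hypothetical; the PR's rolling_volatility_score may differ in signature and in warmup handling.

```python
import numpy as np


def rolling_volatility(returns, window=20, ddof=1):
    """Trailing-window standard deviation; NaN until a full window is available."""
    r = np.asarray(returns, dtype=float)
    if r.ndim != 1 or r.size == 0:
        raise ValueError("returns must be a non-empty 1-D series")
    if not np.isfinite(r).all():
        raise ValueError("returns contain NaN/Inf")  # fail closed, do not skip
    if ddof < 0 or window <= ddof:
        raise ValueError("need window > ddof >= 0")
    out = np.full(r.size, np.nan)
    for t in range(window - 1, r.size):
        out[t] = r[t - window + 1: t + 1].std(ddof=ddof)  # uses data up to t only
    return out
```

A candidate score that cannot beat this series on the same events is not adding information beyond volatility.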

Extended metrics (metrics.py)

  • ClassificationMetrics — precision/recall/FPR/FNR with NaN-not-zero on every undefined denominator.
  • LeadTimeConfig — pre-registered min_lead_days/max_lead_days/event_exclusion_days_after.
  • LeadTimeMetrics — aggregate; first valid pre-event signal wins.
  • Same-day signals excluded by default (min_lead_days=1); post-event signals never count.
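
The NaN-not-zero rule for the classification metrics can be sketched as below. The class name `Confusion` and the helper `_ratio` are hypothetical stand-ins, not the PR's ClassificationMetrics.

```python
import math
from dataclasses import dataclass


def _ratio(num, den):
    # An empty denominator means "undefined", which is not the same as 0.
    return num / den if den > 0 else math.nan


@dataclass(frozen=True)
class Confusion:
    tp: int
    fp: int
    tn: int
    fn: int

    @property
    def precision(self):
        return _ratio(self.tp, self.tp + self.fp)

    @property
    def recall(self):
        return _ratio(self.tp, self.tp + self.fn)

    @property
    def fpr(self):
        return _ratio(self.fp, self.fp + self.tn)

    @property
    def fnr(self):
        return _ratio(self.fn, self.tp + self.fn)


no_alarms = Confusion(tp=0, fp=0, tn=5, fn=2)
print(no_alarms.precision, no_alarms.recall)  # nan 0.0
```

A detector that never fires has undefined precision but a well-defined recall of 0; reporting precision as 0 would conflate the two cases.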

Audit-grade docs (7 new)

BASELINES.md, METRICS.md, NULL_MODELS.md, FAILURE_MODES.md, REPRODUCIBILITY.md, BOOTSTRAP_PROTOCOL.md, CHANGELOG.md. Every PENDING artefact named with its blocker.

Test plan

  • pytest tests/research/systemic_risk/: 218 passed (+49 new).
  • mypy --strict / ruff / black clean on the diff.
  • test_no_lookahead_leakage — past indicators bit-identical under future-segment mutation.
  • run_premerge_science_gate against the live tree → passed=True, overclaim_hits=().
  • C-SYSRISK-PHASE remains HYPOTHESIS in CLAIMS.md.

Allowed final decision

MERGE AS HYPOTHESIS / SCORE-LEVEL INSTRUMENTATION EXTENSION ONLY.

🤖 Generated with Claude Code

…selines, extended metrics

Score-level instrumentation extension. Status remains
HYPOTHESIS / SCORE-LEVEL INSTRUMENTATION EXTENSION ONLY;
end-to-end validation remains pending.

CSD INDICATORS (new module critical_slowing_down.py)
  Variance + lag-1 autocorrelation + skewness over a trailing
  rolling window. CSDConfig pre-registers window, min_periods,
  ddof, lag, constant_policy ∈ {nan, zero, raise}. The
  no-lookahead contract is enforced by a regression test that
  mutates a future segment of the input and asserts every past
  indicator value is bit-identical (the load-bearing rail of
  the X-7 spec). Skewness implemented inline (no SciPy dep).
  Constant-segment behaviour for autocorr/skewness defaults to
  NaN — propagating undefinedness honestly rather than faking
  a "calm" signal with a zero.

NAIVE BASELINES (new module baselines.py)
  rolling_volatility_score — pure trailing-window σ, no phase /
  coupling / graph. The "is the market just loud?" challenger.
  edge_density_score — per-snapshot directed / undirected edge
  density of an adjacency panel; one scalar per timestamp; no
  dynamics. Defeats the candidate when the apparent signal is
  topology densification.
  Both baselines fail-closed on NaN/Inf, non-square / inconsistent-N
  / negative inputs, and refuse to operate without a valid
  rolling-window contract.

EXTENDED METRICS (new module metrics.py)
  ClassificationMetrics — TP/FP/TN/FN + precision + recall + FPR
  + FNR. Every undefined denominator emits NaN, never 0; the
  absence of denominators must propagate.
  LeadTimeConfig — pre-registered min/max lead window + optional
  post-event exclusion buffer. Same-day signals excluded by
  default (min_lead_days=1); post-event signals never count.
  LeadTimeMetrics — aggregate over a labelled event set;
  detected count, sorted lead-time tuple, median + min + max.
  compute_lead_time_metrics uses the first valid pre-event alarm
  per event.
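
The "first valid pre-event alarm" rule can be illustrated with a short sketch. The function names, the dictionary layout, and the default `max_lead_days=30` are assumptions made for the example; only `min_lead_days=1` as the default comes from the text above.

```python
import math
from datetime import date, timedelta
from statistics import median


def first_lead_days(dates, scores, event, threshold,
                    min_lead_days=1, max_lead_days=30):
    """Lead in days of the earliest alarm inside the pre-event window, else None."""
    for day, score in zip(dates, scores):  # dates assumed strictly increasing
        lead = (event - day).days
        if min_lead_days <= lead <= max_lead_days and score >= threshold:
            return lead  # first valid pre-event alarm wins
    return None


def summarize_leads(dates, scores, events, threshold, **window):
    leads = [first_lead_days(dates, scores, e, threshold, **window) for e in events]
    hits = tuple(sorted(lead for lead in leads if lead is not None))
    return {
        "n_events": len(events),
        "detected": len(hits),
        "lead_days": hits,
        "median": median(hits) if hits else math.nan,
        "min": min(hits) if hits else math.nan,
        "max": max(hits) if hits else math.nan,
    }


days = [date(2024, 1, 1) + timedelta(days=i) for i in range(10)]
# Alarms on Jan 3, Jan 7, Jan 8 (same day as the event) and Jan 9 (post-event).
scores = [0.0, 0.0, 0.9, 0.0, 0.0, 0.0, 0.8, 0.95, 0.9, 0.0]
summary = summarize_leads(days, scores, [date(2024, 1, 8)], threshold=0.5)
print(summary["lead_days"])  # (5,)
```

The Jan 3 alarm is the earliest one inside the window, so the event is credited with a 5-day lead; the same-day and post-event alarms are never considered.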

DOCS — 7 new audit artefacts
  BASELINES.md         — what defeats the candidate, why it matters
  METRICS.md           — AUC alone insufficient; NaN policy stated
  NULL_MODELS.md       — six surrogates + executable status
  FAILURE_MODES.md     — 10 disconfirming experiments to probe
  REPRODUCIBILITY.md   — manifest contract + per-artefact PENDING table
  BOOTSTRAP_PROTOCOL.md — what is/isn't resampled, seed/convergence
  CHANGELOG.md         — protocol-grade change log; no validated claim

TESTS — +49, total 218 passing
  CSD: rejects 2-D / empty / NaN / Inf; window/min_periods/lag
       validation; output length contract; insufficient prefix is
       NaN; valid_count grows; no-lookahead leakage regression;
       constant-policy nan/zero/raise; zero-variance skew → NaN.
  Baselines: rolling-volatility no leakage; constant series → 0;
       window validation; density formulas (directed / undirected /
       self-edges); panel-N consistency; NaN/negative rejection;
       single-node density.
  Metrics: normal case; zero-prediction → NaN precision;
       zero-positive → NaN recall + NaN FNR; no-negatives → NaN FPR;
       LeadTimeConfig invariants; pre-event alarm counted;
       post-event ignored; same-day exclusion at min_lead=1;
       same-day inclusion at min_lead=0; first valid signal
       selected; no-signal → undetected.
  Pre-existing test_governance.py::test_real_module_passes_overclaim_grep
  passes against the live tree including the 7 new docs +
  CHANGELOG (overclaim_hits = ()).

QUALITY
  mypy --strict / ruff / black: clean on every new/modified file.
  The 5 pre-existing jax_engine mypy errors persist on origin/main;
  out of scope.

CLAIM TIER PRESERVED
  C-SYSRISK-PHASE remains HYPOTHESIS in CLAIMS.md.
  No measured / validated / confirmed / production claim.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d9a6765a3d


Comment on lines +96 to +97
if self.ddof < 0:
    raise ValueError(f"ddof must be >= 0, got {self.ddof}")

P1: Reject invalid ddof values in CSDConfig

CSDConfig only checks ddof >= 0, so a configuration such as min_periods=5, ddof=8 is accepted even though the early rolling windows have N <= ddof. seg.var(ddof=ddof) then emits inf/NaN with runtime warnings, which silently corrupts the indicator series after warmup instead of failing closed. Because ddof is user-configurable and pre-registered, this can invalidate experiments without an explicit error unless ddof < min_periods (or an equivalent bound tied to the smallest evaluated window) is also enforced.
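
The failure mode described in this comment can be reproduced directly in NumPy. The array values below are made up for the demonstration; only the combination of a 5-point window with ddof=8 comes from the review.

```python
import warnings

import numpy as np

seg = np.array([0.1, -0.2, 0.3, 0.0, 0.4])  # N = 5, e.g. a window at min_periods=5

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    bad = seg.var(ddof=8)          # N - ddof <= 0: the divisor is clamped to 0
    flat = np.ones(5).var(ddof=8)  # zero spread over a zero divisor

print(np.isfinite(bad), np.isfinite(flat))  # False False
print(any(issubclass(w.category, RuntimeWarning) for w in caught))  # True
```

NumPy only warns here; it does not raise, which is why the bound has to be enforced by the config itself.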


Comment thread research/systemic_risk/baselines.py Outdated
Comment on lines +146 to +147
if not directed:
    denom = denom / 2.0

P2: Fix undirected edge-density normalization

In undirected mode, the code keeps counting both A[i,j] and A[j,i] from a symmetric adjacency matrix but divides by N*(N-1)/2, so a fully connected undirected graph returns density 2.0 instead of 1.0. This makes the baseline scale inconsistent and can distort thresholding/comparisons whenever callers pass standard symmetric undirected adjacencies.


P0-1 — CSDConfig.ddof < min_periods invariant
  ddof >= min_periods left the rolling variance with <= 0 degrees
  of freedom on the smallest evaluated window → silent inf/NaN.
  Now fails closed at config construction. Tests:
  - test_ddof_must_be_less_than_min_periods (rejects ddof=5,mp=5)
  - test_ddof_less_than_min_periods_accepted
  - test_ddof_zero_accepted
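
A construction-time invariant of this kind can be sketched with a frozen dataclass. `WindowConfig` is a hypothetical stand-in for CSDConfig, reduced to the three fields the invariant touches; the defaults are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowConfig:  # hypothetical, not the PR's CSDConfig
    window: int = 20
    min_periods: int = 10
    ddof: int = 1

    def __post_init__(self):
        if self.window < 1:
            raise ValueError(f"window must be >= 1, got {self.window}")
        if not 1 <= self.min_periods <= self.window:
            raise ValueError("min_periods must lie in [1, window]")
        if self.ddof < 0:
            raise ValueError(f"ddof must be >= 0, got {self.ddof}")
        if self.ddof >= self.min_periods:
            # The smallest evaluated window has min_periods points, so this
            # bound guarantees at least one degree of freedom everywhere.
            raise ValueError(
                f"ddof must be < min_periods, got ddof={self.ddof}, "
                f"min_periods={self.min_periods}"
            )


WindowConfig(min_periods=5, ddof=4)  # accepted: 1 degree of freedom remains
```

Raising in `__post_init__` means an invalid pre-registered configuration can never reach the rolling loop.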

P0-2 — undirected edge density canonical formula
  Old code summed the full symmetric matrix and produced density
  = 2.0 for the complete undirected graph (out of [0, 1]). Fixed
  to read the upper triangle (strict, k=1; k=0 when self-edges count)
  and divide by N*(N-1)/2 (or N*(N+1)/2). Symmetry is now enforced
  fail-closed under directed=False — a transpose bug raises
  rather than silently distorting the density scale. Tests:
  - test_undirected_complete_graph_density_is_one (K3 → 1.0)
  - test_undirected_requires_symmetric_matrix
  - test_density_in_unit_interval_for_random_binary
    (property sweep across N ∈ {3,5,10,20}, p ∈ {0.1,0.3,0.5,0.8})
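
The corrected normalization can be illustrated as follows. This is a sketch under assumptions, not the code in baselines.py: the name `edge_density`, the `self_edges` flag, treating any positive weight as an edge, and returning NaN when there are no edge slots are all choices made for the example.

```python
import numpy as np


def edge_density(adj, directed=True, self_edges=False):
    a = np.asarray(adj, dtype=float)
    if a.ndim != 2 or a.shape[0] != a.shape[1]:
        raise ValueError("adjacency must be a square 2-D matrix")
    if not np.isfinite(a).all() or (a < 0).any():
        raise ValueError("adjacency must be finite and non-negative")
    n = a.shape[0]
    present = a > 0  # assumption: an edge exists wherever the weight is positive
    if directed:
        if not self_edges:
            present = present & ~np.eye(n, dtype=bool)
        count = int(present.sum())
        slots = n * n if self_edges else n * (n - 1)
    else:
        if not np.array_equal(a, a.T):
            raise ValueError("directed=False requires a symmetric matrix")
        # Count each undirected edge once by reading the upper triangle only.
        count = int(np.triu(present, k=0 if self_edges else 1).sum())
        slots = n * (n + 1) // 2 if self_edges else n * (n - 1) // 2
    return count / slots if slots else float("nan")


k3 = np.ones((3, 3)) - np.eye(3)           # complete undirected graph on 3 nodes
print(edge_density(k3, directed=False))    # 1.0
print(k3.sum() / (3 * 2 / 2))              # 2.0, the old full-matrix sum
```

Reading one triangle and dividing by the matching number of slots keeps the result in [0, 1] for any symmetric input.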

P1-1 — lead-time strict-increasing dates
  Old code had no monotonicity check; an unsorted/duplicate dates
  tuple would silently produce wrong leads. Now raises ValueError.
  Test: test_dates_must_be_strictly_increasing.

P1-2 — lead-time finite threshold
  Old code accepted threshold = ±Inf and produced NaN in the
  comparison. Now raises ValueError on non-finite.
  Test: test_threshold_must_be_finite.
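
The two input guards from P1-1 and P1-2 can be sketched together. The helper name `validate_lead_time_inputs` is hypothetical; the PR may perform these checks inline in compute_lead_time_metrics.

```python
import math
from datetime import date


def validate_lead_time_inputs(dates, threshold):
    # Pairwise comparison catches both unsorted and duplicated dates.
    if any(later <= earlier for earlier, later in zip(dates, dates[1:])):
        raise ValueError("dates must be strictly increasing")
    if not math.isfinite(threshold):
        raise ValueError(f"threshold must be finite, got {threshold}")


validate_lead_time_inputs((date(2024, 1, 1), date(2024, 1, 2)), threshold=0.5)
```

Both checks run before any lead is computed, so a malformed panel raises instead of producing a plausible-looking number.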

P1-3 — explicit score-NaN policy
  Added allow_warmup_nan=True parameter. Default tolerates a
  leading contiguous NaN block (rolling-window warmup) but
  rejects any NaN/Inf past the first finite value. False mode
  rejects every non-finite value. Tests:
  - test_warmup_nan_allowed_by_default
  - test_nan_past_warmup_rejected
  - test_strict_finite_score_mode
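
The warmup policy can be sketched as a standalone check. The function name `check_scores` is hypothetical; the sketch also assumes that the tolerated leading block may contain only NaN (never Inf) and that an all-NaN series passes in the default mode, neither of which the text above settles.

```python
import numpy as np


def check_scores(scores, allow_warmup_nan=True):
    s = np.asarray(scores, dtype=float)
    finite = np.isfinite(s)
    if finite.all():
        return s
    if not allow_warmup_nan:
        raise ValueError("scores contain non-finite values")
    # Index of the first finite value (or the end if there is none).
    first = int(np.argmax(finite)) if finite.any() else s.size
    head, tail = s[:first], s[first:]
    if not np.isnan(head).all():
        raise ValueError("leading block may contain NaN only")  # assumption
    if not np.isfinite(tail).all():
        raise ValueError("NaN/Inf found after the warmup block")
    return s


check_scores([np.nan, np.nan, 0.2, 0.7])  # accepted: leading warmup NaNs only
```

Splitting at the first finite value makes the rule explicit: everything before it is warmup, everything after it must be usable.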

P1-4 — removal of dead event_exclusion_days_after API
  The parameter existed in LeadTimeConfig but had no effect on
  compute_lead_time_metrics. Removed entirely. Post-event
  contamination is already prevented by the strict
  pre-event-only window. A regression test ensures the
  parameter cannot be silently re-introduced without
  operationalising it. Test: test_event_exclusion_param_removed.

P1-5 — classification metrics input policy
  Old code did np.asarray(..., dtype=bool) which silently
  coerced -1, 2, 0.5 etc. to True. Now: bool arrays accepted
  verbatim; integer arrays must contain only {0, 1}; everything
  else raises ValueError. Tests:
  - test_arbitrary_numeric_input_rejected (float input rejected)
  - test_out_of_range_int_rejected (int 2 rejected)
  - test_binary_int_input_accepted (int 0/1 works)
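
The silent coercion and a stricter replacement can be shown side by side. The helper name `as_binary` is hypothetical and only illustrates the stated policy (bool accepted, integers restricted to {0, 1}, everything else rejected).

```python
import numpy as np

# The old behaviour: any non-zero value becomes True without complaint.
print(np.asarray([-1, 2, 0.5, 0], dtype=bool).tolist())  # [True, True, True, False]


def as_binary(values, name="labels"):
    a = np.asarray(values)
    if a.dtype == np.bool_:
        return a  # bool arrays pass through unchanged
    if np.issubdtype(a.dtype, np.integer):
        if np.isin(a, (0, 1)).all():
            return a.astype(bool)
        raise ValueError(f"{name}: integer input must contain only 0 and 1")
    raise ValueError(f"{name}: expected bool or 0/1 integers, got dtype {a.dtype}")


print(as_binary([0, 1, 1]).tolist())  # [False, True, True]
```

Rejecting floats outright means a probability vector passed by mistake raises instead of being treated as all-positive predictions.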

Tests: 231 passing (+13 from 218).
Quality: mypy --strict / ruff / black clean on the diff.

Status preserved: HYPOTHESIS / SCORE-LEVEL INSTRUMENTATION
EXTENSION ONLY. End-to-end validation remains pending.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@neuron7xLab neuron7xLab merged commit 932306b into main May 8, 2026
19 checks passed
@neuron7xLab neuron7xLab deleted the feat/systemic-risk-x7-csd branch May 8, 2026 11:48