feat(research/systemic_risk): Protocol X-7 — CSD indicators, naive baselines, extended metrics #565
Score-level instrumentation extension. Status remains
HYPOTHESIS / SCORE-LEVEL INSTRUMENTATION EXTENSION ONLY;
end-to-end validation is still pending.
CSD INDICATORS (new module critical_slowing_down.py)
Variance + lag-1 autocorrelation + skewness over a trailing
rolling window. CSDConfig pre-registers window, min_periods,
ddof, lag, constant_policy ∈ {nan, zero, raise}. The
no-lookahead contract is enforced by a regression test that
mutates a future segment of the input and asserts every past
indicator value is bit-identical (the load-bearing rail of
the X-7 spec). Skewness implemented inline (no SciPy dep).
Constant-segment behaviour for autocorr/skewness defaults to
NaN — propagating undefinedness honestly rather than faking
a "calm" signal with a zero.
NAIVE BASELINES (new module baselines.py)
rolling_volatility_score — pure trailing-window σ, no phase /
coupling / graph. The "is the market just loud?" challenger.
edge_density_score — per-snapshot directed / undirected edge
density of an adjacency panel; one scalar per timestamp; no
dynamics. Defeats the candidate when the apparent signal is
topology densification.
Both baselines fail-closed on NaN/Inf, non-square / inconsistent-N
/ negative inputs, and refuse to operate without a valid
rolling-window contract.
EXTENDED METRICS (new module metrics.py)
ClassificationMetrics — TP/FP/TN/FN + precision + recall + FPR
+ FNR. Every undefined denominator emits NaN, never 0; the
absence of denominators must propagate.
LeadTimeConfig — pre-registered min/max lead window + optional
post-event exclusion buffer. Same-day signals excluded by
default (min_lead_days=1); post-event signals never count.
LeadTimeMetrics — aggregate over a labelled event set;
detected count, sorted lead-time tuple, median + min + max.
compute_lead_time_metrics uses the first valid pre-event alarm
per event.
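The two contracts above — NaN-not-zero on undefined denominators, and first-valid-pre-event-alarm selection — can be sketched as below. These helper names and signatures are hypothetical simplifications of `metrics.py`, which works from config objects rather than bare arguments.

```python
import math

def precision_recall_fpr(tp, fp, tn, fn):
    """Every undefined denominator emits NaN, never 0."""
    prec = tp / (tp + fp) if (tp + fp) > 0 else math.nan
    rec = tp / (tp + fn) if (tp + fn) > 0 else math.nan
    fpr = fp / (fp + tn) if (fp + tn) > 0 else math.nan
    return prec, rec, fpr

def first_lead_time(alarm_days, event_day, min_lead=1, max_lead=90):
    """First valid pre-event alarm wins; same-day excluded at min_lead=1.

    Returns the lead time (days before the event) of the earliest
    alarm inside [min_lead, max_lead], or None if undetected.
    """
    for d in sorted(alarm_days):
        lead = event_day - d
        if min_lead <= lead <= max_lead:
            return lead
    return None  # post-event or out-of-window alarms never count
```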
DOCS — 7 new audit artefacts
BASELINES.md — what defeats the candidate, why it matters
METRICS.md — AUC alone insufficient; NaN policy stated
NULL_MODELS.md — six surrogates + executable status
FAILURE_MODES.md — 10 disconfirming experiments to probe
REPRODUCIBILITY.md — manifest contract + per-artefact PENDING table
BOOTSTRAP_PROTOCOL.md — what is/isn't resampled, seed/convergence
CHANGELOG.md — protocol-grade change log; no validated claim
TESTS — +49, total 218 passing
CSD: rejects 2-D / empty / NaN / Inf; window/min_periods/lag
validation; output length contract; insufficient prefix is
NaN; valid_count grows; no-lookahead leakage regression;
constant-policy nan/zero/raise; zero-variance skew → NaN.
Baselines: rolling-volatility no leakage; constant series → 0;
window validation; density formulas (directed / undirected /
self-edges); panel-N consistency; NaN/negative rejection;
single-node density.
Metrics: normal case; zero-prediction → NaN precision;
zero-positive → NaN recall + NaN FNR; no-negatives → NaN FPR;
LeadTimeConfig invariants; pre-event alarm counted;
post-event ignored; same-day exclusion at min_lead=1;
same-day inclusion at min_lead=0; first valid signal
selected; no-signal → undetected.
Pre-existing test_governance.py::test_real_module_passes_overclaim_grep
passes against the live tree including the 7 new docs +
CHANGELOG (overclaim_hits = ()).
QUALITY
mypy --strict / ruff / black: clean on every new/modified file.
Pre-existing 5 jax_engine mypy errors persist on origin/main;
out of scope.
CLAIM TIER PRESERVED
C-SYSRISK-PHASE remains HYPOTHESIS in CLAIMS.md.
No measured / validated / confirmed / production claim.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d9a6765a3d
    if self.ddof < 0:
        raise ValueError(f"ddof must be >= 0, got {self.ddof}")
Reject invalid ddof values in CSDConfig
CSDConfig only checks ddof >= 0, so configurations like min_periods=5, ddof=8 are accepted even though early rolling windows have N <= ddof; seg.var(ddof=ddof) then emits inf/NaN with runtime warnings, which silently corrupts indicator series after warmup instead of failing closed. Because ddof is user-configurable and pre-registered, this can invalidate experiments without an explicit error unless you also enforce ddof < min_periods (or an equivalent bound tied to the smallest evaluated window).
    if not directed:
        denom = denom / 2.0
Fix undirected edge-density normalization
In undirected mode, the code keeps counting both A[i,j] and A[j,i] from a symmetric adjacency matrix but divides by N*(N-1)/2, so a fully connected undirected graph returns density 2.0 instead of 1.0. This makes the baseline scale inconsistent and can distort thresholding/comparisons whenever callers pass standard symmetric undirected adjacencies.
P0-1 — CSDConfig.ddof < min_periods invariant
ddof >= min_periods left the rolling variance with zero (or
negative) degrees of freedom on the smallest evaluated window
→ silent NaN/Inf after warmup. Now fails closed at config
construction. Tests:
- test_ddof_must_be_less_than_min_periods (rejects ddof=5,mp=5)
- test_ddof_less_than_min_periods_accepted
- test_ddof_zero_accepted
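The invariant can be enforced in `__post_init__` roughly as below. This is a sketch: the real CSDConfig pre-registers more fields (lag, constant_policy), and the exact error messages are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CSDConfig:
    window: int
    min_periods: int
    ddof: int = 1

    def __post_init__(self) -> None:
        if not (1 <= self.min_periods <= self.window):
            raise ValueError("need 1 <= min_periods <= window")
        if self.ddof < 0:
            raise ValueError(f"ddof must be >= 0, got {self.ddof}")
        if self.ddof >= self.min_periods:
            # The smallest evaluated window has N == min_periods;
            # N - ddof must stay positive or var() emits inf/NaN.
            raise ValueError(
                f"ddof ({self.ddof}) must be < min_periods "
                f"({self.min_periods})"
            )
```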
P0-2 — undirected edge density canonical formula
Old code summed the full symmetric matrix and produced density
= 2.0 for the complete undirected graph (out of [0, 1]). Fixed
to read the strict upper triangle (k=1, or k=0 with self-edges)
and divide by N*(N-1)/2 (or N*(N+1)/2). Symmetry is now enforced
fail-closed under directed=False — a transpose bug raises
rather than silently distorting the density scale. Tests:
- test_undirected_complete_graph_density_is_one (K3 → 1.0)
- test_undirected_requires_symmetric_matrix
- test_density_in_unit_interval_for_random_binary
(property sweep across N ∈ {3,5,10,20}, p ∈ {0.1,0.3,0.5,0.8})
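The canonical formula reads each undirected edge exactly once from the upper triangle instead of summing the full symmetric matrix. A minimal sketch (`undirected_density` is an illustrative name, not the module's API):

```python
import numpy as np

def undirected_density(adj, self_edges=False):
    """Undirected edge density in [0, 1], fail-closed on asymmetry."""
    a = np.asarray(adj, dtype=float)
    n = a.shape[0]
    if not np.array_equal(a, a.T):
        # A transpose bug raises instead of silently doubling density.
        raise ValueError("undirected density requires a symmetric matrix")
    k = 0 if self_edges else 1               # include the diagonal?
    edges = np.count_nonzero(np.triu(a, k=k))
    pairs = n * (n + 1) / 2 if self_edges else n * (n - 1) / 2
    return edges / pairs
```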
P1-1 — lead-time strict-increasing dates
Old code had no monotonicity check; an unsorted/duplicate dates
tuple would silently produce wrong leads. Now raises ValueError.
Test: test_dates_must_be_strictly_increasing.
P1-2 — lead-time finite threshold
Old code accepted threshold = ±Inf and produced NaN in the
comparison. Now raises ValueError on non-finite.
Test: test_threshold_must_be_finite.
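Both P1-1 and P1-2 reduce to a small fail-closed validator, sketched below (the helper name is illustrative; in the real module these checks live in the lead-time config/compute path):

```python
import math

def validate_lead_time_inputs(dates, threshold):
    """Fail closed: dates strictly increasing, threshold finite."""
    if any(b <= a for a, b in zip(dates, dates[1:])):
        raise ValueError("dates must be strictly increasing")
    if not math.isfinite(threshold):
        raise ValueError(f"threshold must be finite, got {threshold}")
```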
P1-3 — explicit score-NaN policy
Added allow_warmup_nan=True parameter. Default tolerates a
leading contiguous NaN block (rolling-window warmup) but
rejects any NaN/Inf past the first finite value. False mode
rejects every non-finite value. Tests:
- test_warmup_nan_allowed_by_default
- test_nan_past_warmup_rejected
- test_strict_finite_score_mode
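The policy amounts to a single scan: tolerate only a leading contiguous NaN block, then require every value to be finite. A sketch (helper name and signature are assumptions):

```python
import math

def validate_scores(scores, allow_warmup_nan=True):
    """Accept a leading NaN warmup block; reject everything else non-finite."""
    seen_finite = False
    for i, s in enumerate(scores):
        if math.isfinite(s):
            seen_finite = True
        elif not allow_warmup_nan or seen_finite or not math.isnan(s):
            # Inf is never tolerated, nor NaN past the first finite value.
            raise ValueError(f"non-finite score at index {i}")
```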
P1-4 — removal of dead event_exclusion_days_after API
The parameter existed in LeadTimeConfig but had no effect on
compute_lead_time_metrics. Removed entirely. Post-event
contamination is already prevented by the strict
pre-event-only window. A regression test ensures the
parameter cannot be silently re-introduced without
operationalising it. Test: test_event_exclusion_param_removed.
P1-5 — classification metrics input policy
Old code did np.asarray(..., dtype=bool) which silently
coerced -1, 2, 0.5 etc. to True. Now: bool arrays accepted
verbatim; integer arrays must contain only {0, 1}; everything
else raises ValueError. Tests:
- test_arbitrary_numeric_input_rejected (float input rejected)
- test_out_of_range_int_rejected (int 2 rejected)
- test_binary_int_input_accepted (int 0/1 works)
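The input policy can be sketched as a small gate in front of the metric computation (`as_binary` is an illustrative name):

```python
import numpy as np

def as_binary(arr):
    """bool arrays pass verbatim; int arrays must be {0, 1}; no coercion."""
    a = np.asarray(arr)
    if a.dtype == np.bool_:
        return a
    if np.issubdtype(a.dtype, np.integer):
        if np.isin(a, (0, 1)).all():
            return a.astype(bool)
        raise ValueError("integer labels must contain only 0 and 1")
    # Floats, objects, etc. are rejected: np.asarray(..., dtype=bool)
    # would silently map -1, 2, 0.5 to True.
    raise ValueError(f"unsupported dtype {a.dtype}; no silent bool coercion")
```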
Tests: 231 passing (+13 from 218).
Quality: mypy --strict / ruff / black clean on the diff.
Status preserved: HYPOTHESIS / SCORE-LEVEL INSTRUMENTATION
EXTENSION ONLY. End-to-end validation remains pending.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Score-level instrumentation extension. The status remains
HYPOTHESIS / SCORE-LEVEL INSTRUMENTATION EXTENSION ONLY;
end-to-end validation remains pending.

What changed

CSD indicators (critical_slowing_down.py)
- CSDConfig pre-registers window, min_periods, ddof, lag,
  constant_policy.
- compute_csd_indicators returns variance + lag-1 autocorr +
  skewness + valid_count.
- test_no_lookahead_leakage: mutating a future segment leaves
  every past indicator value bit-identical.

Naive baselines (baselines.py)
- rolling_volatility_score — pure trailing σ. The "is it just
  volatility?" challenger.
- edge_density_score — per-snapshot directed/undirected edge
  density. The "is it just topology densification?" challenger.

Extended metrics (metrics.py)
- ClassificationMetrics — precision/recall/FPR/FNR with
  NaN-not-zero on every undefined denominator.
- LeadTimeConfig — pre-registered min_lead_days / max_lead_days.
- LeadTimeMetrics — aggregate; first valid pre-event signal
  wins. Same-day signals excluded by default (min_lead_days=1);
  post-event signals never count.

Audit-grade docs (7 new)
- BASELINES.md, METRICS.md, NULL_MODELS.md, FAILURE_MODES.md,
  REPRODUCIBILITY.md, BOOTSTRAP_PROTOCOL.md, CHANGELOG.md.
  Every PENDING artefact named with its blocker.

Test plan
- pytest tests/research/systemic_risk/: 218 passed (+49 new).
- mypy --strict / ruff / black clean on the diff.
- test_no_lookahead_leakage — past indicators bit-identical
  under future-segment mutation.
- run_premerge_science_gate against the live tree →
  passed=True, overclaim_hits=().
- C-SYSRISK-PHASE remains HYPOTHESIS in CLAIMS.md.

Allowed final decision

MERGE AS HYPOTHESIS / SCORE-LEVEL INSTRUMENTATION EXTENSION ONLY.

🤖 Generated with Claude Code