phase 13.5: cognition drift extension #24
… tests

Adds POST /v1/admin/drift/cognition-baseline/capture and GET /v1/admin/drift/cognition-baseline, both gated on the standard admin auth chain (extract_or_bootstrap + verify_request + require_admin) plus the dedicated can_capture_drift_baseline permission flag (Phase 13.5 Task 3). Mirrors the encryption_admin.py shape from Phase 13.x.7: same per-error audit-event granularity, same dict[str, Any] return convention.

The baseline file path is overridable via the PHOENIX_COGNITION_DRIFT_BASELINE_PATH env var; the default lives at ~/.phoenix/runtime/cognition_drift_baseline.json (via the CognitionDriftBaseline default).

5 integration tests cover: happy-path capture, alice-403, insufficient-data-409, missing-baseline-404, get-after-capture.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reviewer's Guide

Phase 13.5 wires cognition drift signals into the existing `MLStatisticalChecker` using a new cognition feature provider and a per-version baseline with weighted-L2 distance, adds admin endpoints and permissions to capture/read baselines, integrates wiring at `DriftDetector` startup with graceful fallback, and ships an auto-capture helper plus comprehensive tests and changelog updates.

Sequence diagram for MLStatisticalChecker cognition baseline path

```mermaid
sequenceDiagram
    participant DriftDetector
    participant MLStatisticalChecker
    participant CognitionFeatureProvider
    participant CognitionDriftBaseline
    DriftDetector->>MLStatisticalChecker: run
    MLStatisticalChecker->>CognitionFeatureProvider: __call
    CognitionFeatureProvider-->>MLStatisticalChecker: np_ndarray_features
    MLStatisticalChecker->>CognitionDriftBaseline: read_baseline_for_version
    alt baseline_missing
        MLStatisticalChecker-->>DriftDetector: CheckerResult no_baseline
    else baseline_loaded
        MLStatisticalChecker->>CognitionDriftBaseline: compute_distance
        MLStatisticalChecker-->>DriftDetector: CheckerResult drifting_flag
    end
```
Sequence diagram for admin cognition drift baseline capture endpoint

```mermaid
sequenceDiagram
    participant AdminActor
    participant AdminAPI
    participant _admin_authn
    participant StateBackend
    participant CognitionFeatureProvider
    participant CognitionDriftBaseline
    AdminActor->>AdminAPI: POST /v1/admin/drift/cognition-baseline/capture
    AdminAPI->>_admin_authn: _admin_authn
    _admin_authn-->>AdminAPI: actor
    AdminAPI->>StateBackend: get_state_backend
    AdminAPI->>CognitionFeatureProvider: compute
    CognitionFeatureProvider-->>AdminAPI: CognitionDriftFeatures_or_None
    alt insufficient_data
        AdminAPI-->>AdminActor: HTTP 409
    else sufficient_data
        AdminAPI->>CognitionDriftBaseline: write_current
        AdminAPI-->>AdminActor: HTTP 200 baseline_summary
    end
```
Hey, I've found 1 issue and left some high-level feedback:
- Both `MLStatisticalChecker._run_with_baseline` and `maybe_auto_capture_baseline` duplicate the `_VECTOR_FIELDS` → `CognitionDriftFeatures` reconstruction logic; consider extracting a small shared helper to keep the vector/dataclass mapping consistent in one place.
- In `capture_cognition_drift_baseline`, the audit event reaches into `CognitionFeatureProvider._min_sample_size`; if you want to keep that attribute private, consider exposing a read-only property or constant instead of accessing the underscore-prefixed field directly.
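The two high-level suggestions above could be addressed along these lines. This is a sketch under assumptions: the four feature fields, the helper names `features_to_vector`/`vector_to_features`, and the default `min_sample_size` of 20 are hypothetical; only `_VECTOR_FIELDS`, `CognitionDriftFeatures`, `CognitionFeatureProvider`, and `_min_sample_size` are names from the PR.

```python
from collections.abc import Sequence
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CognitionDriftFeatures:  # hypothetical fields; the real dataclass defines its own
    verdict_block_rate: float
    mean_confidence: float
    disagreement_mean: float
    p95_latency_ms: float


_VECTOR_FIELDS: tuple[str, ...] = ("verdict_block_rate", "mean_confidence", "disagreement_mean", "p95_latency_ms")

# Import-time guard: vector ordering must match the dataclass field order exactly.
assert _VECTOR_FIELDS == tuple(f.name for f in fields(CognitionDriftFeatures))


def features_to_vector(feat: CognitionDriftFeatures) -> list[float]:
    """Single place that flattens the dataclass into the ordered vector."""
    return [float(getattr(feat, name)) for name in _VECTOR_FIELDS]


def vector_to_features(vec: Sequence[float]) -> CognitionDriftFeatures:
    """Inverse mapping, shared by the checker and the auto-capture helper."""
    if len(vec) != len(_VECTOR_FIELDS):
        raise ValueError(f"expected {len(_VECTOR_FIELDS)} values, got {len(vec)}")
    return CognitionDriftFeatures(**{name: float(v) for name, v in zip(_VECTOR_FIELDS, vec)})


class CognitionFeatureProvider:  # reduced sketch: only the sample-size knob
    def __init__(self, min_sample_size: int = 20) -> None:
        self._min_sample_size = min_sample_size

    @property
    def min_sample_size(self) -> int:
        """Read-only view so audit code need not touch the private attribute."""
        return self._min_sample_size
```

Because the property has no setter, callers can read the value for audit payloads but cannot accidentally reassign it.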
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Both `MLStatisticalChecker._run_with_baseline` and `maybe_auto_capture_baseline` duplicate the `_VECTOR_FIELDS` → `CognitionDriftFeatures` reconstruction logic; consider extracting a small shared helper to keep the vector/dataclass mapping consistent in one place.
- In `capture_cognition_drift_baseline`, the audit event reaches into `CognitionFeatureProvider._min_sample_size`; if you want to keep that attribute private, consider exposing a read-only property or constant instead of accessing the underscore-prefixed field directly.
## Individual Comments
### Comment 1
<location path="phoenix/admin/cognition_drift_admin.py" line_range="160" />
<code_context>
+ # the grant-prompt-verbatim sibling endpoint family is not currently
+ # wired for this flag (admin-tier construction is the only grant
+ # path in v1.1).
+ perms = get_permissions_registry().get(actor.name)
+ if not perms.can_capture_drift_baseline:
+ emit_admin_audit(
</code_context>
<issue_to_address>
**issue (bug_risk):** Handle missing permission records to avoid attribute access on None.
If `get_permissions_registry().get(actor.name)` returns `None`, `perms.can_capture_drift_baseline` will raise `AttributeError` and return a 500 instead of a 403. Please handle the `None` case explicitly (e.g., default to `ActorPermissions()` or deny with a 403) before accessing `can_capture_drift_baseline`, and adjust the audit event to match the chosen behavior.
</issue_to_address>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ee2ef1d044
```python
rows = self._backend.list_ledger_entries(
    since_unix=since_unix, limit=self._max_entries_read
```
Sample the latest ledger rows for drift features
When the 24h window contains more than `max_entries_read` total ledger rows, `StateBackend.list_ledger_entries` returns the oldest rows first, and cognition filtering happens only after this capped read. In a busy deployment, recent cognition rows can sit past the first 10,000 rows, causing the provider to return `insufficient_data` or compare stale early-window features while current cognition drift goes unseen. Fetch enough rows to cover the cognition entries, or select the most recent rows before computing the vector.
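The difference between "cap, then filter" and "filter, then keep the newest" is easiest to see on an in-memory list. A sketch assuming rows arrive oldest-first as dicts and that cognition rows are identified by `axis == "cognition"` (both assumptions); the function names are hypothetical. A production fix would still need the backend to page through the window or push the filter down, since this version iterates every row.

```python
from collections import deque
from collections.abc import Iterable
from typing import Any

Row = dict[str, Any]


def cap_then_filter(rows_oldest_first: list[Row], limit: int) -> list[Row]:
    """Mirrors the flagged pattern: cap the read first, filter cognition rows afterwards."""
    return [r for r in rows_oldest_first[:limit] if r.get("axis") == "cognition"]


def newest_cognition_rows(rows_oldest_first: Iterable[Row], limit: int) -> list[Row]:
    """Filter first, then keep only the `limit` most recent cognition rows.

    deque(maxlen=limit) drops older matches as newer ones arrive, so memory
    stays bounded by `limit` even while streaming a large window.
    """
    matches = (r for r in rows_oldest_first if r.get("axis") == "cognition")
    return list(deque(matches, maxlen=limit))
```

With ten non-cognition rows followed by three cognition rows and a cap of ten, the first function returns nothing while the second returns the latest cognition rows.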
```python
# ----- Phase 13.x.7 extension (encryption admin) -----
can_rotate_encryption_key: bool = False
# ----- Phase 13.5 extension (cognition drift baseline) -----
can_capture_drift_baseline: bool = False
```
Backfill the new permission for persisted bootstrap actors
For installations that already have `actor_permissions.json` entries for `adam` or `ash`, deserialization fills this newly added field from the dataclass default (`False`) instead of the bootstrap default grant, so previously persisted bootstrap admins will get 403 from the new baseline endpoints after upgrade. Add a load-time migration/backfill for a missing `can_capture_drift_baseline` on bootstrap/admin records so the documented bootstrap grant remains true across upgrades.
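A sketch of the load-time backfill this comment asks for. It assumes the file is a JSON object mapping actor name to a flat dict of flags, and that `adam` and `ash` are the bootstrap actors (names from the comment); `load_permissions`, `BOOTSTRAP_GRANTS`, and the reduced `ActorPermissions` are hypothetical. The key design choice is `setdefault`: it fills only keys absent from the persisted record, so an explicit `false` written by an operator survives.

```python
import json
from dataclasses import dataclass, fields
from typing import Any

BOOTSTRAP_ACTORS = frozenset({"adam", "ash"})  # names taken from the review comment
BOOTSTRAP_GRANTS: dict[str, bool] = {"can_capture_drift_baseline": True}


@dataclass
class ActorPermissions:  # reduced hypothetical mirror of the real dataclass
    can_rotate_encryption_key: bool = False
    can_capture_drift_baseline: bool = False


def load_permissions(raw_json: str) -> dict[str, ActorPermissions]:
    data: dict[str, dict[str, Any]] = json.loads(raw_json)
    known = {f.name for f in fields(ActorPermissions)}
    out: dict[str, ActorPermissions] = {}
    for name, persisted in data.items():
        # Ignore keys the current dataclass does not know about.
        record = {k: v for k, v in persisted.items() if k in known}
        if name in BOOTSTRAP_ACTORS:
            for flag, value in BOOTSTRAP_GRANTS.items():
                record.setdefault(flag, value)  # backfill only when the key is missing
        out[name] = ActorPermissions(**record)
    return out
```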
Summary
Wires cognition substrate signals (classifier verdict distribution, confidence stats, cognition wobble disagreement, provider behavior, latency, disposition mix) into the existing `phoenix/verification/drift_detector.py::MLStatisticalChecker` via its already-shipped `feature_provider` callback seam. Per-Phoenix-version baseline with schema versioning. Two admin endpoints for capture/get. Auto-capture helper for refreshing the baseline after N healthy cycles. Decision 17's three-checker aggregation rule is preserved (no fourth checker).

What ships
- `phoenix/verification/cognition_drift_features.py`: `CognitionDriftFeatures` dataclass + `CognitionFeatureProvider` + `_VECTOR_FIELDS` ordered tuple (single source of truth for the vector dimension; a module-level assertion catches accidental drift).
- `phoenix/verification/cognition_drift_baseline.py`: `CognitionDriftBaseline` with per-Phoenix-version JSON storage + `FEATURE_SCHEMA_VERSION=1` schema versioning + weighted-L2 distance computation.
- `phoenix/admin/cognition_drift_admin.py`: `POST /v1/admin/drift/cognition-baseline/capture` + `GET /v1/admin/drift/cognition-baseline`. The auth chain mirrors Phase 13.x.7's encryption_admin pattern exactly. 11 granular per-error audit event types.
- `ActorPermissions.can_capture_drift_baseline` (default deny; granted to bootstrap actors).
- `MLStatisticalChecker`: consumes the baseline via 3 new kwargs (`cognition_baseline`, `phoenix_version`, `distance_threshold`); the `PHOENIX_DRIFT_COGNITION_DISTANCE_THRESHOLD` env var overrides the 0.5 default. The existing `CheckerResult` shape is preserved; new reason tokens are embedded into the existing `summary` field.
- `get_detector()` builds the cognition provider + baseline and wires them into the ML checker, with graceful fallback to the default checker list on wiring failures.
- `maybe_auto_capture_baseline()` refreshes the baseline after N consecutive healthy cycles. Shipped as a callable; full integration into `DriftDetector.run_cycle` is a v1.1.x followup.

Privacy contract
The feature provider reads only aggregate fields (verdict, classification, cognition_provenance, cognition_disagreement_metric, prompt_disposition, axis). It does NOT access the `prompt_verbatim` or `prompt_encrypted` payload fields. The whitelist is enforced by `_extract_aggregate_fields` and pinned by a dedicated `test_privacy_whitelist_contains_only_expected_fields` test that asserts the literal whitelist against the approved frozenset.

NOT shipped (deferred follow-ups)
- `maybe_auto_capture_baseline` into `DriftDetector.run_cycle` (helper is shipped; auto-cycle wiring deferred to v1.1.x followup)
- `ml/drift_ensemble.py`

Tests added
~32 new across 6 test files:
- `tests/cognition/test_cognition_drift_features.py` (10): primitive + privacy
- `tests/cognition/test_cognition_drift_baseline.py` (7): storage + schema versioning + distance
- `tests/cognition/test_ml_checker_cognition.py` (5): ML checker integration
- `tests/cognition/test_drift_detector_auto_capture.py` (3): auto-capture helper
- `tests/integration/test_admin_cognition_drift_baseline.py` (5): endpoint integration
- `tests/unit/test_permissions_phase13_5.py` (2): permission flag

Project-wide pytest: 1348 passed, 43 skipped, 0 failures. `mypy --strict` clean on 5 source files. `ruff check` + format clean on all 11 touched files.
Spec / plan
- `docs/superpowers/specs/2026-05-28-phase-13.5-cognition-drift-extension-design.md` (`a18c0be` on main)
- `docs/superpowers/plans/2026-06-05-phase-13.5-cognition-drift-extension.md` (`2a28738` on main)

Test plan
- `pytest tests/cognition/test_cognition_drift_features.py tests/cognition/test_cognition_drift_baseline.py tests/cognition/test_ml_checker_cognition.py tests/cognition/test_drift_detector_auto_capture.py tests/integration/test_admin_cognition_drift_baseline.py tests/unit/test_permissions_phase13_5.py -v` all green
- `mypy --strict` clean on the 5 touched modules
- `_VECTOR_FIELDS` ordering discipline
- `[1.1.0.dev0]` below the existing 13.x.4 and 13.x.7 entries

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary by Sourcery
Integrate cognition drift signals into the drift detector by introducing a versioned cognition baseline, wiring it into the ML statistical checker, and exposing admin controls and helpers to manage and auto-capture baselines.
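The auto-capture helper (`maybe_auto_capture_baseline()`, "refreshes baseline after N consecutive healthy cycles") can be pictured as a small counter that resets on any unhealthy cycle. A sketch only: the real helper's signature is not shown in the PR, so `AutoCaptureState`, `maybe_auto_capture`, and the convention that `capture()` returns `False` on insufficient data are all assumptions.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class AutoCaptureState:  # hypothetical; holds the streak between detector cycles
    consecutive_healthy: int = 0


def maybe_auto_capture(
    state: AutoCaptureState,
    cycle_healthy: bool,
    required_cycles: int,
    capture: Callable[[], bool],  # assumed to return False when data is insufficient
) -> bool:
    """Return True if a baseline capture happened on this cycle."""
    if not cycle_healthy:
        state.consecutive_healthy = 0  # any unhealthy cycle breaks the streak
        return False
    state.consecutive_healthy += 1
    if state.consecutive_healthy < required_cycles:
        return False
    captured = capture()
    if captured:
        state.consecutive_healthy = 0  # start counting afresh after a refresh
    return captured
```

Resetting only on a successful capture means a capture that fails for lack of data is retried on the next healthy cycle instead of waiting another N cycles.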