fix: drop legacy unique content_hash index #245
Conversation
📝 Walkthrough: This PR consolidates environment-based configuration into a shared helper.
Estimated code review effort: 🎯 3 (Moderate), ⏱️ ~20 minutes.
@codex review

@coderabbitai review
✅ Actions performed: Review triggered.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ed697058b9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```python
reranked = [mmr_candidates[idx] for idx in selected]
return reranked + fallback_candidates + tail_candidates
```
Keep non-vector hits in score order after MMR rerank
Chunks can legitimately be missing `chunk_vectors` rows (for example, newly stored manual/MCP memories before embedding backfill), so they enter `fallback_candidates` even when they have the highest hybrid score. Returning `reranked + fallback_candidates + tail_candidates` always pushes those lexical-only hits behind all vector-backed MMR candidates, so `hybrid_search(..., n_results=...)` can drop the most relevant FTS matches from the top results. This is a retrieval-correctness regression for partial-embedding databases.
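One way to address this, shown here as a minimal sketch with hypothetical names rather than the repo's actual rerank implementation, is to merge fallback hits back in by their original hybrid score instead of always appending them after the vector-backed results:

```python
# Hypothetical sketch: interleave MMR-reranked vector hits with
# lexical-only fallback hits by original hybrid score, instead of
# appending fallbacks last. The candidate dicts and "score" key are
# illustrative, not the repo's actual data structures.

def merge_by_score(reranked, fallback, key=lambda c: c["score"]):
    """Keep `reranked` in its MMR order, but insert each fallback
    candidate at the first position where its score exceeds the
    reranked candidate's score."""
    merged = list(reranked)
    for cand in sorted(fallback, key=key, reverse=True):
        pos = next(
            (i for i, r in enumerate(merged) if key(cand) > key(r)),
            len(merged),
        )
        merged.insert(pos, cand)
    return merged

hits = merge_by_score(
    [{"id": "v1", "score": 0.9}, {"id": "v2", "score": 0.4}],
    [{"id": "fts1", "score": 0.7}],
)
# fts1 lands between v1 and v2 rather than after all vector hits
```

With this shape, a high-scoring lexical-only match survives a small `n_results` cutoff instead of being pushed past it.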
```python
from .store_handler import _brain_update as _brain_update
from .tags_handler import _brain_tags_mcp as _brain_tags_mcp


DEFAULT_REALTIME_ENRICH_SINCE_HOURS = int(os.environ.get("BRAINLAYER_DEFAULT_ENRICH_SINCE_HOURS", "8760"))
```
Guard enrich default env parsing at module import
`DEFAULT_REALTIME_ENRICH_SINCE_HOURS` is parsed with `int(...)` at import time, so any malformed `BRAINLAYER_DEFAULT_ENRICH_SINCE_HOURS` value (e.g. an empty string or `24h`) raises `ValueError` and prevents the MCP module from loading at all. Because this happens before request handling, one bad environment value can take the whole tool server down instead of falling back to a safe default.
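A defensive parse at import time would contain the blast radius; this is a sketch only, and `parse_hours_env` is a hypothetical name rather than the repo's helper:

```python
import os

def parse_hours_env(name, default):
    """Parse an integer env var, falling back to `default` on bad
    input instead of raising at import time. Sketch only; the repo's
    actual helper may differ."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    try:
        return int(raw.strip())
    except ValueError:
        return default

os.environ["BRAINLAYER_DEFAULT_ENRICH_SINCE_HOURS"] = "24h"
hours = parse_hours_env("BRAINLAYER_DEFAULT_ENRICH_SINCE_HOURS", 8760)
# malformed '24h' falls back to 8760 rather than raising ValueError
```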
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Reviewed by Cursor Bugbot for commit ed69705.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e37754d7f6
```python
except Exception:
    pass

return True
```
Propagate content_hash index migration failures
Do not swallow exceptions here: if `DROP INDEX`/`CREATE INDEX` fails (for example due to a transient `database is locked` error during startup), `_ensure_content_hash_column` still returns success and `_ensure_enrichment_columns` marks the store as schema-ready, so the process will not retry the migration and legacy unique `content_hash` constraints can continue breaking duplicate-hash backfill/write paths. This makes the new repair logic fail silently in exactly the lock-contention environment called out in this repo.
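The repair could retry busy DDL and report real success, sketched below under assumed names (`run_ddl_with_retry` is illustrative, not the repo's migration helper):

```python
import sqlite3
import time

def run_ddl_with_retry(cursor, sql, attempts=5, backoff=0.05):
    """Execute one DDL statement, retrying on transient lock errors.

    Returns True only when the statement actually succeeded; non-lock
    errors propagate instead of being swallowed. Sketch only; the
    repo's migration code may be shaped differently."""
    for attempt in range(attempts):
        try:
            cursor.execute(sql)
            return True
        except sqlite3.OperationalError as exc:
            msg = str(exc).lower()
            if "locked" not in msg and "busy" not in msg:
                raise  # real failures must surface, not be swallowed
            time.sleep(backoff * (attempt + 1))
    return False  # caller must not mark the schema as repaired

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, content_hash TEXT)")
ok = run_ddl_with_retry(
    cur, "CREATE INDEX IF NOT EXISTS idx_content_hash ON chunks(content_hash)"
)
# ok reflects whether the DDL actually ran, so callers can retry later
```

Returning `False` (or raising) keeps the schema-ready flag honest, so the next enrichment pass retries the migration instead of writing into a broken schema.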
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/brainlayer/config.py`:
- Around line 21-25: The current int parsing returns int(value) even for 0 or
negatives; update the parse so that after converting value = int(value) you
reject non-positive numbers by checking if value <= 0, then call
logger.warning("Invalid %s=%r; using default %s", name, raw, default) and return
default; otherwise return the parsed positive integer. Locate the try/except
block that uses name, raw, default and replace the bare return int(value) with
this validation (keep the same warning message and exception handling).
In `@src/brainlayer/enrichment_controller.py`:
- Around line 402-422: The current try/except around the index scan/drop/create
swallows all exceptions and returns True even when DDL fails; update the logic
in the function containing this block (the code using cursor.execute,
has_content_hash_index, DROP INDEX and CREATE INDEX) to remove the bare
except/pass, implement retry-on-SQLITE_BUSY for each DDL operation (PRAGMA
index_info, DROP INDEX, CREATE INDEX) with a small backoff and limited attempts,
and if retries exhaust or any non-BUSY error occurs, propagate or return a
failure (do not return True); ensure the final return reflects actual success
only after successful DDL or confirmed existing index.
- Around line 405-414: The migration loop currently drops any unique index that
contains "content_hash" even if it's part of a composite index; change the logic
in the loop that iterates over indexes (using variables index_name, is_unique,
quoted_name, columns and cursor.execute/PRAGMA index_info) to only DROP INDEX
when the index is unique and its columns exactly equal ["content_hash"]
(preferably also verify the index_name matches the known legacy name if
available) — otherwise skip it so composite indexes like (project, content_hash)
or (conversation_id, content_hash) are preserved.
In `@tests/test_enrich_defaults.py`:
- Around line 24-28: The test currently calls
monkeypatch.delenv("BRAINLAYER_DEFAULT_ENRICH_SINCE_HOURS", raising=False) then
importlib.reload(config), importlib.reload(cli), importlib.reload(mcp),
importlib.reload(enrich_handler), which can leak state if the env var was set
before the test; update the test cleanup in tests/test_enrich_defaults.py to
capture the original value (e.g., orig =
os.environ.get("BRAINLAYER_DEFAULT_ENRICH_SINCE_HOURS")) before deleting, then
restore that original value (use monkeypatch.setenv if orig is not None,
otherwise monkeypatch.delenv) and only then call importlib.reload(config),
importlib.reload(cli), importlib.reload(mcp), importlib.reload(enrich_handler)
so module constants reload with the original environment baseline.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: dc8e29d2-3ca5-453f-ba28-ecefa273d309
📒 Files selected for processing (9)
- src/brainlayer/cli/__init__.py
- src/brainlayer/config.py
- src/brainlayer/enrichment_controller.py
- src/brainlayer/mcp/__init__.py
- src/brainlayer/mcp/enrich_handler.py
- src/brainlayer/search_repo.py
- tests/test_enrich_defaults.py
- tests/test_enrichment_controller.py
- tests/test_hybrid_search.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: test (3.12)
- GitHub Check: test (3.13)
- GitHub Check: test (3.11)
- GitHub Check: Cursor Bugbot
- GitHub Check: Macroscope - Correctness Check
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior
Enforce one-write-at-a-time concurrency constraint; reads are safe but brain_digest is write-heavy and must not run in parallel with other MCP work
Run pytest before claiming behavior changed safely; current test suite has 929 tests
**/*.py: Use `paths.py:get_db_path()` for all database path resolution; all scripts and CLI must use this function rather than hardcoding paths
When performing bulk database operations: stop enrichment workers first, checkpoint WAL before and after, drop FTS triggers before bulk deletes, batch deletes in 5-10K chunks, and checkpoint every 3 batches
Files:
- tests/test_enrichment_controller.py
- src/brainlayer/cli/__init__.py
- src/brainlayer/mcp/enrich_handler.py
- tests/test_enrich_defaults.py
- src/brainlayer/enrichment_controller.py
- tests/test_hybrid_search.py
- src/brainlayer/search_repo.py
- src/brainlayer/config.py
- src/brainlayer/mcp/__init__.py
src/brainlayer/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/brainlayer/**/*.py: Use retry logic on `SQLITE_BUSY` errors; each worker must use its own database connection to handle concurrency safely
Classification must preserve `ai_code`, `stack_trace`, and `user_message` verbatim; skip `noise` entries entirely and summarize `build_log` and `dir_listing` entries (structure only)
Use AST-aware chunking via tree-sitter; never split stack traces; mask large tool output
For enrichment backend selection: use Groq as primary backend (cloud, configured in launchd plist), Gemini as fallback via `enrichment_controller.py`, and Ollama as offline last-resort; allow override via `BRAINLAYER_ENRICH_BACKEND` env var
Configure enrichment rate via `BRAINLAYER_ENRICH_RATE` environment variable (default 0.2 = 12 RPM)
Implement chunk lifecycle columns: `superseded_by`, `aggregated_into`, `archived_at` on chunks table; exclude lifecycle-managed chunks from default search; allow `include_archived=True` to show history
Implement `brain_supersede` with safety gate for personal data (journals, notes, health/finance); use soft-delete for `brain_archive` with timestamp
Add `supersedes` parameter to `brain_store` for atomic store-and-replace operations
Run linting and formatting with: `ruff check src/ && ruff format src/`
Run tests with `pytest`
Use `PRAGMA wal_checkpoint(FULL)` before and after bulk database operations to prevent WAL bloat
Files:
- src/brainlayer/cli/__init__.py
- src/brainlayer/mcp/enrich_handler.py
- src/brainlayer/enrichment_controller.py
- src/brainlayer/search_repo.py
- src/brainlayer/config.py
- src/brainlayer/mcp/__init__.py
🧠 Learnings (13)
📚 Learning: 2026-04-06T08:40:13.531Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T08:40:13.531Z
Learning: Applies to src/brainlayer/**/*.py : Configure enrichment rate via `BRAINLAYER_ENRICH_RATE` environment variable (default 0.2 = 12 RPM)
Applied to files:
- src/brainlayer/cli/__init__.py
- src/brainlayer/mcp/enrich_handler.py
- tests/test_enrich_defaults.py
- src/brainlayer/config.py
- src/brainlayer/mcp/__init__.py
📚 Learning: 2026-04-02T23:32:14.543Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T23:32:14.543Z
Learning: Applies to src/brainlayer/*enrichment*.py : Enrichment rate configurable via `BRAINLAYER_ENRICH_RATE` environment variable (default 0.2 = 12 RPM)
Applied to files:
- src/brainlayer/cli/__init__.py
- src/brainlayer/mcp/enrich_handler.py
- tests/test_enrich_defaults.py
- src/brainlayer/config.py
- src/brainlayer/mcp/__init__.py
📚 Learning: 2026-04-03T11:34:19.303Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-03T11:34:19.303Z
Learning: Applies to src/brainlayer/cli.py : Use Typer CLI framework for command-line interface in `src/brainlayer/`
Applied to files:
src/brainlayer/cli/__init__.py
📚 Learning: 2026-03-22T15:55:22.017Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 100
File: src/brainlayer/enrichment_controller.py:175-199
Timestamp: 2026-03-22T15:55:22.017Z
Learning: In `src/brainlayer/enrichment_controller.py`, the `parallel` parameter in `enrich_local()` is intentionally kept in the function signature (currently unused, suppressed with `# noqa: ARG001`) for API stability. Parallel local enrichment via a thread pool or process pool is planned for a future iteration. Do not flag this as dead code requiring removal.
Applied to files:
src/brainlayer/mcp/enrich_handler.py
📚 Learning: 2026-04-01T01:24:44.281Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T01:24:44.281Z
Learning: Applies to src/brainlayer/*enrichment*.py : Enrichment backend priority: Groq (primary/cloud) → Gemini (fallback) → Ollama (offline last-resort), configurable via `BRAINLAYER_ENRICH_BACKEND` environment variable
Applied to files:
- src/brainlayer/mcp/enrich_handler.py
- src/brainlayer/mcp/__init__.py
📚 Learning: 2026-04-06T08:40:13.531Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T08:40:13.531Z
Learning: Applies to src/brainlayer/**/*.py : Implement chunk lifecycle columns: `superseded_by`, `aggregated_into`, `archived_at` on chunks table; exclude lifecycle-managed chunks from default search; allow `include_archived=True` to show history
Applied to files:
src/brainlayer/enrichment_controller.py
📚 Learning: 2026-04-04T23:24:03.159Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T23:24:03.159Z
Learning: Applies to src/brainlayer/{vector_store,search}*.py : Chunk lifecycle: implement columns `superseded_by`, `aggregated_into`, `archived_at` on chunks table; exclude lifecycle-managed chunks from default search
Applied to files:
- src/brainlayer/enrichment_controller.py
- src/brainlayer/search_repo.py
📚 Learning: 2026-04-11T16:54:45.631Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-04-11T16:54:45.631Z
Learning: Applies to `src/brainlayer/enrichment_controller.py`, `src/brainlayer/pipeline/write_queue.py`, and related enrichment pipeline files: A per-store single-writer queue is used for SQLite enrichment writes because SQLite allows only one writer at a time; direct concurrent writes caused lock contention under sustained Gemini Flex traffic. Do not flag serialized write patterns in this path as a performance concern — the queue is intentional.
Applied to files:
src/brainlayer/enrichment_controller.py
📚 Learning: 2026-04-03T11:43:08.915Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-03T11:43:08.915Z
Learning: Applies to src/brainlayer/*bulk*.py : Before bulk database operations: stop enrichment workers, checkpoint WAL with `PRAGMA wal_checkpoint(FULL)`, drop FTS triggers before bulk deletes
Applied to files:
src/brainlayer/enrichment_controller.py
📚 Learning: 2026-04-01T01:24:44.281Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T01:24:44.281Z
Learning: Applies to src/brainlayer/mcp/*.py : MCP tools include: brain_search, brain_store, brain_recall, brain_entity, brain_expand, brain_update, brain_digest, brain_get_person, brain_tags, brain_supersede, brain_archive (legacy brainlayer_* aliases still supported)
Applied to files:
src/brainlayer/mcp/__init__.py
📚 Learning: 2026-04-11T16:54:45.631Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-04-11T16:54:45.631Z
Learning: Applies to `src/brainlayer/enrichment_controller.py` and `src/brainlayer/pipeline/rate_limiter.py`: Gemini API calls in the enrichment pipeline are gated by a token bucket rate limiter. The rate is controlled by `BRAINLAYER_ENRICH_RATE` (default `5/s`, burst `10`) to keep throughput inside the Gemini Flex intended envelope. This default supersedes the earlier 0.2 (12 RPM) default for the Gemini Flex integration path.
Applied to files:
src/brainlayer/mcp/__init__.py
📚 Learning: 2026-04-06T08:40:13.531Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T08:40:13.531Z
Learning: Applies to src/brainlayer/**/*.py : For enrichment backend selection: use Groq as primary backend (cloud, configured in launchd plist), Gemini as fallback via `enrichment_controller.py`, and Ollama as offline last-resort; allow override via `BRAINLAYER_ENRICH_BACKEND` env var
Applied to files:
src/brainlayer/mcp/__init__.py
📚 Learning: 2026-03-17T01:04:22.497Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-03-17T01:04:22.497Z
Learning: Applies to src/brainlayer/mcp/**/*.py and brain-bar/Sources/BrainBar/MCPRouter.swift: The 8 required MCP tools are `brain_search`, `brain_store`, `brain_recall`, `brain_entity`, `brain_expand`, `brain_update`, `brain_digest`, `brain_tags`. `brain_tags` is the 8th tool, replacing `brain_get_person`, as defined in the Phase B spec merged in PR `#72`. The Python MCP server already implements `brain_tags`. Legacy `brainlayer_*` aliases must be maintained for backward compatibility.
Applied to files:
src/brainlayer/mcp/__init__.py
🔇 Additional comments (3)

src/brainlayer/search_repo.py (1)

177-197: Nice slot-preserving recombination for lexical-only hits. Rebuilding `top_candidates` in place keeps non-vector results anchored to their original score slots while still letting MMR diversify the vector-backed positions.

tests/test_enrichment_controller.py (1)

381-397: Good regression coverage for the legacy-index migration. Using a real SQLite database here exercises the exact index-introspection and duplicate-backfill path that mocked cursors would miss.
Also applies to: 424-447

tests/test_hybrid_search.py (1)

349-410: Strong regression test for MMR slot preservation and dedupe behavior. This covers the lexical-only slot retention and duplicate-vector suppression path clearly, which is exactly the risky edge case from the rerank change.
```python
try:
    return int(value)
except ValueError:
    logger.warning("Invalid %s=%r; using default %s", name, raw, default)
    return default
```
Reject non-positive integers here as well.
`int()` accepts 0 and negative values, so `BRAINLAYER_DEFAULT_ENRICH_SINCE_HOURS=-1` becomes the shared default even though this setting is only valid for positive lookback windows. Treat `<= 0` the same as malformed input and fall back to `default`.
Suggested fix

```diff
 def get_int_env(name: str, default: int) -> int:
     """Read an integer env var, falling back cleanly on malformed values."""
     raw = os.environ.get(name)
     if raw is None:
         return default
     value = raw.strip()
     if not value:
         return default
     try:
-        return int(value)
+        parsed = int(value)
     except ValueError:
         logger.warning("Invalid %s=%r; using default %s", name, raw, default)
         return default
+    if parsed <= 0:
+        logger.warning("Invalid %s=%r; using default %s", name, raw, default)
+        return default
+    return parsed
```

🤖 Prompt for AI Agents
+ return parsed🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/brainlayer/config.py` around lines 21 - 25, The current int parsing
returns int(value) even for 0 or negatives; update the parse so that after
converting value = int(value) you reject non-positive numbers by checking if
value <= 0, then call logger.warning("Invalid %s=%r; using default %s", name,
raw, default) and return default; otherwise return the parsed positive integer.
Locate the try/except block that uses name, raw, default and replace the bare
return int(value) with this validation (keep the same warning message and
exception handling).
```python
try:
    indexes = list(cursor.execute("PRAGMA index_list(chunks)"))
    has_content_hash_index = False
    for row in indexes:
        index_name = row[1]
        is_unique = bool(row[2])
        quoted_name = index_name.replace('"', '""')
        columns = [info[2] for info in cursor.execute(f'PRAGMA index_info("{quoted_name}")')]
        if "content_hash" not in columns:
            continue
        if is_unique:
            cursor.execute(f'DROP INDEX IF EXISTS "{quoted_name}"')
            continue
        has_content_hash_index = True

    if not has_content_hash_index:
        cursor.execute("CREATE INDEX IF NOT EXISTS idx_content_hash ON chunks(content_hash)")
except Exception:
    pass

return True
```
Don't report schema repair success after failed DDL.
Lines 419-422 swallow every failure from the index scan/drop/create path and still return `True`. If `DROP INDEX` or `CREATE INDEX` hits `SQLITE_BUSY`, callers will cache this store as repaired and continue into writes that can still fail with the same `UNIQUE` constraint this helper is meant to fix. Retry busy DDL and surface failure instead of passing here.
As per coding guidelines, "Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior" and "Use retry logic on SQLITE_BUSY errors; each worker must use its own database connection to handle concurrency safely".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/brainlayer/enrichment_controller.py` around lines 402 - 422, The current
try/except around the index scan/drop/create swallows all exceptions and returns
True even when DDL fails; update the logic in the function containing this block
(the code using cursor.execute, has_content_hash_index, DROP INDEX and CREATE
INDEX) to remove the bare except/pass, implement retry-on-SQLITE_BUSY for each
DDL operation (PRAGMA index_info, DROP INDEX, CREATE INDEX) with a small backoff
and limited attempts, and if retries exhaust or any non-BUSY error occurs,
propagate or return a failure (do not return True); ensure the final return
reflects actual success only after successful DDL or confirmed existing index.
```python
for row in indexes:
    index_name = row[1]
    is_unique = bool(row[2])
    quoted_name = index_name.replace('"', '""')
    columns = [info[2] for info in cursor.execute(f'PRAGMA index_info("{quoted_name}")')]
    if "content_hash" not in columns:
        continue
    if is_unique:
        cursor.execute(f'DROP INDEX IF EXISTS "{quoted_name}"')
        continue
```
Only drop the legacy single-column unique index.
This loop removes any unique index whose column list merely includes `content_hash`. That also drops composite constraints like `(project, content_hash)` or `(conversation_id, content_hash)`, which are not the legacy bug this migration is targeting. Please restrict the repair to unique indexes whose indexed columns are exactly `["content_hash"]` (and ideally the known legacy name, if available) before issuing `DROP INDEX`.
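The exact-match guard can be sketched by reusing the PRAGMA calls from the snippet above; `single_column_unique_hash_indexes` is a hypothetical helper name, not code from the repo:

```python
import sqlite3

def single_column_unique_hash_indexes(cursor):
    """Return unique indexes whose columns are exactly ["content_hash"].

    Illustrative sketch: composite unique indexes such as
    (project, content_hash) are left alone because the exact-equality
    check fails for them.
    """
    matches = []
    # materialize index_list first so the nested execute() below does
    # not clobber the outer iteration on the same cursor
    for _seq, name, is_unique, *_rest in list(cursor.execute("PRAGMA index_list(chunks)")):
        if not is_unique:
            continue
        quoted = name.replace('"', '""')
        cols = [info[2] for info in cursor.execute(f'PRAGMA index_info("{quoted}")')]
        if cols == ["content_hash"]:  # exact equality, not mere membership
            matches.append(name)
    return matches

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, project TEXT, content_hash TEXT)")
cur.execute("CREATE UNIQUE INDEX idx_content_hash_unique ON chunks(content_hash)")
cur.execute("CREATE UNIQUE INDEX idx_project_hash ON chunks(project, content_hash)")
drops = single_column_unique_hash_indexes(cur)  # only the legacy index matches
```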
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/brainlayer/enrichment_controller.py` around lines 405 - 414, The
migration loop currently drops any unique index that contains "content_hash"
even if it's part of a composite index; change the logic in the loop that
iterates over indexes (using variables index_name, is_unique, quoted_name,
columns and cursor.execute/PRAGMA index_info) to only DROP INDEX when the index
is unique and its columns exactly equal ["content_hash"] (preferably also verify
the index_name matches the known legacy name if available) — otherwise skip it
so composite indexes like (project, content_hash) or (conversation_id,
content_hash) are preserved.
```python
monkeypatch.delenv("BRAINLAYER_DEFAULT_ENRICH_SINCE_HOURS", raising=False)
importlib.reload(config)
importlib.reload(cli)
importlib.reload(mcp)
importlib.reload(enrich_handler)
```
Restore the original env value before reloading modules in cleanup.
Cleanup currently forces the env var to “unset” before reloading modules. If it was set before this test, subsequent module constants can be reloaded with the wrong baseline and leak state across tests.
Suggested fix

```diff
+import os
 import importlib
@@
 def test_invalid_realtime_enrich_since_hours_env_falls_back(monkeypatch):
@@
-    monkeypatch.setenv("BRAINLAYER_DEFAULT_ENRICH_SINCE_HOURS", "24h")
+    env_key = "BRAINLAYER_DEFAULT_ENRICH_SINCE_HOURS"
+    original = os.environ.get(env_key)
+    monkeypatch.setenv(env_key, "24h")
@@
     finally:
-        monkeypatch.delenv("BRAINLAYER_DEFAULT_ENRICH_SINCE_HOURS", raising=False)
+        if original is None:
+            monkeypatch.delenv(env_key, raising=False)
+        else:
+            monkeypatch.setenv(env_key, original)
         importlib.reload(config)
         importlib.reload(cli)
         importlib.reload(mcp)
         importlib.reload(enrich_handler)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_enrich_defaults.py` around lines 24 - 28, The test currently calls
monkeypatch.delenv("BRAINLAYER_DEFAULT_ENRICH_SINCE_HOURS", raising=False) then
importlib.reload(config), importlib.reload(cli), importlib.reload(mcp),
importlib.reload(enrich_handler), which can leak state if the env var was set
before the test; update the test cleanup in tests/test_enrich_defaults.py to
capture the original value (e.g., orig =
os.environ.get("BRAINLAYER_DEFAULT_ENRICH_SINCE_HOURS")) before deleting, then
restore that original value (use monkeypatch.setenv if orig is not None,
otherwise monkeypatch.delenv) and only then call importlib.reload(config),
importlib.reload(cli), importlib.reload(mcp), importlib.reload(enrich_handler)
so module constants reload with the original environment baseline.

Summary
- Drops the legacy unique index on `chunks.content_hash`, recreating the intended non-unique lookup index
- Establishes `_ensure_content_hash_column()` as the single place that repairs this schema drift before enrichment/backfill writes run

Verification
- `pytest tests/test_enrichment_controller.py`
- `pytest tests/test_concurrent_enrichment.py`
- `idx_content_hash_unique` removed, `idx_content_hash` present
- 3 enriched, 0 skipped, 0 failed

Context
- Enrichment was failing with `UNIQUE constraint failed: chunks.content_hash` during `UPDATE chunks SET content_hash = ? WHERE id = ?`
- `com.brainlayer.watch` is not currently loaded, and recent watch logs show `database is locked` plus `No space left on device`; that is not changed in this PR

Note
Medium Risk
Touches enrichment-time SQLite schema/index repair and search reranking logic, so failures could impact enrichment runs or result ordering; changes are localized and covered by targeted tests.
Overview
Fixes enrichment schema drift by enhancing `_ensure_content_hash_column()` to detect and drop any unique index that includes `chunks.content_hash`, then ensures a non-unique lookup index exists so backfill/realtime enrichment can write duplicate hashes without aborting.

Centralizes `DEFAULT_REALTIME_ENRICH_SINCE_HOURS` env parsing into new `config.get_int_env()` (with malformed-value fallback) and updates CLI/MCP entrypoints to import the shared default; adds regression tests for the env fallback and for legacy-DB unique-index scenarios.

Adjusts MMR reranking in `SearchMixin._mmr_rerank_scored_results()` to cap vector-based selection to `n_results` and to keep lexical-only (non-vector) hits in their original score slots; adds coverage for this ordering behavior.

Reviewed by Cursor Bugbot for commit e37754d. Bugbot is set up for automated code reviews on this repo.
Note
Drop legacy unique `content_hash` index and replace with non-unique index in `chunks` table

- Updates `_ensure_content_hash_column` in enrichment_controller.py to detect and drop any unique index on `content_hash`, then creates a non-unique `idx_content_hash` index, fixing failures caused by duplicate content hashes.
- Moves `DEFAULT_REALTIME_ENRICH_SINCE_HOURS` parsing into a new `get_int_env` helper in config.py, which falls back to the default and logs a warning on malformed input instead of raising at import time.
- Updates `SearchMixin._mmr_rerank_scored_results` to preserve original position slots for non-vector (lexical-only) hits and cap selection at `n_results`.

Macroscope summarized e37754d.