feat(rag): improve RAG system — hybrid search, re-ranking, semantic chunking #25

@EngineerProjects

Summary

The current RAG system runs FTS5 (BM25) and vector cosine-similarity search
in parallel, but the combined results are not re-ranked and chunking is
fixed-size. As a result, irrelevant context gets injected and relevant context
is missed on long documents.

Improvements

1. Semantic chunking

Replace fixed-size chunking with sentence/paragraph-aware splitting:

  • Split on sentence boundaries rather than arbitrary token counts
  • Respect markdown headers as natural chunk boundaries
  • Configurable overlap between chunks
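A minimal sketch of what such a chunker could look like, assuming a naive punctuation-based sentence splitter and an overlap measured in sentences. `ChunkOptions`, `Chunk`, and `splitSentences` are hypothetical names, not the existing API, and the header check does not account for `#` lines inside code fences.

```go
package main

import (
	"fmt"
	"strings"
)

// ChunkOptions is a hypothetical config struct; field names are illustrative.
type ChunkOptions struct {
	MaxChars int // soft upper bound on chunk body length, in bytes
	Overlap  int // trailing sentences repeated at the start of the next chunk
}

// splitSentences is a naive splitter: it cuts after '.', '!' or '?'
// when the mark is followed by whitespace or the end of the text.
func splitSentences(text string) []string {
	var out []string
	start := 0
	for i := 0; i < len(text); i++ {
		c := text[i]
		if c != '.' && c != '!' && c != '?' {
			continue
		}
		if i+1 == len(text) || text[i+1] == ' ' || text[i+1] == '\n' {
			if s := strings.TrimSpace(text[start : i+1]); s != "" {
				out = append(out, s)
			}
			start = i + 1
		}
	}
	if s := strings.TrimSpace(text[start:]); s != "" {
		out = append(out, s)
	}
	return out
}

// Chunk starts a new chunk at every markdown header and packs whole
// sentences into chunks of roughly MaxChars, carrying Overlap sentences over.
func Chunk(doc string, opt ChunkOptions) []string {
	if opt.Overlap < 0 {
		opt.Overlap = 0
	}
	var chunks, section []string
	header := ""
	flush := func() {
		sentences := splitSentences(strings.Join(section, " "))
		section = section[:0]
		var cur []string
		size := 0
		emit := func() {
			if len(cur) == 0 {
				return
			}
			body := strings.Join(cur, " ")
			if header != "" {
				body = header + "\n" + body
			}
			chunks = append(chunks, body)
		}
		for _, s := range sentences {
			if size+len(s) > opt.MaxChars && len(cur) > 0 {
				emit()
				keep := opt.Overlap
				if keep > len(cur) {
					keep = len(cur)
				}
				cur = append([]string(nil), cur[len(cur)-keep:]...)
				size = 0
				for _, k := range cur {
					size += len(k)
				}
			}
			cur = append(cur, s)
			size += len(s)
		}
		emit()
	}
	for _, line := range strings.Split(doc, "\n") {
		if strings.HasPrefix(line, "#") {
			flush() // close the previous section before switching headers
			header = strings.TrimSpace(line)
			continue
		}
		section = append(section, line)
	}
	flush()
	return chunks
}

func main() {
	doc := "# A\nOne. Two. Three.\n# B\nFour."
	for _, c := range Chunk(doc, ChunkOptions{MaxChars: 10, Overlap: 1}) {
		fmt.Printf("%q\n", c)
	}
}
```

Each chunk is prefixed with its section header so the retrieved text keeps its context. A section that contains only a header and no body produces no chunk in this sketch.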

2. Hybrid search scoring (RRF)

Combine the BM25 and vector result lists using Reciprocal Rank Fusion (RRF),
which works on each document's rank rather than its raw score, instead of
running them in parallel and taking a union:

rrf_score = 1/(k + bm25_rank) + 1/(k + vector_rank)

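Here k is a smoothing constant (60 is a common default) and ranks are 1-based.
This produces a single ranked list, which should improve precision over the
unranked union.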
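The formula above can be sketched in Go as follows. `FuseRRF` is a hypothetical name, and the sketch assumes each input list is ordered best-first, contains no duplicate IDs, and that a document missing from a list simply contributes no term for that list.

```go
package main

import (
	"fmt"
	"sort"
)

// FuseRRF merges ranked ID lists with Reciprocal Rank Fusion.
// Each ranking is ordered best-first; ranks are 1-based.
func FuseRRF(k float64, rankings ...[]string) []string {
	scores := make(map[string]float64)
	for _, ranking := range rankings {
		for i, id := range ranking {
			scores[id] += 1.0 / (k + float64(i+1))
		}
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	// Highest fused score first; break ties by ID for a stable order.
	sort.Slice(ids, func(a, b int) bool {
		sa, sb := scores[ids[a]], scores[ids[b]]
		if sa != sb {
			return sa > sb
		}
		return ids[a] < ids[b]
	})
	return ids
}

func main() {
	bm25 := []string{"a", "b", "c"}
	vector := []string{"c", "a", "d"}
	fmt.Println(FuseRRF(60, bm25, vector)) // prints [a c b d]
}
```

In the example, `a` ranks 1st and 2nd, `c` ranks 3rd and 1st, so `a` narrowly beats `c`, and both beat the documents that appear in only one list.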

3. Re-ranking (optional cross-encoder)

When a cross-encoder model is available locally (e.g. via Ollama),
use it to re-rank the top-N candidates before injecting into context.
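One way to keep the re-ranker optional is to hide it behind a small interface and fall back to the fused order when no model is configured or scoring fails. The sketch below is hypothetical: `Reranker`, `Rerank`, and the toy `overlapScorer` are illustrative names, and the scorer is a word-overlap stand-in rather than a real cross-encoder or an Ollama call.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Reranker scores a (query, passage) pair; higher means more relevant.
// A cross-encoder backend would implement this interface.
type Reranker interface {
	Score(query, passage string) (float64, error)
}

// Rerank reorders only the first topN candidates and leaves the tail as is.
// With no reranker, or on any scoring error, the input order is returned.
func Rerank(r Reranker, query string, candidates []string, topN int) []string {
	if r == nil || topN <= 1 {
		return candidates
	}
	if topN > len(candidates) {
		topN = len(candidates)
	}
	type scored struct {
		text  string
		score float64
	}
	head := make([]scored, topN)
	for i, c := range candidates[:topN] {
		s, err := r.Score(query, c)
		if err != nil {
			return candidates // degrade gracefully to the fused order
		}
		head[i] = scored{c, s}
	}
	sort.SliceStable(head, func(a, b int) bool { return head[a].score > head[b].score })
	out := make([]string, 0, len(candidates))
	for _, h := range head {
		out = append(out, h.text)
	}
	return append(out, candidates[topN:]...)
}

// overlapScorer is a toy stand-in that counts query words found in the passage.
type overlapScorer struct{}

func (overlapScorer) Score(query, passage string) (float64, error) {
	n := 0
	p := strings.ToLower(passage)
	for _, w := range strings.Fields(strings.ToLower(query)) {
		if strings.Contains(p, w) {
			n++
		}
	}
	return float64(n), nil
}

func main() {
	candidates := []string{"cats sleep", "go modules and go workspaces", "unrelated"}
	fmt.Println(Rerank(overlapScorer{}, "go modules", candidates, 2))
	fmt.Println(Rerank(nil, "go modules", candidates, 2)) // no model: order unchanged
}
```

Limiting the re-rank to the top N keeps the number of model calls bounded, which matters because a cross-encoder scores every candidate against the query individually.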

4. Memory TTL and auto-pruning

Add a configurable TTL for memory entries so that old, rarely accessed records
are pruned automatically, keeping the vector store lean.
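The pruning rule could be as simple as the sketch below, which assumes that "old" means "not accessed within the TTL" and that a non-positive TTL disables pruning. `Record`, `Prune`, and the `LastAccess` field are hypothetical; the real job would run against the store rather than an in-memory slice.

```go
package main

import (
	"fmt"
	"time"
)

// Record is a hypothetical in-memory view of a stored memory entry.
type Record struct {
	ID         string
	LastAccess time.Time
}

// Prune drops records whose last access is older than ttl.
// A ttl <= 0 disables pruning and returns the input unchanged.
func Prune(records []Record, now time.Time, ttl time.Duration) (kept []Record, pruned int) {
	if ttl <= 0 {
		return records, 0
	}
	cutoff := now.Add(-ttl)
	for _, r := range records {
		if r.LastAccess.Before(cutoff) {
			pruned++
			continue
		}
		kept = append(kept, r)
	}
	return kept, pruned
}

// demo builds a small fixture and prunes it with a 7-day TTL.
func demo() ([]Record, int) {
	now := time.Date(2024, 1, 10, 0, 0, 0, 0, time.UTC)
	records := []Record{
		{ID: "fresh", LastAccess: now.Add(-1 * time.Hour)},
		{ID: "stale", LastAccess: now.Add(-30 * 24 * time.Hour)},
		{ID: "recent", LastAccess: now.Add(-6 * 24 * time.Hour)},
	}
	return Prune(records, now, 7*24*time.Hour)
}

func main() {
	kept, pruned := demo()
	fmt.Println(len(kept), pruned) // prints 2 1
}
```

Passing `now` in explicitly, rather than calling `time.Now()` inside `Prune`, keeps the rule deterministic and easy to test.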

5. Namespace isolation

Ensure RAG queries are always scoped to the current session namespace
to prevent cross-session memory leakage.
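One way to enforce this at the query layer is to make the namespace a mandatory argument of the only function that builds queries, so an unscoped query cannot be constructed. The sketch below is hypothetical: `ScopedQuery` and the `namespace`, `id`, and `content` column names are assumptions, and only the `vector_records` table name comes from this issue.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ErrNoNamespace is returned when a query is built without a namespace.
var ErrNoNamespace = errors.New("rag: query has no namespace")

// ScopedQuery builds a parameterized query that always filters by namespace.
// The caller's extra condition is ANDed after the namespace filter and wrapped
// in parentheses so an OR inside it cannot widen the scope.
func ScopedQuery(namespace, where string, args ...any) (string, []any, error) {
	if strings.TrimSpace(namespace) == "" {
		return "", nil, ErrNoNamespace
	}
	// Column names here are assumptions, not the actual schema.
	q := "SELECT id, content FROM vector_records WHERE namespace = ?"
	scoped := []any{namespace}
	if where != "" {
		q += " AND (" + where + ")"
		scoped = append(scoped, args...)
	}
	return q, scoped, nil
}

func main() {
	q, args, err := ScopedQuery("session-1", "content LIKE ?", "%rrf%")
	fmt.Println(q, args, err)

	_, _, err = ScopedQuery("", "content LIKE ?", "%rrf%")
	fmt.Println(err) // prints rag: query has no namespace
}
```

Rejecting an empty namespace with an error, instead of silently falling back to a global search, makes a missing scope visible as a bug rather than a leak.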

Acceptance criteria

  • Semantic chunker in internal/rag/chunker.go
  • RRF fusion in internal/rag/search.go replacing the current union approach
  • TTL field on vector_records table, pruning job runs on store open
  • Namespace isolation enforced at the query layer
  • Benchmarks: RRF recall@5 >= current implementation on the existing test fixtures
  • docs/ updated (RAG section in architecture.md or new docs/rag.md)
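For the benchmark criterion, recall@k can be computed as the fraction of relevant documents that appear in the first k results. The helper below is a hypothetical sketch (`RecallAtK` is not an existing function) and assumes the retrieved list contains no duplicate IDs.

```go
package main

import "fmt"

// RecallAtK returns the share of relevant IDs found in the first k results.
// It returns 0 when there are no relevant IDs to find.
func RecallAtK(retrieved []string, relevant map[string]bool, k int) float64 {
	if len(relevant) == 0 {
		return 0
	}
	if k < 0 {
		k = 0
	}
	if k > len(retrieved) {
		k = len(retrieved)
	}
	hits := 0
	for _, id := range retrieved[:k] {
		if relevant[id] {
			hits++
		}
	}
	return float64(hits) / float64(len(relevant))
}

func main() {
	retrieved := []string{"a", "b", "c", "d", "e", "f"}
	relevant := map[string]bool{"a": true, "c": true, "f": true, "z": true}
	fmt.Println(RecallAtK(retrieved, relevant, 5)) // prints 0.5
}
```

Running the same fixtures through the current union approach and through RRF, then comparing the two averages, would give the `>=` check the criterion asks for.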

Metadata

    Labels

    core (Core runtime / engine layer), enhancement (New feature or request)