Virtual hierarchy in search (directory-style navigation) #62

@lesaai

Summary

Add structural navigation to Crystal search. Instead of flat vector search across all 208K+ chunks, navigate a virtual hierarchy first (agent → source_type → date range), then search within the narrowed scope.

Proposed hierarchy

crystal://
  cc-mini/                    <- agent
    conversation/             <- source_type
      2026-03/                <- date range
    journal/
    memory/
  oc-lesa-mini/
    conversation/
    memory/
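
Since chunks already carry agent_id, source_type, and created_at, the tree above could be computed at query time rather than stored. A minimal sketch, assuming created_at is an ISO 8601 string and that the date range is bucketed by month (the `ChunkMeta` shape and both function names are hypothetical, not taken from the codebase):

```typescript
// Hypothetical chunk shape: only the three field names come from this issue.
interface ChunkMeta {
  agent_id: string;
  source_type: string;
  created_at: string; // assumed to be an ISO 8601 timestamp
}

// Build the virtual path for a chunk, e.g. crystal://cc-mini/conversation/2026-03/
function virtualPath(chunk: ChunkMeta): string {
  const month = chunk.created_at.slice(0, 7); // "YYYY-MM" prefix of the timestamp
  return `crystal://${chunk.agent_id}/${chunk.source_type}/${month}/`;
}

// Count chunks per virtual directory, which is enough to render a listing.
function listDirectories(chunks: ChunkMeta[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const chunk of chunks) {
    const path = virtualPath(chunk);
    counts.set(path, (counts.get(path) ?? 0) + 1);
  }
  return counts;
}
```

Nothing is written to the DB here, so this corresponds to computing the hierarchy as a view over existing metadata.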

Search algorithm:

  1. Identify which "directory" best matches the query intent
  2. Run vector search within that scope only (fewer candidates, better precision)
  3. Propagate the parent directory's relevance into each result's score
  4. Drill down recursively if the top results aren't converging
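
Steps 1 and 2 could look roughly like the sketch below. It assumes each directory has one representative embedding (for example a centroid or a summary vector); how that vector would be produced is not decided in this issue. All types and the `scopedSearch` name are hypothetical.

```typescript
type Vec = number[];

interface Directory { path: string; embedding: Vec } // assumed: one vector per directory
interface Chunk { id: string; dir: string; embedding: Vec }
interface Hit { id: string; dir: string; score: number }

// Plain cosine similarity; returns 0 for zero-length vectors.
function cosine(a: Vec, b: Vec): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

function scopedSearch(query: Vec, dirs: Directory[], chunks: Chunk[], topK = 5): Hit[] {
  if (dirs.length === 0) return [];
  // Step 1: pick the directory whose embedding is closest to the query.
  const best = dirs
    .map(d => ({ path: d.path, score: cosine(query, d.embedding) }))
    .sort((a, b) => b.score - a.score)[0];
  // Step 2: rank only the chunks inside that directory.
  return chunks
    .filter(c => c.dir === best.path)
    .map(c => ({ id: c.id, dir: c.dir, score: cosine(query, c.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```

Note the trade-off this makes visible: a chunk outside the chosen directory is never scored, even if it matches the query perfectly, which is why steps 3 and 4 exist.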

What changes

  • memory-crystal-private/src/search-pipeline.ts ... add hierarchical pre-filtering step
  • memory-crystal-private/src/core.ts ... add index on agent_id + source_type + created_at for fast directory listing
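
For illustration, the pre-filter and the index could pair up as sketched below. The table name `chunks`, the index name, the `?` placeholder style, and the `Scope` type are all assumptions; only the three column names come from this issue.

```typescript
// Hypothetical DDL: table and index names are placeholders.
const CREATE_SCOPE_INDEX =
  "CREATE INDEX IF NOT EXISTS idx_chunks_scope ON chunks (agent_id, source_type, created_at)";

interface Scope {
  agentId?: string;
  sourceType?: string;
  from?: string; // inclusive lower bound on created_at
  to?: string;   // exclusive upper bound on created_at
}

// Turn a virtual directory into a parameterized WHERE clause.
function buildScopeFilter(scope: Scope): { where: string; params: string[] } {
  const clauses: string[] = [];
  const params: string[] = [];
  if (scope.agentId !== undefined) { clauses.push("agent_id = ?"); params.push(scope.agentId); }
  if (scope.sourceType !== undefined) { clauses.push("source_type = ?"); params.push(scope.sourceType); }
  if (scope.from !== undefined) { clauses.push("created_at >= ?"); params.push(scope.from); }
  if (scope.to !== undefined) { clauses.push("created_at < ?"); params.push(scope.to); }
  // An empty scope means "the whole crystal", so match everything.
  return { where: clauses.length > 0 ? clauses.join(" AND ") : "1 = 1", params };
}
```

The column order mirrors the hierarchy (agent, then source type, then date), so equality filters on the leading columns followed by a range on created_at can use the composite index.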

Caveats (read before implementing)

  • Our deep search pipeline already works well. Query expansion + RRF fusion + LLM re-ranking + recency weighting already produces good results. Adding hierarchy on top adds complexity with unclear benefit. Don't fix what isn't broken.
  • The "older chunks drowning out recent ones" problem has a simpler fix. Tuning recency weighting parameters (half-life, decay curve) is cheaper and more direct than adding a directory abstraction.
  • Crystal is fundamentally flat. Chunks have metadata (agent_id, source_type, created_at) but no parent-child relationships. Bolting on a hierarchy means either (a) materializing virtual directories as new DB rows, or (b) computing the hierarchy at query time. Both add complexity.
  • Skip this unless search quality degrades. At 208K chunks, search is fine. At 1M+ chunks, hierarchy might help. But we're not there yet. File this as future-proofing and revisit when chunk count is a real problem.
  • OpenViking's hierarchy is native (filesystem paradigm). Ours would be synthetic. Their viking:// URIs are the primary data model. Our chunks are flat with metadata tags. The hierarchy would be a view layer, not a storage layer. That's inherently less powerful.
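
To make the simpler alternative concrete: if recency weighting is an exponential decay with a half-life (an assumption, since this issue does not say how the existing weighting is implemented), tuning it comes down to one parameter. Function names and the 30-day default below are hypothetical.

```typescript
const MS_PER_DAY = 86_400_000;

// Weight halves every `halfLifeDays`; chunks dated in the future are treated as age 0.
function recencyWeight(createdAt: Date, now: Date, halfLifeDays: number): number {
  const ageDays = Math.max(0, (now.getTime() - createdAt.getTime()) / MS_PER_DAY);
  return Math.pow(0.5, ageDays / halfLifeDays);
}

// Scale a relevance score by recency. The default half-life is a placeholder, not a tuned value.
function applyRecency(relevance: number, createdAt: Date, now: Date, halfLifeDays = 30): number {
  return relevance * recencyWeight(createdAt, now, halfLifeDays);
}
```

A shorter half-life pushes older chunks down faster, which targets the "older chunks drowning out recent ones" problem without touching the data model.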

Inspiration

OpenViking's directory recursive retrieval (Apache 2.0, Volcengine): vector search finds candidate directories, then a recursive drill-down propagates scores from parent to child (0.5 * embedding + 0.5 * parent_score) and stops once convergence is detected.
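
The score blend is simple enough to sketch. The 0.5 / 0.5 weights come from the description above; the tree shape, the precomputed `similarity` field, and the convergence rule (top results unchanged between rounds) are assumptions for illustration, not OpenViking's actual implementation.

```typescript
interface DirNode { path: string; similarity: number; children: DirNode[] }
interface Scored { path: string; score: number }

// Blend a node's own embedding similarity with its parent's propagated score.
function propagate(embeddingScore: number, parentScore: number, alpha = 0.5): number {
  return alpha * embeddingScore + (1 - alpha) * parentScore;
}

// Walk the tree depth-first, carrying each parent's blended score down to its children.
function drillDown(node: DirNode, parentScore: number, out: Scored[] = []): Scored[] {
  const score = propagate(node.similarity, parentScore);
  out.push({ path: node.path, score });
  for (const child of node.children) drillDown(child, score, out);
  return out;
}

// Assumed convergence rule: the ordered top ids did not change between two rounds.
function hasConverged(prevTop: string[], nextTop: string[]): boolean {
  return prevTop.length === nextTop.length && prevTop.every((id, i) => id === nextTop[i]);
}
```

Calling `drillDown(root, root.similarity)` leaves the root's score equal to its own similarity, so a weak child under a strong parent is lifted and a strong child under a weak parent is dampened.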

Related

  • Deep search pipeline: memory-crystal-private/src/search-pipeline.ts
  • Core search: memory-crystal-private/src/core.ts
