Summary
Store three resolution levels for each chunk. Search returns the smallest useful level first. Load deeper levels on demand. Reduces token consumption by up to 90% on search results.
Proposed layers
| Layer | Size | Purpose |
|---|---|---|
| L0 (Abstract) | ~100 tokens | One-sentence summary. Vector search and quick relevance check. |
| L1 (Overview) | ~1-2K tokens | Key points, enough for planning. "Use this first." |
| L2 (Detail) | Full chunk | Original text. Loaded only when confirmed needed. |
How it would work
Ingestion: When `crystal.ingest()` stores a chunk (L2), also generate and store L0 and L1 via LLM summarization.
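The ingestion step could be sketched like this. `summarizeWithLLM` is a hypothetical stand-in for whatever summarization helper ends up in llm.ts; the two calls per chunk are the ingestion cost flagged in the caveats below.

```typescript
interface ChunkLevels {
  l0_text: string; // ~100-token abstract
  l1_text: string; // ~1-2K-token overview
  l2_text: string; // original chunk, stored unchanged
}

// Build all three levels for one chunk at ingest time.
// summarizeWithLLM is injected so the sketch stays self-contained.
async function buildLevels(
  chunk: string,
  summarizeWithLLM: (text: string, maxTokens: number) => Promise<string>
): Promise<ChunkLevels> {
  // Two LLM calls per chunk; run them concurrently.
  const [l0_text, l1_text] = await Promise.all([
    summarizeWithLLM(chunk, 100),
    summarizeWithLLM(chunk, 1500),
  ]);
  return { l0_text, l1_text, l2_text: chunk };
}
```

The helper name, signature, and token budgets are assumptions for illustration, not the real llm.ts API.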
Search: `crystal_search` returns L0 by default. New flag: `--depth L1` or `--depth L2` for deeper results.
Agent workflow: The agent sees 5 L0 abstracts (~500 tokens total), picks the 2 relevant ones, loads L1 for those (~4K tokens), and digs into L2 for the one that matters (one full chunk, ~5K tokens). Total: ~9.5K tokens instead of ~25K.
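The token math can be sanity-checked with the estimates above, assuming a ~5K-token full chunk (so five flat results cost ~25K). These are back-of-envelope numbers, not measurements:

```typescript
// Rough per-level token estimates from the proposal.
const L0_TOKENS = 100;
const L1_TOKENS = 2000;
const L2_TOKENS = 5000; // assumed average full-chunk size

// Cost of the tiered workflow: all results at L0, a few promoted
// to L1, and usually one loaded in full at L2.
function tieredCost(results: number, l1Loads: number, l2Loads: number): number {
  return results * L0_TOKENS + l1Loads * L1_TOKENS + l2Loads * L2_TOKENS;
}

const tiered = tieredCost(5, 2, 1); // 500 + 4000 + 5000 = 9500
const flat = 5 * L2_TOKENS;         // 25000: five full chunks today
```

Under these assumptions the tiered path costs roughly 9.5K tokens against 25K for returning full chunks, a ~62% saving in this scenario; the savings approach 90% when L0 alone answers the question.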
What changes
- `memory-crystal-private/src/core.ts` ... add `l0_text`, `l1_text` columns; generate on ingest.
- `memory-crystal-private/src/mcp-server.ts` ... add a `depth` parameter to the `crystal_search` tool.
- `memory-crystal-private/src/search-pipeline.ts` ... return the appropriate depth level.
- `memory-crystal-private/src/llm.ts` ... add summarization prompts for L0/L1 generation.
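A minimal sketch of how the search pipeline could resolve which text to return for a requested depth. The column names match the proposed schema; the row shape and fallback behavior are assumptions, covering old rows where L0/L1 were never backfilled:

```typescript
type Depth = "L0" | "L1" | "L2";

// Illustrative row shape: summary levels are nullable because
// pre-migration chunks may only have the full text.
interface ChunkRow {
  l0_text: string | null;
  l1_text: string | null;
  l2_text: string;
}

// Return the best available text at or below the requested depth,
// degrading gracefully when a summary level is missing.
function textAtDepth(row: ChunkRow, depth: Depth = "L0"): string {
  if (depth === "L0" && row.l0_text) return row.l0_text;
  if (depth !== "L2" && row.l1_text) return row.l1_text;
  return row.l2_text;
}
```

Falling back downward (missing L0 returns L1 or L2) trades tokens for recall, which matches the caveat below that a skipped-over bad summary is worse than no summary.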
Caveats (read before implementing)
- Massive LLM cost on ingestion. Every chunk now needs 2 LLM calls (generate L0 + L1) at ingest time. With 208K existing chunks, backfilling is expensive. For new chunks, it adds ~2 seconds per chunk. Are the token savings on search worth the ingestion cost?
- Defer until the relay is real. The token savings matter most on iOS/cloud where context is tight. On desktop CLI, returning full chunks is fine. This is a relay optimization, not a local one. Don't build it until #163 (relay) is in progress.
- L0/L1 quality depends on LLM quality. A bad summary is worse than no summary. If L0 says "discussion about architecture" and the agent skips it, we lost recall. Test L0 quality extensively before relying on it for filtering.
- Simpler alternative: just truncate. Instead of LLM-generated summaries, return the first N tokens of each chunk as the "preview." No LLM cost. Worse quality, but maybe good enough. Test this first.
- Schema migration on 208K rows is non-trivial. Adding columns is fine. Backfilling L0/L1 for existing chunks is a multi-hour LLM job. Consider: only generate L0/L1 for new chunks. Old chunks return L2 always.
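The truncation alternative from the caveats is only a few lines. The whitespace split here is a crude token proxy; a real version would reuse whatever tokenizer the pipeline already has:

```typescript
// Zero-LLM-cost preview: the first ~maxTokens "tokens" of the chunk,
// approximated by whitespace-separated words.
function truncatePreview(chunk: string, maxTokens = 100): string {
  const words = chunk.split(/\s+/);
  if (words.length <= maxTokens) return chunk;
  return words.slice(0, maxTokens).join(" ") + " …";
}
```

This is the thing to A/B against LLM-generated L0s: if agents pick the right chunks from raw prefixes nearly as often, the summarization pipeline may not be worth building.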
Inspiration
OpenViking's tiered context loading: L0 abstract (~100 tokens), L1 overview (~2K tokens), L2 full content. Bottom-up semantic generation. (Apache 2.0, Volcengine).
Related
- #163 (Crystal relay ... this optimization matters most for relay/iOS)
- Search pipeline: `memory-crystal-private/src/search-pipeline.ts`
- Core: `memory-crystal-private/src/core.ts`