Tiered content loading (L0/L1/L2) for search results #61

@lesaai

Summary

Store three resolution levels for each chunk. Search returns the smallest useful level first. Load deeper levels on demand. Reduces token consumption by up to 90% on search results.

Proposed layers

Layer          Size          Purpose
L0 (Abstract)  ~100 tokens   One-sentence summary. Vector search and quick relevance check.
L1 (Overview)  ~1-2K tokens  Key points, enough for planning. "Use this first."
L2 (Detail)    Full chunk    Original text. Loaded only when confirmed needed.

How it would work

Ingestion: When crystal.ingest() stores a chunk (L2), also generate and store L0 and L1 via LLM summarization.
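
A minimal sketch of what tier generation could look like at ingest time; buildTiers and the summarize callback are hypothetical names, not the existing crystal.ingest() internals. Deriving L0 from L1 (bottom-up, as in the OpenViking approach below) keeps the second call cheap:

```ts
interface ChunkTiers {
  l0Text: string; // ~100-token abstract
  l1Text: string; // ~1-2K-token overview
  l2Text: string; // original chunk, stored exactly as today
}

// Two extra LLM calls per chunk: the ingestion cost flagged in the caveats.
async function buildTiers(
  chunk: string,
  summarize: (text: string, maxTokens: number) => Promise<string>,
): Promise<ChunkTiers> {
  const l1Text = await summarize(chunk, 1500); // overview from the full chunk
  const l0Text = await summarize(l1Text, 100); // abstract from the overview, not from L2
  return { l0Text, l1Text, l2Text: chunk };
}
```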

Search: crystal_search returns L0 by default. New flag: --depth L1 or --depth L2 for deeper results.
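
One possible shape for the new parameter; the actual crystal_search tool definition in mcp-server.ts may differ, and the schema and textForDepth helper here are assumptions:

```ts
type Depth = "L0" | "L1" | "L2";

// Hypothetical input schema for the crystal_search MCP tool.
const crystalSearchInput = {
  type: "object",
  properties: {
    query: { type: "string" },
    depth: {
      type: "string",
      enum: ["L0", "L1", "L2"],
      default: "L0", // smallest useful level first
      description: "Resolution level to return for each result",
    },
  },
  required: ["query"],
} as const;

// In the search pipeline: pick the stored text matching the requested depth.
function textForDepth(
  row: { l0_text: string; l1_text: string; text: string },
  depth: Depth,
): string {
  return depth === "L0" ? row.l0_text : depth === "L1" ? row.l1_text : row.text;
}
```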

Agent workflow: Agent sees 5 L0 abstracts (~500 tokens total). Picks the 2 relevant ones. Loads L1 for those (~4K tokens). Digs into L2 for the one that matters (full chunk). Total: ~5K tokens instead of the ~25K for five full chunks.
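
The same flow as a sketch, with search/load/relevance passed in as stand-ins for whatever client the agent actually has (all signatures here are assumptions):

```ts
interface Hit {
  id: string;
  l0: string; // the ~100-token abstract
}

async function drillDown(
  query: string,
  search: (q: string, depth: "L0", limit: number) => Promise<Hit[]>,
  load: (id: string, depth: "L1" | "L2") => Promise<string>,
  isRelevant: (abstract: string) => boolean,
  pickBest: (overviews: string[]) => number,
): Promise<string> {
  const hits = await search(query, "L0", 5);                       // 5 x ~100 tokens
  const picked = hits.filter((h) => isRelevant(h.l0)).slice(0, 2); // keep the 2 relevant ones
  const overviews = await Promise.all(
    picked.map((h) => load(h.id, "L1")),                           // 2 x ~2K tokens
  );
  return load(picked[pickBest(overviews)].id, "L2");               // one full chunk
}
```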

What changes

  • memory-crystal-private/src/core.ts ... add l0_text and l1_text columns; generate them on ingest (see the migration sketch after this list)
  • memory-crystal-private/src/mcp-server.ts ... add a depth parameter to the crystal_search tool
  • memory-crystal-private/src/search-pipeline.ts ... return the requested depth level
  • memory-crystal-private/src/llm.ts ... add summarization prompts for L0/L1 generation
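
For the core.ts change, a sketch assuming a SQLite-backed chunks table with a text column (the actual schema may differ). Nullable columns leave existing rows valid, and a COALESCE fallback gives the "old chunks return L2" behavior from the caveats:

```ts
// Hypothetical migration; table and column names are guesses.
const MIGRATION = `
  ALTER TABLE chunks ADD COLUMN l0_text TEXT;
  ALTER TABLE chunks ADD COLUMN l1_text TEXT;
`;

// Depth selection with fallback: rows that were never backfilled
// (NULL l0_text/l1_text) fall through to the original text (L2).
const SELECT_AT_DEPTH = `
  SELECT COALESCE(
    CASE :depth WHEN 'L0' THEN l0_text WHEN 'L1' THEN l1_text END,
    text
  ) AS result
  FROM chunks
  WHERE id = :id
`;
```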

Caveats (read before implementing)

  • Massive LLM cost on ingestion. Every chunk now needs 2 LLM calls (generate L0 + L1) at ingest time. With 208K existing chunks, backfilling is expensive. For new chunks, it adds ~2 seconds per chunk. Are the token savings on search worth the ingestion cost?
  • Defer until the relay is real. The token savings matter most on iOS/cloud where context is tight. On desktop CLI, returning full chunks is fine. This is a relay optimization, not a local one. Don't build it until #163 (relay) is in progress.
  • L0/L1 quality depends on LLM quality. A bad summary is worse than no summary. If L0 says "discussion about architecture" and the agent skips it, we lost recall. Test L0 quality extensively before relying on it for filtering.
  • Simpler alternative: just truncate. Instead of LLM-generated summaries, return the first N tokens of each chunk as the "preview." No LLM cost. Worse quality, but maybe good enough. Test this first (see the sketch after this list).
  • Schema migration on 208K rows is non-trivial. Adding columns is fine. Backfilling L0/L1 for existing chunks is a multi-hour LLM job. Consider: only generate L0/L1 for new chunks; old chunks always return L2.
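
For the truncation alternative, the preview could be as small as this (the 4-characters-per-token ratio is a crude heuristic, not a real tokenizer):

```ts
// Return roughly the first maxTokens tokens of a chunk as its preview.
function truncatedPreview(chunk: string, maxTokens = 100): string {
  const approxChars = maxTokens * 4; // rough chars-per-token estimate
  if (chunk.length <= approxChars) return chunk;
  const slice = chunk.slice(0, approxChars);
  const lastSpace = slice.lastIndexOf(" ");
  // Cut at a word boundary so the preview doesn't end mid-word.
  return (lastSpace > 0 ? slice.slice(0, lastSpace) : slice) + " …";
}
```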

Inspiration

OpenViking's tiered context loading: L0 abstract (~100 tokens), L1 overview (~2K tokens), L2 full content. Bottom-up semantic generation. (Apache 2.0, Volcengine).

Related

  • #163 (Crystal relay ... this optimization matters most for relay/iOS)
  • Search pipeline: memory-crystal-private/src/search-pipeline.ts
  • Core: memory-crystal-private/src/core.ts
