
Dream Weaver: dedup memories before storing #60

@lesaai


Summary

Add a deduplication step to Dream Weaver's onMemoryExtracted hook. Before storing a new memory, search Crystal for similar existing memories, then let an LLM decide: skip (duplicate), merge (update the existing memory), or create (new memory).

Current problem

Crystal holds 208K+ chunks with significant duplication. Every session that discusses the same architecture creates new chunks, so search results contain 3-5 versions of the same fact at different timestamps. The signal-to-noise ratio degrades as the chunk count grows.

Proposed flow

Dream Weaver extracts memory
  -> crystal_search for similar (top 3, threshold 0.85+)
  -> if matches found:
       LLM decides: skip | merge | create
       skip: discard (exact duplicate)
       merge: update existing chunk text + bump timestamp
       create: store as new (related but distinct)
  -> if no matches:
       store as new
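A minimal TypeScript sketch of the flow above. `Chunk`, `Decision`, `Judge`, and `searchSimilar` are hypothetical stand-ins: the issue does not define Crystal's chunk schema or the signature of `crystal_search`. It runs against an in-memory array with plain cosine similarity and is synchronous for brevity; the real hook would await the search and LLM calls. Merge follows the flow literally (rewrite text, bump timestamp), which is the destructive behavior the caveats below question.

```typescript
// Hypothetical types: the issue does not specify Crystal's chunk schema.
type Verdict = "skip" | "merge" | "create";

interface Chunk {
  id: number;
  text: string;
  embedding: number[];
  updatedAt: number;
}

interface Hit {
  chunk: Chunk;
  score: number;
}

interface Decision {
  verdict: Verdict;
  targetId?: number;
  mergedText?: string;
}

// Hypothetical stand-in for the LLM comparison step.
type Judge = (candidate: string, existing: Chunk[]) => Decision;

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// In-memory stand-in for crystal_search: top-k chunks scoring at or above the threshold.
function searchSimilar(store: Chunk[], embedding: number[], topK = 3, threshold = 0.85): Hit[] {
  return store
    .map((chunk) => ({ chunk, score: cosine(chunk.embedding, embedding) }))
    .filter((hit) => hit.score >= threshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// Dedup-then-store flow; returns the action actually taken.
function onMemoryExtracted(
  store: Chunk[],
  text: string,
  embedding: number[],
  judge: Judge,
  now: number,
): Verdict {
  const hits = searchSimilar(store, embedding);
  // Only consult the LLM when the vector pre-filter found candidates.
  const decision: Decision =
    hits.length > 0 ? judge(text, hits.map((hit) => hit.chunk)) : { verdict: "create" };

  if (decision.verdict === "skip") return "skip";
  if (decision.verdict === "merge") {
    const target = store.find((c) => c.id === decision.targetId);
    if (target !== undefined) {
      // Destructive in-place update, as in the proposed flow.
      target.text = decision.mergedText ?? target.text;
      target.updatedAt = now;
      return "merge";
    }
    // Unknown target: fall back to storing as new rather than dropping the memory.
  }
  // Naive id assignment; assumes chunks are never removed from this array.
  store.push({ id: store.length + 1, text, embedding, updatedAt: now });
  return "create";
}
```

Injecting `judge` keeps the LLM call swappable, so a skip-only judge (as the caveats suggest starting with) is just a different function.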

What changes

  • memory-crystal-private/src/dream-weaver.ts ... add dedup logic to onMemoryExtracted hook
  • memory-crystal-private/src/core.ts ... may need an update/merge method on Crystal class
  • memory-crystal-private/src/llm.ts ... add dedup prompt (compare candidate vs existing)
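For src/llm.ts, a sketch of what the dedup prompt and a defensive reply parser might look like. `buildDedupPrompt`, `parseDedupDecision`, and the JSON reply shape are assumptions for illustration, not existing functions. Any malformed or out-of-range reply falls back to `create`, so a bad LLM response never discards a memory.

```typescript
type Verdict = "skip" | "merge" | "create";

interface DedupDecision {
  verdict: Verdict;
  target?: number; // index into the `existing` list shown to the model
  merged?: string; // combined text when verdict is "merge"
}

// Hypothetical prompt builder; the wording is illustrative only.
function buildDedupPrompt(candidate: string, existing: string[]): string {
  const numbered = existing.map((text, i) => `[${i}] ${text}`).join("\n");
  return [
    "You compare a new memory against existing memories.",
    'Reply with JSON only: {"verdict": "skip" | "merge" | "create", "target": <index>, "merged": "<combined text>"}',
    "skip: the new memory adds nothing beyond an existing one.",
    "merge: the new memory refines exactly one existing memory; give the combined text.",
    "create: the new memory is related but distinct.",
    "",
    "Existing memories:",
    numbered,
    "",
    `New memory: ${candidate}`,
  ].join("\n");
}

// Parse the model's reply; anything malformed falls back to "create" so nothing is lost.
function parseDedupDecision(reply: string, existingCount: number): DedupDecision {
  let parsed: unknown;
  try {
    parsed = JSON.parse(reply);
  } catch {
    return { verdict: "create" };
  }
  if (typeof parsed !== "object" || parsed === null) return { verdict: "create" };

  const obj = parsed as Record<string, unknown>;
  if (obj.verdict === "skip") return { verdict: "skip" };
  if (obj.verdict === "merge") {
    const target = obj.target;
    const merged = obj.merged;
    // Merge is only accepted with a valid index and non-empty combined text.
    if (
      typeof target === "number" &&
      Number.isInteger(target) &&
      target >= 0 &&
      target < existingCount &&
      typeof merged === "string" &&
      merged.length > 0
    ) {
      return { verdict: "merge", target, merged };
    }
  }
  return { verdict: "create" };
}
```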

Caveats (read before implementing)

  • LLM calls are expensive. Every extracted memory now requires a search + LLM comparison. If Dream Weaver extracts 15 memories per run, that's 15 additional search+LLM round trips. Is the dedup quality worth the latency and cost? Benchmark first.
  • Merge is destructive. Updating an existing chunk changes the historical record. Consider: should merge create a new version and soft-delete the old one? Or append? Deleting history violates the "never delete" principle.
  • Threshold tuning is hard. 0.85 similarity sounds right but will need real-world testing. Too high = duplicates slip through. Too low = distinct memories get merged incorrectly. Start with skip-only (no merge) and add merge later.
  • Maybe batch dedup is better than inline. Instead of checking every memory at extraction time, run a periodic dedup pass over all memories (e.g. crystal dedup --dry-run). Less invasive. Easier to test. Easier to undo.
  • The real fix might be better extraction, not post-hoc dedup. If Dream Weaver's prompts produce fewer, higher-quality memories, dedup becomes unnecessary. Consider improving extraction quality (Dream Weaver: structured memory categories with merge rules #59) first.
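The batch alternative could start as a read-only report. Below is a hedged sketch of what a `crystal dedup --dry-run` pass might compute; the command does not exist yet, and `findDuplicateGroups` and the `Chunk` shape are hypothetical. It greedily groups chunks whose embeddings have cosine similarity at or above a threshold and returns the groups without modifying anything. The loop is quadratic in the worst case, so at 208K+ chunks a real pass would lean on Crystal's vector index instead; the point here is the shape of the output.

```typescript
// Hypothetical chunk shape; the issue does not define Crystal's schema.
interface Chunk {
  id: string;
  embedding: number[];
  createdAt: number;
}

interface DuplicateGroup {
  keep: string; // representative chunk (oldest in the group)
  duplicates: { id: string; score: number }[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Read-only pass: reports likely duplicate groups and never mutates `chunks`.
function findDuplicateGroups(chunks: readonly Chunk[], threshold = 0.85): DuplicateGroup[] {
  // Oldest first, so the earliest chunk becomes each group's representative (an arbitrary choice).
  const ordered = [...chunks].sort((a, b) => a.createdAt - b.createdAt);
  const reps: { chunk: Chunk; group: DuplicateGroup }[] = [];

  for (const chunk of ordered) {
    let best: { chunk: Chunk; group: DuplicateGroup } | undefined = undefined;
    let bestScore = -Infinity;
    // Compare against group representatives only (greedy, single pass).
    for (const rep of reps) {
      const score = cosine(rep.chunk.embedding, chunk.embedding);
      if (score > bestScore) {
        bestScore = score;
        best = rep;
      }
    }
    if (best !== undefined && bestScore >= threshold) {
      best.group.duplicates.push({ id: chunk.id, score: bestScore });
    } else {
      reps.push({ chunk, group: { keep: chunk.id, duplicates: [] } });
    }
  }
  // Singletons are not duplicates; report only groups with at least one match.
  return reps.map((rep) => rep.group).filter((group) => group.duplicates.length > 0);
}
```

Rerunning this at a few thresholds (say 0.80, 0.85, 0.90) and reading the groups at each level would give real data on the "too high vs. too low" tradeoff before any skip or merge logic touches stored memories.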

Inspiration

OpenViking's session commit pipeline: vector pre-filtering finds similar existing memories, LLM deduplication decides skip/create/merge/delete (Apache 2.0, Volcengine).
