turbo-graph

turbovec made embeddings small. turbo-graph makes constrained retrieval operational.

When your RAG query is no longer just top_k, but:

tenant ∩ graph ∩ tag ∩ source ∩ time ∩ BM25 candidates ∩ vector search

do not rebuild that view in Python on every request.

turbo-graph keeps the turbovec/TurboQuant core and adds:

  • graph memory
  • tag/source/time indexed views
  • cached SlotMask compilation
  • graph-aware rerank
  • explain/cache telemetry
  • Python GraphMemoryIndex

When should I use this?

Use turbovec when:

  • you mostly need flat global top-k
  • your allowlist is cheap to build
  • you want the smallest API

Use turbo-graph when:

  • most queries carry tenant/source/tag/time constraints
  • you expand graph neighborhoods before vector search
  • the same filtered views repeat across hot queries
  • you need explain reports and cache telemetry

Contents: How this relates to turbovec · Comparison · Benchmarks · Install · Quick start · Documentation


How this relates to turbovec

This repository is a fork of the turbovec codebase. TurboQuant encoding/search, .tv / .tvim, and the core Python index APIs share the same lineage. The new public surface is the graph-memory layer around that core.

Figure: turbovec vs turbo-graph stack. Orange block = graph layer in this fork; shared core = turbovec TurboQuant lineage.

Full capability matrix

| Capability | turbovec | turbo-graph |
|---|---|---|
| TurboQuant encode / search | Yes | Yes, same core |
| TurboQuantIndex / IdMapIndex | Yes | Yes, compatible API |
| Kernel allowlist / mask | Yes, since v0.3 | Yes, plus reusable SlotMask |
| Graph neighborhood expansion | No | Yes |
| Tag / source / time views | Bring your own SQL | Indexed + cached |
| Graph rerank + BM25 hybrid blend | No | Yes |
| Explain / cache telemetry | Partial | First-class reports |
| Python GraphMemoryIndex | No | Core operating API |
| Framework integrations | Yes | Yes |

turbovec vs turbo-graph

What turbovec already solves

Upstream turbovec is not a naive "vector search, then filter in Python" design:

  • IdMapIndex.search(..., allowlist=ids) applies restrictions inside the SIMD kernel, skipping empty 32-vector blocks before LUT work (#30).
  • TurboQuantIndex.search(..., mask=...) does the same for slot masks.
  • Results come back as (nq, min(k, n_allowed)), so tight filters do not need padding or global over-fetch just to recover recall.
  • Train-free ingest, TQ+ calibration, RaBitQ scoring correction, and strong ARM performance vs FAISS FastScan are inherited here.
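To make the block-skipping idea concrete, here is a minimal pure-Python sketch. It is not turbovec's kernel: `active_blocks` and `masked_top_k` are hypothetical helper names, plain floats stand in for LUT-scored distances, and higher scores are assumed to be better.

```python
BLOCK = 32  # block width taken from the text above; the real kernel layout may differ


def active_blocks(allowed_slots, block=BLOCK):
    """Return the sorted block indices that contain at least one allowed slot."""
    return sorted({slot // block for slot in allowed_slots})


def masked_top_k(scores, allowed_slots, k, block=BLOCK):
    """Score only allowed slots, visiting only blocks that contain one.

    Returns at most min(k, n_allowed) (score, slot) pairs, best first.
    """
    allowed = set(allowed_slots)
    hits = []
    for b in active_blocks(allowed, block):
        start = b * block
        for slot in range(start, min(start + block, len(scores))):
            if slot in allowed:
                hits.append((scores[slot], slot))
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return hits[:k]


scores = [((i * 37) % 101) / 100.0 for i in range(256)]  # synthetic scores, 8 blocks
allowed = [3, 40, 41, 200]

blocks = active_blocks(allowed)
hits = masked_top_k(scores, allowed, k=10)
print(blocks)     # only blocks 0, 1 and 6 are visited; the other five are skipped
print(len(hits))  # 4, i.e. min(k, n_allowed)
```

With a tight filter the result is simply shorter than k, which is why no padding or global over-fetch is needed.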

turbo-graph does not replace kernel filtering. It adds the part that turbovec leaves in application code: graph expansion, metadata indexes, candidate-list intersection, reusable view caches, rerank, and explainability.
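The application-side work described above can be sketched in plain Python. This is an illustration of the idea only, not turbo-graph's implementation: `compile_view`, `ViewCache`, and the toy indexes are hypothetical names, and a real cache would also need invalidation when records or links change.

```python
def compile_view(tag_index, source_index, graph_slots, candidate_slots, tags, sources):
    """Intersect graph, candidate, tag, and source constraints into one slot set."""
    view = set(graph_slots) & set(candidate_slots)
    for tag in tags:  # every required tag must match
        view &= tag_index.get(tag, set())
    by_source = set()
    for source in sources:  # any allowed source may match
        by_source |= source_index.get(source, set())
    return frozenset(view & by_source)


class ViewCache:
    """Memoize compiled views by constraint key, with simple hit/miss telemetry."""

    def __init__(self):
        self._views = {}
        self.hits = 0
        self.misses = 0

    def get(self, key, build):
        if key in self._views:
            self.hits += 1
        else:
            self.misses += 1
            self._views[key] = build()
        return self._views[key]


tag_index = {"architecture": {0, 1, 2, 5}, "cache": {1, 5}}
source_index = {"docs": {0, 1, 2, 3}, "wiki": {4, 5}}
graph_slots, candidate_slots = {0, 1, 2, 5}, {1, 2, 5, 7}
tags, sources = ("architecture",), ("docs",)

cache = ViewCache()
key = (frozenset(tags), frozenset(sources),
       frozenset(graph_slots), frozenset(candidate_slots))


def build():
    return compile_view(tag_index, source_index, graph_slots,
                        candidate_slots, tags, sources)


first = cache.get(key, build)   # compiled on the first request
second = cache.get(key, build)  # reused on the repeat request
print(sorted(first), cache.hits, cache.misses)
```

The compiled slot set is what would then be handed to the kernel as a mask; the cache only pays off when the same constraint key recurs.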

Figure: query path comparison. Orange boxes = assembly work you still do in app code with turbovec. The turbo-graph path compiles the constraint view once and reuses it.

Rule of thumb: turbovec is enough when filters are light. turbo-graph wins when constraints are the product and graph ∩ tag ∩ source ∩ time ∩ candidates is rebuilt across hot queries.

The Python bindings release the GIL around long Rust add/search/prepare/write paths, so threaded Python services can overlap independent vector and graph-memory requests instead of serializing on the interpreter lock.
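The threading pattern this enables might look like the sketch below. To stay self-contained it does not call turbo-graph at all: `fake_search` is a hypothetical stand-in that uses `time.sleep`, which also releases the GIL, in place of a native search call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_search(query_id):
    """Stand-in for a GIL-releasing native call (time.sleep releases the GIL too)."""
    time.sleep(0.05)
    return query_id, [query_id * 10, query_id * 10 + 1]


def run_threaded(query_ids, workers=4):
    """Run independent requests on a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fake_search, query_ids))


results = run_threaded(range(8))
print(len(results), results[3])
```

Because the waiting happens outside the interpreter lock, the eight calls overlap across the four workers instead of running strictly one after another.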

Should you migrate?

Figure: migration decision flow.

Answer yes to three or more:

  1. Most queries carry tenant, source, tag, or time constraints.
  2. You expand graph neighborhoods before vector search.
  3. The same filter predicates repeat in bursts.
  4. You manually merge BM25/SQL scores with vector and graph scores.
  5. You need production explainability: trace, cache hit, selectivity.
  6. allowlist= is fine, but constructing the allowlist is the bottleneck.

Otherwise stay on turbovec for the flat core and use turbo-graph only for hot filtered routes.

from turbovec import IdMapIndex      # upstream
from turbo_graph import IdMapIndex   # this repo, compatible core API

Full matrix and PR checklist: docs/benchmark_turbo_graph_vs_turbo_vec.md.


Benchmarks

Numbers below come from benchmarks/results/*.json. Regenerate charts with python3 benchmarks/create_diagrams.py.

Setup (shared core): 100K database vectors, 1K queries, k=64, seed 42, unit-normalized embeddings.

Recall vs FAISS IndexPQ

Baseline: FAISS IndexPQ with LUT256 and training. Different from the speed baseline below.

Figures: R@1 delta summary; recall curves for d=1536 and d=3072.

GloVe 2-bit is the one cell where FAISS edges ahead (-0.06pp). Both converge by k around 16. Raw data: benchmarks/results/.

Speed vs FAISS IndexPQFastScan

Median of 5 runs. Orange = TurboQuant faster; gray = FAISS faster or parity.

Figures: speed win/loss grid; per-configuration panels for ARM ST, ARM MT, x86 ST, and x86 MT.

ARM wins all 8 configs. On x86, 4-bit is ahead or at parity, while 2-bit trails FAISS in all four cells; x86 2-bit MT is the known gap vs FAISS AVX-512 VBMI.

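All 16 speed numbers (ms/query)

| Dim | Bits | Arch | Threads | TQ (ms/query) | FAISS (ms/query) | Gain |
|---|---|---|---|---|---|---|
| 1536 | 2 | ARM | ST | 1.083 | 1.235 | +12.3% |
| 1536 | 2 | ARM | MT | 0.103 | 0.115 | +10.4% |
| 1536 | 2 | x86 | ST | 1.271 | 1.172 | -8.4% |
| 1536 | 2 | x86 | MT | 0.304 | 0.295 | -3.1% |
| 1536 | 4 | ARM | ST | 1.992 | 2.450 | +18.7% |
| 1536 | 4 | ARM | MT | 0.185 | 0.220 | +15.9% |
| 1536 | 4 | x86 | ST | 2.439 | 2.560 | +4.7% |
| 1536 | 4 | x86 | MT | 0.576 | 0.590 | +2.4% |
| 3072 | 2 | ARM | ST | 2.124 | 2.439 | +12.9% |
| 3072 | 2 | ARM | MT | 0.201 | 0.224 | +10.3% |
| 3072 | 2 | x86 | ST | 2.657 | 2.582 | -2.9% |
| 3072 | 2 | x86 | MT | 0.626 | 0.590 | -6.1% |
| 3072 | 4 | ARM | ST | 3.968 | 4.925 | +19.4% |
| 3072 | 4 | ARM | MT | 0.375 | 0.448 | +16.3% |
| 3072 | 4 | x86 | ST | 5.342 | 5.474 | +2.4% |
| 3072 | 4 | x86 | MT | 1.177 | 1.177 | 0.0% |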
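The Gain column appears to be the relative latency reduction against FAISS, (FAISS - TQ) / FAISS. That formula is inferred from the numbers rather than stated in the source; the sketch below recomputes three rows under that assumption, with `gain_pct` as a hypothetical helper name.

```python
def gain_pct(tq_ms, faiss_ms):
    """Relative latency reduction vs FAISS, in percent (positive = TQ faster)."""
    return round((faiss_ms - tq_ms) / faiss_ms * 100.0, 1)


rows = [
    ("1536 / 2-bit / ARM / ST", 1.083, 1.235),
    ("1536 / 2-bit / x86 / ST", 1.271, 1.172),
    ("3072 / 4-bit / x86 / MT", 1.177, 1.177),
]

gains = [gain_pct(tq, faiss) for _, tq, faiss in rows]
for (label, _, _), gain in zip(rows, gains):
    print(f"{label}: {gain:+.1f}%")
```

Under this reading a negative gain means the TurboQuant path took longer per query than FAISS for that configuration.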

Compression (100K vectors)

Figure: compression vs FP32.

10M x 1536d at 2-bit is about 4 GB of index RAM, vs about 61 GB for float32 vectors (10M x 1536 x 4 bytes).
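A quick back-of-envelope check of the raw payload arithmetic. It counts only the packed codes at a fixed number of bits per dimension and ignores per-vector norms, ids, and metadata, so real index RAM will be somewhat higher; `payload_bytes` is a hypothetical helper name.

```python
def payload_bytes(n_vectors, dim, bits_per_dim):
    """Bytes needed to store n_vectors x dim values at bits_per_dim each."""
    return n_vectors * dim * bits_per_dim // 8


n_vectors, dim = 10_000_000, 1536
quantized = payload_bytes(n_vectors, dim, 2)   # 2-bit codes
float32 = payload_bytes(n_vectors, dim, 32)    # 4 bytes per dimension

print(f"2-bit codes: {quantized / 1e9:.2f} GB")  # 3.84 GB
print(f"float32:     {float32 / 1e9:.2f} GB")    # 61.44 GB
print(f"ratio:       {float32 / quantized:.0f}x")  # 16x
```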

Graph layer

Figure: selectivity latency.

Low selectivity is already fast with kernel SlotMask. turbo-graph's target win is compiling graph ∩ metadata ∩ candidates views once and reusing them, instead of recompiling them on every query.

graph_view_bench now separates warm steady-state search from one-shot view compilation. On the synthetic 16,384 x 64 harness with --iters 3, the balanced constrained view selected 24 slots across 8 active SIMD blocks: cached mask search was about 0.020 ms/query, rebuilding the graph+metadata view was about 2.4x that cost, and global post-filtering needed fetch_k=8192 to recover full recall.
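Why global post-filtering needs such a large fetch_k can be seen in a small deterministic sketch. It is a worst-case toy, not the benchmark harness: the scores are constructed so that all 24 allowed slots rank low globally, and `top_k` and `post_filter` are hypothetical helper names.

```python
def top_k(scores, k, allowed=None):
    """Return the k best slots (highest score first), optionally within a mask."""
    slots = range(len(scores)) if allowed is None else sorted(allowed)
    return sorted(slots, key=lambda s: (-scores[s], s))[:k]


def post_filter(scores, allowed, k, fetch_k):
    """Fetch the global top fetch_k first, then drop slots outside the view."""
    return [s for s in top_k(scores, fetch_k) if s in allowed][:k]


n = 1000
scores = [float(n - i) for i in range(n)]  # slot 0 scores highest, slot 999 lowest
allowed = set(range(900, 924))             # 24 selected slots, all ranked low globally

masked = top_k(scores, 10, allowed)               # mask applied during the scan
shallow = post_filter(scores, allowed, 10, 64)    # over-fetch too small
deep = post_filter(scores, allowed, 10, 910)      # over-fetch large enough

print(len(masked), len(shallow), len(deep))
```

The masked scan returns all ten hits directly, while the post-filter path returns nothing until fetch_k reaches far enough down the global ranking to include the allowed slots.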

Shared limits: brute-force O(n) scan, not HNSW/IVF; 2-4 bit approximation; TQ+ needs at least 1000 vectors on the first add; pin versions for production services.


Install

pip install turbo-graph
cargo add turbo-graph

For local development:

cd turbo-graph-python
python3 -m maturin develop --release

Requirements: Rust 1.70+, dim % 8 == 0, bit_width in {2, 3, 4}. x86_64 targets AVX2 (x86-64-v3).


Quick start

Python - turbovec-compatible core

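import numpy as np
from turbo_graph import IdMapIndex

# Assumed to be supplied by the caller: `vectors` (n x 1536 embeddings),
# `ids` (n integer ids), and `query` (query embedding(s)).
idx = IdMapIndex(dim=1536, bit_width=4)
idx.add_with_ids(vectors.astype(np.float32), ids.astype(np.uint64))

# Restrict the search to these ids; the restriction is applied inside the kernel.
allowed = np.array([1003, 1010, 1042], dtype=np.uint64)
scores, hit_ids = idx.search(query.astype(np.float32), k=10, allowlist=allowed)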

Python - graph memory for constrained RAG

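import numpy as np
from turbo_graph import GraphMemoryIndex

# Assumed to be supplied by the caller: `embeddings` (one row per record below),
# `query`, and `queries` (a batch of query embeddings).
memory = GraphMemoryIndex(dim=1536, bit_width=4)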
memory.add_records(
    embeddings.astype(np.float32),
    [
        {
            "id": 1001,
            "title": "Architecture note",
            "tags": ["architecture"],
            "source": "docs",
            "timestamp_ms": 1_700_000_000_000,
        },
        {
            "id": 1002,
            "title": "Retrieval cache note",
            "tags": ["architecture", "cache"],
            "source": "docs",
            "timestamp_ms": 1_700_000_010_000,
        },
    ],
)
memory.link_bidirectional(1001, 1002, 0.8)
memory.prepare()

hits = memory.search(
    query.astype(np.float32),
    k=10,
    seeds=[1001],
    required_tags=["architecture"],
    allowed_sources=["docs"],
    candidate_ids=[1001, 1002],  # optional BM25/SQL/ACL candidates
)

batch_hits = memory.search_batch(
    queries.astype(np.float32),
    k=10,
    seeds=[1001],
    required_tags=["architecture"],
    allowed_sources=["docs"],
    candidate_ids=[1001, 1002],
)

report = memory.explain(
    query.astype(np.float32),
    k=10,
    seeds=[1001],
    candidate_ids=[1001, 1002, 999],
)
print(report["plan"], report["telemetry"])

Runnable version: turbo-graph-python/examples/graph_memory_rag.py.
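Conceptually, seeds plus links define a neighborhood that is expanded before the vector scan. The sketch below shows one way such an expansion could work, as a bounded breadth-first walk over weighted, bidirectional links. It is an assumption for illustration, not turbo-graph's algorithm: `expand`, `max_hops`, and `min_weight` are hypothetical names, and the actual presets may weight or rank neighbors differently.

```python
from collections import deque


def expand(edges, seeds, max_hops=1, min_weight=0.0):
    """Return all ids reachable from seeds within max_hops over strong-enough links."""
    adjacency = {}
    for a, b, weight in edges:
        if weight >= min_weight:  # ignore links below the weight floor
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)

    seen = set(seeds)
    frontier = deque((seed, 0) for seed in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not walk past the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen


edges = [(1001, 1002, 0.8), (1002, 1003, 0.5), (1003, 1004, 0.9)]

one_hop = expand(edges, [1001], max_hops=1)
two_hops = expand(edges, [1001], max_hops=2)
strong_only = expand(edges, [1001], max_hops=3, min_weight=0.6)
print(sorted(one_hop), sorted(two_hops), sorted(strong_only))
```

The resulting id set is one of the inputs that gets intersected with tag, source, time, and candidate constraints to form the view that is searched.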

Rust - graph layer on the shared core

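// Assumes this runs inside a function returning Result, with `vectors`,
// `flat_vectors`, and `query` supplied by the caller.
use turbo_graph::{GraphMemoryIndex, GraphSearchPreset, MemoryRecord, TurboQuantIndex};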

let mut index = TurboQuantIndex::new(1536, 4)?;
index.add(&vectors);
index.prepare();

let mut memory = GraphMemoryIndex::new(1536, 4)?;
memory.add_records(
    &flat_vectors,
    vec![MemoryRecord::new(1001, "Architecture note", ["architecture"])
        .with_source("docs.example")
        .with_timestamp_ms(1_700_000_000_000)],
)?;

let report = memory.explain_graph_search_with_preset(
    &query,
    10,
    &[1001],
    GraphSearchPreset::balanced(),
    &["architecture"],
    &["docs.example"],
    Some(1_700_000_000_000),
    None,
);
println!(
    "hits={} cache_hit={}",
    report.hits.len(),
    report.plan.combined_cache_hit
);

Run benchmarks

# Shared turbovec-style ANN (needs ~/data/py-turboquant/)
python3 benchmarks/download_data.py all
python3 benchmarks/suite/recall_d1536_2bit.py
python3 benchmarks/suite/speed_d1536_2bit_arm_mt.py

# turbo-graph graph/view layer
cargo run -p turbo-graph --release --example graph_view_bench -- --iters 3 --csv /tmp/graph-view-bench.csv
cargo run -p turbo-graph --release --example graph_view_bench_summary -- /tmp/graph-view-bench.csv

The graph benchmark prints warmup-aware selectivity, mask-build, cold/warm view-compile, constrained-retrieval, and post-filter overfetch rows.

For release checks:

scripts/release_check.sh --quick
scripts/release_check.sh --full

The release script builds a fresh Python wheel, installs it into a temporary venv, runs Rust/Python gates, and does not publish, tag, or mutate git history.


Documentation

docs/
├── README.md ............... index
├── api.md .................. TurboQuantIndex · IdMapIndex · GraphMemoryIndex
├── graph_memory_layer.md ... views · presets · caches
├── benchmark_turbo_graph_vs_turbo_vec.md
└── integrations/ ........... LangChain · LlamaIndex · Haystack · Agno

Docs index · API · Graph layer · vs turbovec


Open Source

  • Contributing guide — issue/PR workflow, test gates, and benchmark expectations.
  • Changelog — public 0.1.0 release notes plus pre-0.1 development history.
  • Security policy — supported versions and vulnerability reporting.

Security

See SECURITY.md for supported versions and vulnerability reporting.

License

MIT - see LICENSE. Core algorithms follow the turbovec lineage; the graph layer is additional work in this fork.
