turbo-graph

turbovec made embeddings small. turbo-graph makes constrained retrieval operational.

When your RAG query is no longer just top_k, but:

tenant ∩ graph ∩ tag ∩ source ∩ time ∩ BM25 candidates ∩ vector search

do not rebuild that view in Python on every request.

turbo-graph keeps the turbovec/TurboQuant core and adds:

graph memory
tag/source/time indexed views
cached SlotMask compilation
graph-aware rerank
explain/cache telemetry
Python GraphMemoryIndex

When should I use this?

Use turbovec when:

you mostly need flat global top-k
your allowlist is cheap to build
you want the smallest API

Use turbo-graph when:

most queries carry tenant/source/tag/time constraints
you expand graph neighborhoods before vector search
the same filtered views repeat across hot queries
you need explain reports and cache telemetry
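To make "expand graph neighborhoods before vector search" concrete, here is a minimal pure-Python sketch of a bounded-hop expansion from seed ids. The adjacency dict, the `expand_neighborhood` name, and the `max_hops` / `min_weight` parameters are illustrative assumptions, not turbo-graph's API.

```python
from collections import deque


def expand_neighborhood(adjacency, seeds, max_hops=2, min_weight=0.0):
    """Return {id: hop_distance} for ids within `max_hops` edges of `seeds`.

    `adjacency` maps id -> {neighbor_id: weight}; edges with weight below
    `min_weight` are ignored. Hypothetical helper for illustration only.
    """
    seen = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        hops = seen[node]
        if hops == max_hops:
            continue  # do not expand past the hop budget
        for neighbor, weight in adjacency.get(node, {}).items():
            if weight >= min_weight and neighbor not in seen:
                seen[neighbor] = hops + 1
                queue.append(neighbor)
    return seen


# Toy graph: 1001 <-> 1002 (0.8), 1002 <-> 1003 (0.3), 1003 <-> 1004 (0.9)
graph = {
    1001: {1002: 0.8},
    1002: {1001: 0.8, 1003: 0.3},
    1003: {1002: 0.3, 1004: 0.9},
    1004: {1003: 0.9},
}
candidates = expand_neighborhood(graph, seeds=[1001], max_hops=2)
print(sorted(candidates))  # [1001, 1002, 1003]
```

The resulting id set is what would then be intersected with tag, source, and time constraints before any vector scoring happens.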

Contents: How this relates to turbovec · Comparison · Benchmarks · Install · Quick start · Documentation

How this relates to turbovec

This repository is a fork of the turbovec codebase. TurboQuant encoding/search, .tv / .tvim, and the core Python index APIs share the same lineage. The new public surface is the graph-memory layer around that core.

Diagram legend: orange blocks mark the graph layer added in this fork; the shared core is the turbovec TurboQuant lineage.

Full capability matrix

Capability	turbovec	turbo-graph
TurboQuant encode / search	Yes	Yes, same core
`TurboQuantIndex` / `IdMapIndex`	Yes	Yes, compatible API
Kernel `allowlist` / `mask`	Yes, since v0.3	Yes, plus reusable `SlotMask`
Graph neighborhood expansion	No	Yes
Tag / source / time views	Bring your own SQL	Indexed + cached
Graph rerank + BM25 hybrid blend	No	Yes
Explain / cache telemetry	Partial	First-class reports
Python `GraphMemoryIndex`	No	Core operating API
Framework integrations	Yes	Yes

turbovec vs turbo-graph

What turbovec already solves

Upstream turbovec is not a naive "vector search, then filter in Python" design:

IdMapIndex.search(..., allowlist=ids) applies restrictions inside the SIMD kernel, skipping empty 32-vector blocks before LUT work (#30).
TurboQuantIndex.search(..., mask=...) does the same for slot masks.
Results come back with shape (nq, min(k, n_allowed)), so tight filters need neither padding nor a global over-fetch to recover recall.
Train-free ingest, TQ+ calibration, RaBitQ scoring correction, and strong ARM performance vs FAISS FastScan are inherited here.
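The block-skipping idea and the (nq, min(k, n_allowed)) result shape can be illustrated in plain NumPy. This is only a sketch of the control flow on float vectors: the real kernel works on quantized codes with SIMD and lookup tables, and the `masked_topk` name and its return values are assumptions made for this example.

```python
import numpy as np

BLOCK = 32  # block width taken from the description above


def masked_topk(db, queries, mask, k):
    """Brute-force inner-product search restricted to `mask`.

    Blocks of 32 rows with no allowed slot are skipped before any scoring.
    Returns (scores, slots, blocks_scored); scores and slots are shaped
    (nq, min(k, n_allowed)).
    """
    n = db.shape[0]
    scores = np.full((queries.shape[0], n), -np.inf, dtype=np.float32)
    blocks_scored = 0
    for start in range(0, n, BLOCK):
        block_mask = mask[start:start + BLOCK]
        if not block_mask.any():
            continue  # empty block: no scoring work at all
        blocks_scored += 1
        block_scores = queries @ db[start:start + BLOCK].T
        block_scores[:, ~block_mask] = -np.inf  # hide disallowed slots
        scores[:, start:start + BLOCK] = block_scores
    k_eff = min(k, int(mask.sum()))
    order = np.argsort(-scores, axis=1)[:, :k_eff]
    return np.take_along_axis(scores, order, axis=1), order, blocks_scored


rng = np.random.default_rng(42)
db = rng.standard_normal((256, 16)).astype(np.float32)
queries = rng.standard_normal((3, 16)).astype(np.float32)
mask = np.zeros(256, dtype=bool)
mask[[3, 10, 42, 200]] = True  # 4 allowed slots spread over 3 of the 8 blocks

scores, slots, blocks_scored = masked_topk(db, queries, mask, k=10)
print(scores.shape, blocks_scored)  # (3, 4) 3
```

Asking for k=10 with only four allowed slots yields four columns per query, with no padding entries to strip afterwards.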

turbo-graph does not replace kernel filtering. It adds the part that turbovec leaves in application code: graph expansion, metadata indexes, candidate-list intersection, reusable view caches, rerank, and explainability.

Diagram legend: orange boxes mark the assembly work you still do in application code with turbovec. The turbo-graph path compiles the constraint view once and reuses it.

Rule of thumb: turbovec is enough when filters are light. turbo-graph wins when constraints are the product and graph ∩ tag ∩ source ∩ time ∩ candidates is rebuilt across hot queries.
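As a rough picture of "compile the constraint view once and reuse it", the sketch below keys a compiled id set on a hashable description of the constraints. `ViewKey`, `ViewCache`, and their fields are hypothetical names for this illustration; they are not turbo-graph types, and a production cache would also need invalidation when records change.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ViewKey:
    """Hashable description of one constraint view (hypothetical)."""
    required_tags: frozenset = frozenset()
    allowed_sources: frozenset = frozenset()
    min_timestamp_ms: int = 0
    candidate_ids: Optional[frozenset] = None  # None means "no candidate list"


class ViewCache:
    """Compiles a view on first use and returns the cached set afterwards."""

    def __init__(self, records):
        self.records = records  # dicts with id / tags / source / timestamp_ms
        self._compiled = {}
        self.hits = 0
        self.misses = 0

    def compile(self, key):
        if key in self._compiled:
            self.hits += 1
            return self._compiled[key]
        self.misses += 1
        allowed = frozenset(
            r["id"] for r in self.records
            if key.required_tags <= set(r["tags"])
            and (not key.allowed_sources or r["source"] in key.allowed_sources)
            and r["timestamp_ms"] >= key.min_timestamp_ms
            and (key.candidate_ids is None or r["id"] in key.candidate_ids)
        )
        self._compiled[key] = allowed
        return allowed


records = [
    {"id": 1001, "tags": ["architecture"], "source": "docs", "timestamp_ms": 1_700_000_000_000},
    {"id": 1002, "tags": ["architecture", "cache"], "source": "docs", "timestamp_ms": 1_700_000_010_000},
    {"id": 1003, "tags": ["cache"], "source": "wiki", "timestamp_ms": 1_700_000_020_000},
]
cache = ViewCache(records)
key = ViewKey(required_tags=frozenset({"architecture"}), allowed_sources=frozenset({"docs"}))
first = cache.compile(key)   # miss: scans the records
second = cache.compile(key)  # hit: returns the stored set
print(sorted(first), cache.hits, cache.misses)  # [1001, 1002] 1 1
```

The point of the pattern is that a burst of hot queries with identical predicates pays the metadata scan once rather than once per request.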

The Python bindings release the GIL around long Rust add/search/prepare/write paths, so threaded Python services can overlap independent vector and graph-memory requests instead of serializing on the interpreter lock.
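A sketch of the threaded serving pattern this enables. To keep it runnable without the package installed, `search` below is a NumPy brute-force stand-in and not a turbo-graph call; with the real bindings you would submit the index's own search method to the pool instead.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

rng = np.random.default_rng(0)
db = rng.standard_normal((2000, 64)).astype(np.float32)


def search(query, k=5):
    """Stand-in for an index search call; returns the top-k row ids."""
    scores = db @ query
    return np.argsort(-scores)[:k]


queries = rng.standard_normal((8, 64)).astype(np.float32)

# One request per task. When the native call releases the GIL, the worker
# threads can make progress in parallel instead of queueing on the interpreter.
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(search, queries))

# Same requests handled one at a time, for comparison.
sequential = [search(q) for q in queries]
print(all(np.array_equal(a, b) for a, b in zip(threaded, sequential)))
```

Results are identical either way; only wall-clock behavior under concurrent load differs.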

Should you migrate?

Migration is likely worthwhile if you answer yes to three or more of these:

Most queries carry tenant, source, tag, or time constraints.
You expand graph neighborhoods before vector search.
The same filter predicates repeat in bursts.
You manually merge BM25/SQL scores with vector and graph scores.
You need production explainability: trace, cache hit, selectivity.
allowlist= is fine, but constructing the allowlist is the bottleneck.

Otherwise stay on turbovec for the flat core and use turbo-graph only for hot filtered routes.

# The core API is compatible, so moving a route is an import swap. Use one of these, not both.
from turbovec import IdMapIndex      # upstream
from turbo_graph import IdMapIndex   # this repo, compatible core API

Full matrix and PR checklist: docs/benchmark_turbo_graph_vs_turbo_vec.md.
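One checklist item above is manually merging BM25/SQL scores with vector and graph scores. For reference, a hand-rolled blend usually looks something like the sketch below: min-max normalize each signal, then take a weighted sum. The function names and the 0.6 / 0.3 / 0.1 weights are arbitrary assumptions for illustration and do not describe turbo-graph's rerank.

```python
def min_max(scores):
    """Scale a {id: score} dict to [0, 1]; constant inputs map to 0.0."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {i: (s - lo) / span if span else 0.0 for i, s in scores.items()}


def blend(vector, bm25, graph, weights=(0.6, 0.3, 0.1)):
    """Weighted sum of normalized signals; ids missing from a signal add 0.0."""
    signals = [min_max(s) if s else {} for s in (vector, bm25, graph)]
    ids = set().union(*signals)
    fused = {
        i: sum(w * sig.get(i, 0.0) for w, sig in zip(weights, signals))
        for i in ids
    }
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


vector = {1001: 0.82, 1002: 0.79, 1003: 0.40}   # similarity scores
bm25 = {1002: 12.0, 1003: 3.0}                  # lexical scores
graph = {1001: 1.0, 1002: 0.8}                  # graph proximity scores

ranked = blend(vector, bm25, graph)
print([doc_id for doc_id, _ in ranked])  # [1002, 1001, 1003]
```

Maintaining this per route, along with the candidate intersection that feeds it, is the application-side work the checklist is asking about.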

Benchmarks

Numbers below come from benchmarks/results/*.json. Regenerate charts with python3 benchmarks/create_diagrams.py.

Setup (shared core): 100K database vectors, 1K queries, k=64, seed 42, unit-normalized embeddings.

Recall vs FAISS IndexPQ

Baseline: FAISS IndexPQ with LUT256 and training. Note that this differs from the speed baseline below, which is FAISS IndexPQFastScan.

GloVe 2-bit is the one cell where FAISS edges ahead (-0.06pp). Both converge by k around 16. Raw data: benchmarks/results/.

Speed vs FAISS IndexPQFastScan

Median of 5 runs. Chart legend: orange marks configs where TurboQuant is faster; gray marks FAISS faster or parity.

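TurboQuant is faster in all 8 ARM configs and in 3 of the 4 x86 4-bit configs (the fourth is a tie). x86 2-bit is the known gap vs FAISS AVX-512 VBMI: both ST and MT trail by 2.9% to 8.4% in the table below.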

All 16 speed numbers (ms/query)

ST = single-threaded, MT = multi-threaded. Gain is positive when TurboQuant is faster.

Dim	Bits	Arch	Threads	TQ	FAISS	Gain
1536	2	ARM	ST	1.083	1.235	+12.3%
1536	2	ARM	MT	0.103	0.115	+10.4%
1536	2	x86	ST	1.271	1.172	-8.4%
1536	2	x86	MT	0.304	0.295	-3.1%
1536	4	ARM	ST	1.992	2.450	+18.7%
1536	4	ARM	MT	0.185	0.220	+15.9%
1536	4	x86	ST	2.439	2.560	+4.7%
1536	4	x86	MT	0.576	0.590	+2.4%
3072	2	ARM	ST	2.124	2.439	+12.9%
3072	2	ARM	MT	0.201	0.224	+10.3%
3072	2	x86	ST	2.657	2.582	-2.9%
3072	2	x86	MT	0.626	0.590	-6.1%
3072	4	ARM	ST	3.968	4.925	+19.4%
3072	4	ARM	MT	0.375	0.448	+16.3%
3072	4	x86	ST	5.342	5.474	+2.4%
3072	4	x86	MT	1.177	1.177	0.0%
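The Gain column is consistent with relative latency reduction, (FAISS - TQ) / FAISS. That formula is inferred from the numbers in the table rather than taken from the benchmark scripts, so treat the helper below as a reading aid.

```python
def gain_pct(tq_ms, faiss_ms):
    """Relative latency reduction of TQ vs FAISS, in percent."""
    return (faiss_ms - tq_ms) / faiss_ms * 100.0


# (dim, bits, arch, threads) -> (TQ ms/query, FAISS ms/query), copied from the table
rows = {
    (1536, 2, "ARM", "ST"): (1.083, 1.235),
    (1536, 2, "x86", "ST"): (1.271, 1.172),
    (3072, 4, "ARM", "ST"): (3.968, 4.925),
}
for config, (tq, faiss) in rows.items():
    print(config, f"{gain_pct(tq, faiss):+.1f}%")
# (1536, 2, 'ARM', 'ST') +12.3%
# (1536, 2, 'x86', 'ST') -8.4%
# (3072, 4, 'ARM', 'ST') +19.4%
```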

Compression (100K vectors)

10M x 1536d at 2-bit is about 4 GB of index RAM (10M x 1536 x 2 bits = 3.84 GB), vs about 61 GB for the same vectors in float32 (about 31 GB in float16).
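Back-of-the-envelope arithmetic for the sizing above, counting packed codes only. The helper name is illustrative, and per-vector norms, ids, and metadata are ignored, so real index sizes will be somewhat larger.

```python
def code_bytes(n_vectors, dim, bits_per_dim):
    """Bytes needed for densely packed codes (no norms, ids, or metadata)."""
    return n_vectors * dim * bits_per_dim // 8


n, dim = 10_000_000, 1536
for label, bits in [("2-bit", 2), ("4-bit", 4), ("float16", 16), ("float32", 32)]:
    print(f"{label:8s} {code_bytes(n, dim, bits) / 1e9:6.2f} GB")
# 2-bit      3.84 GB
# 4-bit      7.68 GB
# float16   30.72 GB
# float32   61.44 GB
```

At 2 bits per dimension the codes are 16x smaller than float32 storage for the same corpus.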

Graph layer

Low selectivity is already fast with kernel SlotMask. turbo-graph's target win is repeated compilation and reuse of graph ∩ metadata ∩ candidates views.

graph_view_bench now separates warm steady-state search from one-shot view compilation. On the synthetic 16,384 x 64 harness with --iters 3, the balanced constrained view selected 24 slots across 8 active SIMD blocks: cached mask search was about 0.020 ms/query, rebuilding the graph+metadata view was about 2.4x that cost, and global post-filtering needed fetch_k=8192 to recover full recall.
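Why global post-filtering needs such a large fetch_k can be seen with exact scores on random data. The sketch below computes the smallest fetch_k at which "search globally, then filter" returns the same hits as a masked search. It uses float brute force rather than the quantized kernel, random vectors rather than the benchmark harness, and a hypothetical `min_fetch_k` helper, so the number it prints is not a reproduction of the benchmark figure.

```python
import numpy as np


def min_fetch_k(scores, mask, k):
    """Smallest global fetch_k whose post-filtered results contain the
    top-`k` allowed slots (k is clamped to the number of allowed slots)."""
    order = np.argsort(-scores)                   # global ranking, best first
    allowed_ranks = np.flatnonzero(mask[order])   # 0-based ranks of allowed slots
    k_eff = min(k, allowed_ranks.size)
    return int(allowed_ranks[k_eff - 1]) + 1 if k_eff else 0


rng = np.random.default_rng(42)
n, d, k = 16_384, 64, 10
db = rng.standard_normal((n, d)).astype(np.float32)
query = rng.standard_normal(d).astype(np.float32)

mask = np.zeros(n, dtype=bool)
mask[rng.choice(n, size=24, replace=False)] = True  # 24 allowed slots

scores = db @ query
fetch_k = min_fetch_k(scores, mask, k)

# Masked search: rank only the allowed slots.
masked_hits = np.argsort(-np.where(mask, scores, -np.inf))[:k]
# Post-filter: fetch globally, then drop everything outside the mask.
global_hits = np.argsort(-scores)[:fetch_k]
post_hits = global_hits[mask[global_hits]][:k]

print(fetch_k, np.array_equal(masked_hits, post_hits))
```

With only 24 of 16,384 slots allowed, the k-th best allowed hit tends to sit deep in the global ranking, which is why the post-filter path has to fetch thousands of results to match a masked search.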

Shared limits: brute-force O(n) scan, not HNSW/IVF; 2-4 bit approximation; TQ+ needs at least 1000 vectors on the first add; pin versions for production services.

Install

pip install turbo-graph
cargo add turbo-graph

For local development:

cd turbo-graph-python
python3 -m maturin develop --release

Requirements: Rust 1.70+, dim % 8 == 0, bit_width in {2, 3, 4}. x86_64 targets AVX2 (x86-64-v3).

Quick start

Python - turbovec-compatible core

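import numpy as np
from turbo_graph import IdMapIndex

# Placeholder data so the snippet runs end to end; swap in real embeddings.
# 2000 rows because TQ+ needs at least 1000 vectors on the first add.
rng = np.random.default_rng(42)
vectors = rng.standard_normal((2000, 1536))
ids = np.arange(1000, 3000)
query = rng.standard_normal((1, 1536))

idx = IdMapIndex(dim=1536, bit_width=4)
idx.add_with_ids(vectors.astype(np.float32), ids.astype(np.uint64))

# Restrict the search to three external ids; the filter is applied in the kernel.
allowed = np.array([1003, 1010, 1042], dtype=np.uint64)
scores, hit_ids = idx.search(query.astype(np.float32), k=10, allowlist=allowed)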

Python - graph memory for constrained RAG

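import numpy as np
from turbo_graph import GraphMemoryIndex

# Assumed inputs, defined elsewhere: `embeddings` has one 1536-d row per record
# below, `query` holds one 1536-d query vector, and `queries` is an (nq, 1536) batch.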

memory = GraphMemoryIndex(dim=1536, bit_width=4)
memory.add_records(
    embeddings.astype(np.float32),
    [
        {
            "id": 1001,
            "title": "Architecture note",
            "tags": ["architecture"],
            "source": "docs",
            "timestamp_ms": 1_700_000_000_000,
        },
        {
            "id": 1002,
            "title": "Retrieval cache note",
            "tags": ["architecture", "cache"],
            "source": "docs",
            "timestamp_ms": 1_700_000_010_000,
        },
    ],
)
memory.link_bidirectional(1001, 1002, 0.8)
memory.prepare()

hits = memory.search(
    query.astype(np.float32),
    k=10,
    seeds=[1001],
    required_tags=["architecture"],
    allowed_sources=["docs"],
    candidate_ids=[1001, 1002],  # optional BM25/SQL/ACL candidates
)

batch_hits = memory.search_batch(
    queries.astype(np.float32),
    k=10,
    seeds=[1001],
    required_tags=["architecture"],
    allowed_sources=["docs"],
    candidate_ids=[1001, 1002],
)

report = memory.explain(
    query.astype(np.float32),
    k=10,
    seeds=[1001],
    candidate_ids=[1001, 1002, 999],
)
print(report["plan"], report["telemetry"])

Runnable version: turbo-graph-python/examples/graph_memory_rag.py.

Rust - graph layer on the shared core

use turbo_graph::{GraphMemoryIndex, GraphSearchPreset, MemoryRecord, TurboQuantIndex};

let mut index = TurboQuantIndex::new(1536, 4)?;
index.add(&vectors);
index.prepare();

let mut memory = GraphMemoryIndex::new(1536, 4)?;
memory.add_records(
    &flat_vectors,
    vec![MemoryRecord::new(1001, "Architecture note", ["architecture"])
        .with_source("docs.example")
        .with_timestamp_ms(1_700_000_000_000)],
)?;

let report = memory.explain_graph_search_with_preset(
    &query,
    10,
    &[1001],
    GraphSearchPreset::balanced(),
    &["architecture"],
    &["docs.example"],
    Some(1_700_000_000_000),
    None,
);
println!(
    "hits={} cache_hit={}",
    report.hits.len(),
    report.plan.combined_cache_hit
);

Run benchmarks

# Shared turbovec-style ANN (needs ~/data/py-turboquant/)
python3 benchmarks/download_data.py all
python3 benchmarks/suite/recall_d1536_2bit.py
python3 benchmarks/suite/speed_d1536_2bit_arm_mt.py

# turbo-graph graph/view layer
cargo run -p turbo-graph --release --example graph_view_bench -- --iters 3 --csv /tmp/graph-view-bench.csv
cargo run -p turbo-graph --release --example graph_view_bench_summary -- /tmp/graph-view-bench.csv

The graph benchmark prints warmup-aware selectivity, mask-build, cold/warm view-compile, constrained-retrieval, and post-filter overfetch rows.

For release checks:

scripts/release_check.sh --quick
scripts/release_check.sh --full

The release script builds a fresh Python wheel, installs it into a temporary venv, runs Rust/Python gates, and does not publish, tag, or mutate git history.

Documentation

docs/
├── README.md ............... index
├── api.md .................. TurboQuantIndex · IdMapIndex · GraphMemoryIndex
├── graph_memory_layer.md ... views · presets · caches
├── benchmark_turbo_graph_vs_turbo_vec.md
└── integrations/ ........... LangChain · LlamaIndex · Haystack · Agno

→ Docs index · API · Graph layer · vs turbovec

Open Source

Contributing guide — issue/PR workflow, test gates, and benchmark expectations.
Changelog — public 0.1.0 release notes plus pre-0.1 development history.
Security policy — supported versions and vulnerability reporting.

Security

See SECURITY.md for supported versions and vulnerability reporting.

License

MIT - see LICENSE. Core algorithms follow the turbovec lineage; the graph layer is additional work in this fork.
