Implement a database abstraction layer that will enable the use of different DB backends. by Davidyz · Pull Request #282 · Davidyz/VectorCode

Davidyz · 2025-09-01T12:14:30Z

Part of #221.

This will most likely be incompatible with the existing configuration, in the sense that we'd need to follow similar patterns for embedding functions and rerankers. As a temporary solution, we could maybe add a function that transforms the old config to the new one internally.

~~I'm not committed to this implementation, but I need some hands-on experience to know what we'd need from the abstraction layer. If this works out, we could just go with this.~~
Having spent some time looking into the LangChain implementations, I think their approach is a bit bloated for our simple RAG tool, which specialises in local files organised in directories (and makes extensive use of metadata). As such, I decided to continue with this PR and implement my own database connector (mostly based on the chromadb API design), which we can then use to implement support for new databases.
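To make the idea of a small, chromadb-shaped connector concrete, here is a minimal sketch of what such an abstract base class could look like, plus a toy in-memory backend that implements it. The names `QueryResult` and `CollectionNotFoundError` appear elsewhere in this PR, but every field, method name and signature below is a hypothetical illustration, not the PR's actual interface.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


class CollectionNotFoundError(Exception):
    """Raised when a project has no collection in the backend."""


@dataclass
class QueryResult:
    path: str
    chunk: str
    score: float  # assumed convention for this sketch: higher = better match


class DatabaseConnector(ABC):
    """Hypothetical backend-agnostic interface with a chromadb-like shape."""

    def __init__(self, db_params: dict):
        # Backend-specific settings (URL, storage path, ...) come from config.
        self.db_params = db_params

    @abstractmethod
    def add(self, project_root: str, path: str,
            chunks: list[str], embeddings: list[list[float]]) -> None: ...

    @abstractmethod
    def query(self, project_root: str, embedding: list[float], n_results: int,
              exclude_paths: frozenset[str] | set[str] = frozenset()) -> list[QueryResult]: ...

    @abstractmethod
    def delete(self, project_root: str, paths: list[str]) -> None: ...

    @abstractmethod
    def drop(self, project_root: str) -> None: ...

    @abstractmethod
    def list_collections(self) -> list[str]: ...


class InMemoryConnector(DatabaseConnector):
    """Toy backend showing the interface can be implemented without any DB."""

    def __init__(self, db_params: dict):
        super().__init__(db_params)
        # project_root -> rows of (path, chunk, embedding)
        self._collections: dict[str, list[tuple[str, str, list[float]]]] = {}

    def _rows(self, project_root: str) -> list[tuple[str, str, list[float]]]:
        try:
            return self._collections[project_root]
        except KeyError:
            raise CollectionNotFoundError(project_root) from None

    def add(self, project_root, path, chunks, embeddings):
        rows = self._collections.setdefault(project_root, [])
        rows.extend((path, c, e) for c, e in zip(chunks, embeddings))

    def query(self, project_root, embedding, n_results, exclude_paths=frozenset()):
        # Brute-force dot-product scoring; a real backend would use an ANN index.
        results = [
            QueryResult(path, chunk, sum(a * b for a, b in zip(embedding, vec)))
            for path, chunk, vec in self._rows(project_root)
            if path not in exclude_paths
        ]
        results.sort(key=lambda r: r.score, reverse=True)
        return results[:n_results]

    def delete(self, project_root, paths):
        gone = set(paths)
        self._collections[project_root] = [
            row for row in self._rows(project_root) if row[0] not in gone
        ]

    def drop(self, project_root):
        self._rows(project_root)  # raises if the collection does not exist
        del self._collections[project_root]

    def list_collections(self):
        return sorted(self._collections)
```

Keeping the per-project collection, path-based deletion and path exclusion in the interface itself reflects the tool's focus on directory-organised local files; a new backend only has to fill in these few methods.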

codecov · 2025-09-19T09:30:25Z

Codecov Report

❌ Patch coverage is 99.80237% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 99.76%. Comparing base (171361e) to head (799e1fe).

Files with missing lines	Patch %	Lines
src/vectorcode/database/chroma.py	99.09%	2 Missing ⚠️


@@            Coverage Diff             @@
##             main     #282      +/-   ##
==========================================
+ Coverage   99.72%   99.76%   +0.03%     
==========================================
  Files          25       32       +7     
  Lines        1845     2099     +254     
==========================================
+ Hits         1840     2094     +254     
  Misses          5        5


Davidyz · 2025-09-27T05:29:22Z

For the sake of easily configuring database settings for all projects, I'm planning to modify the config file resolution so that project configs will be merged with the global config. This means you only need to configure the db/embedding/reranker once, in the global config.
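A minimal sketch of such a merge, assuming a recursive (deep) merge in which project values win and nested dicts are merged key by key. Whether the PR merges nested tables this way or replaces them wholesale is an assumption here, and the `db_type` key is hypothetical; `db_params` and `db_url` are the names used elsewhere in this PR.

```python
import copy
from typing import Any


def merge_configs(global_cfg: dict[str, Any], project_cfg: dict[str, Any]) -> dict[str, Any]:
    """Return a new config where project values override global ones.

    Nested dicts (e.g. `db_params`) are merged key by key rather than
    replaced wholesale, so a project can override one setting without
    restating the rest. Neither input is modified.
    """
    merged = copy.deepcopy(global_cfg)
    for key, value in project_cfg.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_configs(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged
```

With this, a project config containing only `{"db_params": {"timeout": 60}}` keeps the global `db_url` while changing a single setting.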


Davidyz · 2025-10-05T08:59:23Z

As a proof of concept, I'll try to get chromadb 1.x working as part of this PR. This is likely going to introduce packaging changes. Specifically, the default chromadb version constraint will be <2.0.0, with an optional dependency group that pins chromadb to ==0.6.3.
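Supporting both major versions side by side means picking a connector at startup based on what is installed. A minimal sketch using only `importlib.metadata`; the connector names `chroma0` and `chroma` mirror the commit prefixes in this PR, but the mapping function itself is hypothetical.

```python
from __future__ import annotations

from importlib import metadata


def chromadb_major_version() -> int | None:
    """Return the installed chromadb major version, or None if it is absent."""
    try:
        version = metadata.version("chromadb")
    except metadata.PackageNotFoundError:
        return None
    # "0.6.3" -> 0, "1.0.20" -> 1
    return int(version.split(".")[0])


def pick_chroma_connector(major: int | None) -> str:
    """Map a chromadb major version to a (hypothetical) connector name."""
    if major is None:
        raise RuntimeError("chromadb is not installed")
    if major == 0:
        return "chroma0"  # 0.6.x API, e.g. installed via the pinned dep group
    if major == 1:
        return "chroma"  # 1.x API, matching the default <2.0.0 constraint
    raise RuntimeError(f"unsupported chromadb major version: {major}")
```

Failing loudly on an unknown major version is deliberate: a future 2.x release would fall outside the `<2.0.0` constraint anyway.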


superbiche · 2026-02-05T01:31:52Z

USearch Adapter Implementation - DBAL Validation & Benchmarks

I implemented a USearch + SQLite hybrid adapter using the DBAL interface from this PR to validate the abstraction layer design (with help from Claude). The implementation is available at superbiche/VectorCode@feat/usearch-adapter.

Benchmark Results

Large Codebase Benchmarks (vs ChromaDB)

Codebase	Files	Chunks	Speedup (unfiltered)	Speedup (10% exclusion)	Speedup (50% exclusion)
Linux kernel	50k	250k	2.10x	258.6x	237.6x
VS Code	7.9k	39k	2.71x	69.5x	50.9x
Kubernetes	24k	120k	2.09x	153.6x	130.2x

The dramatic speedup for filtered queries comes from the different filtering strategy: ChromaDB filters during HNSW traversal (which is expensive with large exclusion sets), while USearch over-fetches candidates and then discards excluded ones with a simple Python set lookup.
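The over-fetch-then-filter strategy can be sketched independently of any backend. Here `search(n)` stands in for an unfiltered ANN query returning the `n` nearest `(key, distance)` pairs (in the adapter this would be the USearch index). The retry loop that doubles the fetch size, and the `overfetch`/`max_fetch` parameters, are assumptions of this sketch rather than details of superbiche's adapter.

```python
from typing import Callable, Sequence


def filtered_search(
    search: Callable[[int], Sequence[tuple[int, float]]],
    key_to_path: dict[int, str],
    excluded_paths: set[str],
    k: int,
    overfetch: int = 4,
    max_fetch: int = 10_000,
) -> list[tuple[int, float]]:
    """Return up to k (key, distance) hits whose path is not excluded."""
    n = k * overfetch
    while True:
        hits = search(n)
        # Post-filtering is one O(1) set lookup per candidate.
        kept = [(key, d) for key, d in hits if key_to_path[key] not in excluded_paths]
        # Stop once enough hits survive, the index is exhausted, or the cap is hit.
        if len(kept) >= k or len(hits) < n or n >= max_fetch:
            return kept[:k]
        n = min(n * 2, max_fetch)
```

The trade-off: the ANN search itself stays unfiltered and fast, at the cost of occasionally re-querying when the exclusion set removes most of the first batch.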

DBAL Feedback

What works well:

Abstract base class design - clean separation, easy to extend
Config-driven initialization via `db_params`
Well-structured types (`QueryResult`, `CollectionInfo`, `VectoriseStats`)

Suggestions:

Index deletion - USearch doesn't support removing individual vectors. Consider adding an optional `rebuild_index()` method, or documenting this limitation for backends that share it.
Score semantics - Document whether a higher score means a better match (ChromaDB uses negative distances, USearch uses positive ones).
Collection metadata - Consider documenting the minimal required fields (`path`, `embedding_function`, `created_by`).
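One way to settle the score-semantics question is for every connector to convert its backend's raw distance into a single higher-is-better score before returning results. A hypothetical helper, assuming the backend reports cosine distance as 1 - cosine similarity or an L2-style distance; neither the helper nor the metric enum exists in the PR.

```python
from enum import Enum


class Metric(Enum):
    COSINE = "cosine"  # assumed: backend reports 1 - cosine similarity, in [0, 2]
    L2 = "l2"  # assumed: backend reports a Euclidean-style distance, in [0, inf)


def to_score(distance: float, metric: Metric) -> float:
    """Hypothetical helper: map a raw distance to a higher-is-better score."""
    if metric is Metric.COSINE:
        # Recovers the cosine similarity, so 1.0 is an exact match.
        return 1.0 - distance
    if metric is Metric.L2:
        # Monotonically decreasing in distance, bounded to (0, 1].
        return 1.0 / (1.0 + distance)
    raise ValueError(f"unsupported metric: {metric}")
```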

Architecture

USearch only stores vectors + integer keys, so I paired it with SQLite for metadata:

~/.local/share/vectorcode/usearch/<collection_id>/
├── index.usearch      # Vector index (HNSW)
└── metadata.db        # SQLite: chunks, paths, content_hash
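Because the vector index only knows integer keys, the SQLite side has to map those keys back to files and chunk text. A minimal sketch of that half using only the standard library; the table and column names are assumptions loosely based on the fields listed above, not the adapter's actual schema.

```python
import hashlib
import sqlite3


def open_metadata(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS chunks (
               key          INTEGER PRIMARY KEY,  -- same key stored in the vector index
               path         TEXT NOT NULL,
               chunk        TEXT NOT NULL,
               content_hash TEXT NOT NULL
           )"""
    )
    # Speeds up per-file deletes and path-based exclusion.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_chunks_path ON chunks(path)")
    return conn


def add_chunk(conn: sqlite3.Connection, path: str, chunk: str) -> int:
    """Store a chunk and return the integer key to pass to the vector index."""
    digest = hashlib.sha256(chunk.encode()).hexdigest()
    cur = conn.execute(
        "INSERT INTO chunks (path, chunk, content_hash) VALUES (?, ?, ?)",
        (path, chunk, digest),
    )
    conn.commit()
    return cur.lastrowid


def resolve(conn: sqlite3.Connection, keys: list[int]) -> list[tuple[int, str, str]]:
    """Map keys from a vector search to (key, path, chunk), keeping search order."""
    if not keys:
        return []
    placeholders = ",".join("?" * len(keys))
    rows = conn.execute(
        f"SELECT key, path, chunk FROM chunks WHERE key IN ({placeholders})", keys
    ).fetchall()
    by_key = {row[0]: row for row in rows}
    return [by_key[k] for k in keys if k in by_key]
```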

ChromaDB-Free Operation

To enable USearch to work completely independently of ChromaDB (avoiding version conflicts with Pydantic 2.x), I added:

Lazy imports (f87da45) - ChromaDB modules are only imported when actually using ChromaDB connectors
Standalone embedding functions (3f8fd3c) - Native implementations of OllamaEmbeddingFunction and SentenceTransformerEmbeddingFunction that don't require ChromaDB's embedding_functions module
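The lazy-import idea can be sketched as a registry of `"module:attribute"` strings that are only resolved on demand, so an environment without chromadb never touches it unless a Chroma backend is requested. The module paths and class names in the registry are hypothetical, not the PR's actual layout.

```python
import importlib
from typing import Any


def lazy_load(spec: str) -> Any:
    """Import "package.module:attr" only when called and return the attribute."""
    module_name, _, attr = spec.partition(":")
    try:
        module = importlib.import_module(module_name)
    except ImportError as e:
        raise RuntimeError(f"backend {spec!r} needs an optional dependency: {e}") from e
    return getattr(module, attr)


# Hypothetical registry; only the entry that is looked up ever gets imported.
CONNECTORS: dict[str, str] = {
    "chroma": "vectorcode.database.chroma:ChromaConnector",  # hypothetical name
    "usearch": "vectorcode.database.usearch:USearchConnector",  # hypothetical name
}


def get_connector_class(name: str, registry: dict[str, str] = CONNECTORS) -> Any:
    try:
        spec = registry[name]
    except KeyError:
        raise ValueError(f"unknown database type: {name!r}") from None
    return lazy_load(spec)
```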

This allows users to run USearch without ChromaDB installed at all, or with an incompatible ChromaDB version in their environment. `get_embedding_function()` now:

First tries standalone implementations (Ollama, SentenceTransformer)
Falls back to ChromaDB's embedding_functions only if needed
Gracefully handles ImportError when ChromaDB is unavailable/incompatible
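The three-step resolution above can be sketched as follows. `chromadb.utils.embedding_functions` is chromadb's real module of bundled embedding functions; the `STANDALONE` registry and this function's signature are assumptions for illustration, not the code in the linked commits.

```python
from __future__ import annotations

import importlib
from typing import Any, Callable

# Hypothetical registry of native implementations that need no chromadb import,
# e.g. {"OllamaEmbeddingFunction": ..., "SentenceTransformerEmbeddingFunction": ...}.
STANDALONE: dict[str, Callable[..., Any]] = {}


def get_embedding_function(name: str, params: dict | None = None,
                           standalone: dict[str, Callable[..., Any]] = STANDALONE) -> Any:
    params = params or {}
    # 1. Prefer a standalone implementation when one exists.
    if name in standalone:
        return standalone[name](**params)
    # 2. Otherwise fall back to chromadb's bundled embedding functions.
    try:
        ef_module = importlib.import_module("chromadb.utils.embedding_functions")
    except ImportError as e:
        # 3. chromadb is not importable: report a clear error instead of crashing.
        raise ValueError(
            f"{name!r} has no standalone implementation and chromadb is unavailable"
        ) from e
    try:
        ef_class = getattr(ef_module, name)
    except AttributeError:
        raise ValueError(f"unknown embedding function: {name!r}") from None
    return ef_class(**params)
```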

Happy to submit the adapter as a follow-up PR once this merges.

Davidyz added the breaking label Sep 1, 2025

Davidyz linked an issue Sep 1, 2025 that may be closed by this pull request

[FEAT]: Enable the use of multiple DB types #221

Open

Davidyz force-pushed the feat/db_layer branch 4 times, most recently from 54787ef to c3b83f8 on September 6, 2025 04:34

Davidyz force-pushed the feat/db_layer branch 4 times, most recently from 6f91093 to 7a432fc on September 16, 2025 09:43

Davidyz marked this pull request as ready for review September 16, 2025 10:05

Davidyz force-pushed the feat/db_layer branch 6 times, most recently from edd3382 to 21b820b on September 19, 2025 09:24

Davidyz force-pushed the feat/db_layer branch from 9714a10 to 0354806 on September 20, 2025 03:26

Davidyz added 11 commits September 22, 2025 11:41

WIP(cli): database abstraction layer

cdba77c

WIP(cli): Add wip chromadb connector

dac5f75

feat(db): Improve database abstraction and vectorisation process

8bda33f

feat(db): Implement delete and drop methods for database connectors

6f42c52

refactor(cli): drop now use the DB adapter layer.

9028981

fix(cli): default db_url for chroma0

416d7c8

fix(cli): minor fixes.

8cb0e2d

feat(cli): implement database builder with lazy import

1f457f7

refactor(cli): ls in CLI mode now use the DB adapter layer.

40b6d97

docs about database connectors.

129ebd4

feat(cli): support excluding files in queries.

fdc6e27

feat(cli): Report skipped files in vectorise stats

1df845e

Davidyz and others added 8 commits October 3, 2025 11:58

build(cli): add chroma0 dep group.

34421db

Auto generate docs

7baa033

chore(cli): Use chroma0 for CI in test workflow

11aecc4

chore(cli): Document how to install extra dependencies

4f12fd5

docs(cli): reflect packaging changes.

dc44ba9

Auto generate docs

d064d99

tests(db): Add database connector initialization test

dad1eb9

tests(db): skip tests when dependency's not met

78495a9

Davidyz force-pushed the feat/db_layer branch from 82ac555 to 78495a9 on October 4, 2025 03:26

Davidyz added 4 commits October 4, 2025 11:58

chore: extra coverage args via env var.

1684d14

feat(db): Check ChromaDB version on startup

5c13bea

refactor(cli): remove obsolete opts

ff1c3da

fix(cli): Fixes db_url config retrieval

d84fa81

Davidyz force-pushed the feat/db_layer branch from fbebb62 to d84fa81 on October 5, 2025 08:38

Davidyz added 2 commits October 5, 2025 16:42

refactor(chroma0): extract some stuff that can be reused by chromadb (1.x)

db10d56

fix(db): Upgrade chromadb version check and include enums

6a20a0e

feat(chroma): a WIP chromadb connector for chroma 1.x

5ea7d74

Davidyz force-pushed the feat/db_layer branch from 6d63c7f to 5ea7d74 on October 5, 2025 09:36

Davidyz added 7 commits October 6, 2025 10:45

feat(chroma0): Raise CollectionNotFoundError on missing collection

19db7dd

feat(db): Implemented all methods in the ChromaDB 1.x connector

f3dfdae

coverage(cli): Remove static analysis and improve coverage pipeline

be217d6

feat(db): Add inter-process and inter-thread locks for ChromaDB connector

29cb6aa

feat(cli): Refactor lock manager and implement inter-thread locks for ChromaDB connector

9021967

ci(cli): Run coverage via shell script and enable coredumpy

c59136b

Merge branch 'main' into feat/db_layer

799e1fe

Davidyz mentioned this pull request Dec 26, 2025

[BUG] Incompatible with Python >= 3.14 #305

Open

Davidyz wants to merge 96 commits into main from feat/db_layer

Davidyz commented Sep 1, 2025 •

edited

Loading

Uh oh!

codecov bot commented Sep 19, 2025 •

edited

Loading

Uh oh!

Davidyz commented Sep 27, 2025

Uh oh!

Davidyz commented Oct 5, 2025

Uh oh!

superbiche commented Feb 5, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants