Implement a database abstraction layer that will enable the use of different DB backends. #282
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #282      +/-   ##
==========================================
+ Coverage   99.72%   99.76%   +0.03%
==========================================
  Files          25       32       +7
  Lines        1845     2099     +254
==========================================
+ Hits         1840     2094     +254
  Misses          5        5
To make it easy to configure database settings for all projects, I'm planning to modify the config file resolution so that project configs are merged with the global config. This means you only need to configure the db/embedding/reranker once, in the global config.
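A minimal sketch of what such a merge could look like, assuming that both configs are loaded as nested dictionaries and that project values override global ones key by key. The function name `merge_configs` and the keys in the usage note are hypothetical and are not taken from this PR.

```python
from copy import deepcopy
from typing import Any, Dict, Mapping


def merge_configs(
    global_cfg: Mapping[str, Any], project_cfg: Mapping[str, Any]
) -> Dict[str, Any]:
    """Return a new dict: the global config with project values layered on top.

    Nested mappings are merged recursively; any other value in the project
    config replaces the global one. Neither input is mutated.
    """
    merged: Dict[str, Any] = deepcopy(dict(global_cfg))
    for key, value in project_cfg.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            # Both sides are mappings: merge them instead of replacing.
            merged[key] = merge_configs(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged
```

With a global config such as `{"db": {"backend": "chromadb"}}` (hypothetical keys) and a project config that only sets `{"chunk_size": 512}`, the merged result keeps the global database settings and adds the project-specific value, so the database only has to be configured once.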
As a proof of concept, I'll try to get chromadb 1.x working as part of this PR. This will likely introduce a packaging change. Specifically, the default chromadb version constraint will be
USearch Adapter Implementation - DBAL Validation & Benchmarks
I implemented a USearch + SQLite hybrid adapter using the DBAL interface from this PR to validate the abstraction layer design (with help from Claude). The implementation is available at superbiche/VectorCode@feat/usearch-adapter.
Benchmark Results
Large Codebase Benchmarks (vs ChromaDB)
The dramatic speedup for filtered queries comes from a different approach: ChromaDB filters during HNSW traversal (expensive with large exclusion sets), while USearch over-fetches and then does a simple Python set lookup.
DBAL Feedback
What works well:
Suggestions:
Architecture
USearch only stores vectors + integer keys, so I paired it with SQLite for metadata:
ChromaDB-Free Operation
To enable USearch to work completely independently of ChromaDB (avoiding version conflicts with Pydantic 2.x), I added:
This allows users to run USearch without ChromaDB installed at all, or with an incompatible ChromaDB version in their environment. The
Happy to submit the adapter as a follow-up PR once this merges.
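The over-fetch-then-filter idea and the vector index + SQLite pairing described in the comment above can be sketched roughly as follows. This is not the adapter's actual code: the vector index is replaced by a hypothetical `search` callable that returns `(integer_key, distance)` pairs, and the `chunks` table, its columns, and the function names are assumptions made for illustration.

```python
import sqlite3
from typing import Callable, Iterable, List, Set, Tuple

# Hypothetical stand-in for a vector index: given n, return up to n
# (integer_key, distance) pairs sorted by ascending distance.
SearchFn = Callable[[int], List[Tuple[int, float]]]


def make_metadata_store(rows: Iterable[Tuple[int, str]]) -> sqlite3.Connection:
    """Create an in-memory SQLite table mapping integer keys to file paths."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE chunks (key INTEGER PRIMARY KEY, path TEXT NOT NULL)")
    conn.executemany("INSERT INTO chunks (key, path) VALUES (?, ?)", rows)
    conn.commit()
    return conn


def filtered_query(
    search: SearchFn,
    conn: sqlite3.Connection,
    k: int,
    excluded_paths: Set[str],
    total: int,
    overfetch: int = 4,
) -> List[Tuple[str, float]]:
    """Return up to k (path, distance) hits whose path is not excluded.

    The index is asked for more than k neighbours, excluded paths are dropped
    with a set lookup, and the fetch size doubles until k hits survive or the
    whole index (of size `total`) has been scanned.
    """
    fetch = min(total, max(k * overfetch, k))
    while True:
        results: List[Tuple[str, float]] = []
        for key, distance in search(fetch):
            row = conn.execute(
                "SELECT path FROM chunks WHERE key = ?", (key,)
            ).fetchone()
            if row is None or row[0] in excluded_paths:
                continue  # unknown key or excluded file: skip it
            results.append((row[0], distance))
            if len(results) == k:
                return results
        if fetch >= total:
            return results  # nothing more to fetch
        fetch = min(total, fetch * 2)
```

The filtering cost here is one set membership test per candidate, independent of how the index traverses its graph; the trade-off is that a large exclusion set may force several rounds of fetching.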
Part of #221.
This will most likely be incompatible with the existing configuration, in the sense that we'd need to follow similar patterns for embedding functions and rerankers. As a temporary solution, we could maybe add a function that transforms the old config to the new one internally.
I'm not committed to this implementation, but I need some hands-on experience to know what we'd need from the abstraction layer. If this works out, we could just go with this.
Having spent some time looking into the langchain implementations, I think their approach is a bit bloated for our simple RAG tool, which specialises in local files that are organised in directories (and makes extensive use of metadata). As such, I decided to follow this PR and implement my own database connector (mostly based on the chromadb API design), which we can then use to implement support for new databases.
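To make the idea of a backend-agnostic connector concrete, here is a minimal sketch of what such an interface might look like. It is not the interface defined in this PR: the class names `DatabaseConnector`, `InMemoryConnector`, and `Chunk`, the method names, and the equality-only `where` filter are all assumptions, loosely modelled on the add/query/delete style of document stores.

```python
import math
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Sequence


@dataclass
class Chunk:
    """A piece of a file plus its embedding and metadata (hypothetical shape)."""
    id: str
    embedding: Sequence[float]
    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class DatabaseConnector(ABC):
    """Hypothetical backend-agnostic interface; each database subclasses it."""

    @abstractmethod
    def add(self, chunks: Sequence[Chunk]) -> None: ...

    @abstractmethod
    def query(self, embedding: Sequence[float], n_results: int,
              where: Optional[Dict[str, Any]] = None) -> List[Chunk]: ...

    @abstractmethod
    def delete(self, where: Dict[str, Any]) -> int: ...


class InMemoryConnector(DatabaseConnector):
    """Toy backend: brute-force Euclidean search over a dict."""

    def __init__(self) -> None:
        self._chunks: Dict[str, Chunk] = {}

    @staticmethod
    def _matches(chunk: Chunk, where: Optional[Dict[str, Any]]) -> bool:
        # Equality-only metadata filter; an empty or missing filter matches all.
        return all(chunk.metadata.get(k) == v for k, v in (where or {}).items())

    def add(self, chunks: Sequence[Chunk]) -> None:
        for chunk in chunks:
            self._chunks[chunk.id] = chunk

    def query(self, embedding, n_results, where=None):
        candidates = [c for c in self._chunks.values() if self._matches(c, where)]
        candidates.sort(key=lambda c: math.dist(c.embedding, embedding))
        return candidates[:n_results]

    def delete(self, where):
        ids = [c.id for c in self._chunks.values() if self._matches(c, where)]
        for chunk_id in ids:
            del self._chunks[chunk_id]
        return len(ids)
```

Callers would depend only on `DatabaseConnector`, so swapping backends would mean providing another subclass rather than changing the indexing or query code.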