Ask questions across your entire document library. Run compliance audits. Get answers with source citations.
DocLens turns your document library into a searchable knowledge base. Upload PDFs, ask questions across all files, and run automated compliance checks. Works locally for privacy or in the cloud for speed.
- Built for teams dealing with contracts, policies, compliance docs, and internal knowledge bases.

Screenshots (images not shown): secure processing using a local LLM; high-speed processing using Gemini; selecting target documents and rule sets; a clear report showing violations and evidence.

- Dual-Stage Retrieval: Combines Semantic Search (Dense Vectors) with Keyword Search (BM25), so both paraphrased passages and exact-term matches are retrieved.
- Reciprocal Rank Fusion (RRF): Merges the two ranked result lists into one, so chunks ranked highly by both searches are prioritized as context.
- 🔒 Private Mode (Ollama): Runs entirely offline. Best for sensitive financial or legal documents.
- ☁️ Cloud Mode (Gemini): High-performance reasoning for general knowledge tasks.
- Rule-Based Checking: Compare any target document against a defined "Rule Set" (e.g., Company Policy).
- Structured Reporting: Generates readable cards with evidence citations.
- Centralized Library: Upload, store, and manage metadata for all business PDF documents.
- Categorization: Classify files as "General Knowledge" or "Rule Sets" to guide the retrieval logic.
- Library-Wide Search: Ask questions and get answers from across your entire document library, not just single files.
- Citations: Every answer includes links to the source text so you can verify the facts.
- Real-Time Streaming: Answers appear as they are generated, so you don't have to wait for the full response.
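
The Reciprocal Rank Fusion step mentioned above can be sketched in a few lines of plain Python. This is a minimal illustration, not the DocLens implementation: the function name, the chunk IDs, and the constant `k=60` (a commonly used default) are assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranked list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1 for the top hit of each list.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID to keep output stable.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from the dense (semantic) and BM25 (keyword) searches.
dense_hits = ["chunk_a", "chunk_b", "chunk_c"]
keyword_hits = ["chunk_b", "chunk_d", "chunk_a"]

fused = reciprocal_rank_fusion([dense_hits, keyword_hits])
print(fused)  # ['chunk_b', 'chunk_a', 'chunk_d', 'chunk_c']
```

Chunks that appear in both lists (`chunk_b`, `chunk_a`) rise above chunks found by only one search, which is the behavior the hybrid setup relies on.
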
| Component | Technology | Role |
|---|---|---|
| Orchestration | LangChain | Chain logic, Document Loaders, Splitters |
| Vector DB | Qdrant (Docker) | Storing embeddings & metadata |
| Retrieval | BM25 + RRF | Hybrid Search Algorithms |
| Inference | Ollama / Gemini | Local & Cloud LLM providers |
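
To make the BM25 row of the table more concrete, here is a small pure-Python sketch of Okapi BM25 scoring. It is illustrative only: the project may well use a library for this, and the function name, the whitespace tokenization, the sample sentences, and the parameters `k1=1.5` and `b=0.75` are assumptions. The sketch also assumes a non-empty corpus.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Longer-than-average documents are penalized via b.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical chunks, tokenized by whitespace for simplicity.
docs = [
    "termination requires thirty days written notice".split(),
    "the supplier shall maintain liability insurance".split(),
    "notice of termination must be sent by registered mail".split(),
]
scores = bm25_scores("termination notice".split(), docs)
```

The second chunk shares no term with the query and scores zero; the first chunk outranks the third because it contains the same query terms in fewer words.
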
1. Clone the repository:

       git clone https://github.com/sdave0/doclens.git

2. Start the vector database:

       docker-compose up -d

3. Install dependencies:

       pip install -r requirements.txt

4. Set environment variables by creating a `.env` file:

       GOOGLE_API_KEY=your_key_here
       QDRANT_URL=http://localhost:6333

5. Launch the app:

       streamlit run app.py
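
As a rough sketch of how the two variables from the `.env` file might be consumed, the snippet below reads them from the environment and picks a mode. Everything except the variable names `GOOGLE_API_KEY` and `QDRANT_URL` is hypothetical: the `load_settings` helper, the returned keys, and the rule of falling back to Private Mode when no key is set are assumptions, not the app's confirmed behavior. It also assumes the `.env` file has already been loaded into the process environment by some other means.

```python
import os


def load_settings(env=None):
    """Read DocLens-style settings from a mapping of environment variables."""
    env = os.environ if env is None else env
    api_key = env.get("GOOGLE_API_KEY", "")
    # Assumption: without a real key, only the local (Ollama) mode is usable.
    has_key = bool(api_key) and api_key != "your_key_here"
    return {
        "qdrant_url": env.get("QDRANT_URL", "http://localhost:6333"),
        "google_api_key": api_key,
        "mode": "cloud" if has_key else "private",
    }


# Example with an explicit mapping instead of the real environment.
settings = load_settings({"QDRANT_URL": "http://localhost:6333"})
print(settings["mode"])  # private
```

Passing a plain dictionary makes the helper easy to test without touching real credentials.
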




