DocLens 🔍

Ask questions across your entire document library. Run compliance audits. Get answers with source citations.


Overview

DocLens turns your document library into a searchable knowledge base. Upload PDFs, ask questions across all files, and run automated compliance checks. Works locally for privacy or in the cloud for speed.

  • Built for teams dealing with contracts, policies, compliance docs, and internal knowledge bases.

📸 Interface

🔒 Private Mode (Local)

Private Mode Dashboard

Secure processing using a local LLM.

☁️ Cloud Mode (Performance)

Cloud Mode Dashboard

High-speed processing using Gemini.

Audit Configuration

Audit Setup

Select target documents and rule sets.

Audit Report

Audit Results

Clear report showing violations and evidence.

Document Library

Manage your uploaded files and rule sets.


Key Features

🧩 Advanced Hybrid Search

  • Dual-Stage Retrieval: Combines semantic search (dense vectors) with keyword search (BM25), so queries match both on meaning and on exact terms.
  • Reciprocal Rank Fusion (RRF): Merges the two ranked result lists into one, so that chunks ranked highly by either retriever are prioritized as context.
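
To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion in plain Python. It is not taken from the DocLens source: the function name, the document IDs, and the constant `k=60` (a commonly used default) are assumptions for illustration.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranked list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default, not a DocLens setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical result lists from the two retrievers.
dense_hits = ["doc_a", "doc_b", "doc_c"]   # semantic (dense vector) search
bm25_hits = ["doc_b", "doc_d", "doc_a"]    # keyword (BM25) search

fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
print(fused)
```

Documents that appear near the top of both lists (`doc_b`, `doc_a`) end up ahead of documents that only one retriever returned.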

🛡️ Switch Between Local & Cloud AI

  • 🔒 Private Mode (Ollama): Runs entirely offline. Best for sensitive financial or legal documents.
  • ☁️ Cloud Mode (Gemini): High-performance reasoning for general knowledge tasks.
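
One way the mode switch could be modeled is a small lookup from mode name to provider settings. This is a sketch under assumptions: `ProviderConfig`, `PROVIDERS`, `select_provider`, and the model placeholders are hypothetical names, not part of DocLens.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProviderConfig:
    name: str               # which backend serves the request
    requires_network: bool  # False means documents never leave the machine
    model: str              # placeholder value, not a DocLens default

# Hypothetical registry keyed by the two modes described above.
PROVIDERS = {
    "private": ProviderConfig("ollama", False, "local-model-placeholder"),
    "cloud": ProviderConfig("gemini", True, "cloud-model-placeholder"),
}

def select_provider(mode: str) -> ProviderConfig:
    """Return the provider settings for a mode, rejecting unknown modes."""
    try:
        return PROVIDERS[mode.strip().lower()]
    except KeyError:
        raise ValueError(
            f"Unknown mode {mode!r}; expected one of {sorted(PROVIDERS)}"
        ) from None

print(select_provider("private"))
```

Failing loudly on an unknown mode avoids silently falling back to the cloud provider when the user intended to stay offline.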

🧾 Automated Compliance Auditor

  • Rule-Based Checking: Compare any target document against a defined "Rule Set" (e.g., Company Policy).
  • Structured Reporting: Generates readable cards with evidence citations.
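
The report cards can be pictured as a list of structured findings rendered to text. The sketch below is illustrative only: the `Finding` fields, the status values, and the sample rule and evidence are made up and do not reflect the actual DocLens schema.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Finding:
    rule_id: str   # identifier of the rule in the rule set (hypothetical)
    status: str    # "pass" or "violation"
    evidence: str  # quoted text from the target document
    source: str    # where the evidence was found

def render_card(finding: Finding) -> str:
    """Format one finding as a short, readable text card."""
    marker = "VIOLATION" if finding.status == "violation" else "PASS"
    return (
        f"[{marker}] {finding.rule_id}\n"
        f"  Evidence: \"{finding.evidence}\"\n"
        f"  Source: {finding.source}"
    )

def count_violations(findings: List[Finding]) -> int:
    """Count how many findings are violations."""
    return sum(1 for f in findings if f.status == "violation")

# Made-up findings, purely for illustration.
findings = [
    Finding("R-1", "violation", "Payment is due in 90 days", "contract.pdf, p. 3"),
    Finding("R-2", "pass", "Either party may terminate with notice", "contract.pdf, p. 7"),
]

for f in findings:
    print(render_card(f))
print("Violations:", count_violations(findings))
```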

🧠 Shared Knowledge Base

  • Centralized Library: Upload, store, and manage metadata for all business PDF documents.
  • Categorization: Classify files as "General Knowledge" or "Rule Sets" to guide the retrieval logic.

💬 Cross-Document Q&A

  • Library-Wide Search: Ask questions and get answers from across your entire document library, not just single files.
  • Citations: Every answer includes links to the source text so you can verify the facts.
  • Real-Time Streaming: Answers are displayed as they are generated, so you don't have to wait for the full response.
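
Streaming of this kind is typically built on a generator that yields partial text as chunks arrive. The following is a minimal sketch, not the DocLens implementation: `fake_llm_stream` is a stand-in for a real model stream, and the sample answer is invented.

```python
from typing import Iterable, Iterator

def stream_answer(chunks: Iterable[str]) -> Iterator[str]:
    """Yield the answer text accumulated so far after each incoming chunk."""
    text = ""
    for chunk in chunks:
        text += chunk
        yield text

def fake_llm_stream() -> Iterator[str]:
    # Stand-in for a model response that arrives in pieces.
    yield from ["Payment ", "is due ", "in 30 days."]

partials = list(stream_answer(fake_llm_stream()))
for partial in partials:
    print(partial)
```

In a Streamlit app, each partial string could be written to a placeholder created with `st.empty()`, so the visible answer grows as chunks arrive.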

Technical Stack

| Component     | Technology      | Role                                      |
|---------------|-----------------|-------------------------------------------|
| Orchestration | LangChain       | Chain logic, Document Loaders, Splitters  |
| Vector DB     | Qdrant (Docker) | Storing embeddings & metadata             |
| Retrieval     | BM25 + RRF      | Hybrid Search Algorithms                  |
| Inference     | Ollama / Gemini | Local & Cloud LLM providers               |

How to Run

  1. Clone the repository and enter it:

    git clone https://github.com/sdave0/doclens.git
    cd doclens

  2. Start the Vector Database:

    docker-compose up -d

  3. Install Dependencies:

    pip install -r requirements.txt

  4. Set Environment Variables: Create a .env file:

    GOOGLE_API_KEY=your_key_here
    QDRANT_URL=http://localhost:6333

  5. Launch App:

    streamlit run app.py
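
As a quick sanity check on the environment variables from step 4, a sketch like the following could be used. It assumes the `.env` values have already been loaded into the process environment (for example by python-dotenv). The function names, the fallback URL, and the idea that the API key only matters for Cloud Mode are assumptions, not confirmed DocLens behavior.

```python
import os

def load_settings(env=None):
    """Read the two settings from step 4 out of an environment mapping."""
    env = os.environ if env is None else env
    return {
        # Assumed fallback: the local Qdrant URL shown in step 4.
        "qdrant_url": env.get("QDRANT_URL", "http://localhost:6333"),
        "google_api_key": env.get("GOOGLE_API_KEY"),
    }

def cloud_mode_ready(settings) -> bool:
    """True when an API key is set and is not the placeholder value."""
    key = settings["google_api_key"]
    return bool(key) and key != "your_key_here"

settings = load_settings()
print("Qdrant URL:", settings["qdrant_url"])
print("Cloud Mode key present:", cloud_mode_ready(settings))
```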
