Skip to content

Suraj-Unde/Domain-Specific-AI-Assistant

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Domain-Specific AI Assistant

Overview

This repository demonstrates a domain-specific AI assistant built with LangChain, vector search, and a retrieval-augmented generation (RAG) pipeline.

The goal is to create an AI system that:

  • understands industry-specific documents,
  • retrieves the right knowledge from a private data store,
  • generates grounded, accurate answers,
  • and supports safe, auditable workflows.

Why Domain-Specific AI?

General-purpose AI often falls short for specialized use cases because it lacks the insider context and rules of a specific domain.

Domain-specific assistants are built from real business data and designed around the language, workflows, and compliance requirements of a target industry.

Key benefits:

  • better accuracy and relevance,
  • lower hallucination risk,
  • improved trust and auditability,
  • stronger support for regulated environments.

Tech Stack

Component Technology Purpose
Language Python 3.11+ Core development language
LLM Framework LangChain Orchestration and RAG pipeline
Language Model OpenAI GPT Generative responses and embeddings
Vector Database Chroma Semantic search and embeddings storage
Embeddings OpenAI Embeddings Document and query vectorization
Text Processing LangChain Text Splitters Document chunking and preprocessing
Web Framework FastAPI REST API deployment
Server Uvicorn ASGI server for FastAPI
Memory Management LangChain Memory Multi-turn conversation context
Document Loading LangChain Loaders PDF, DOCX, and text ingestion
Tool Integration LangChain Tools & Agents Safe function execution

Key Libraries

  • langchain - LLM orchestration and RAG patterns
  • langchain-openai - OpenAI integration
  • langchain-chroma - Chroma vector store integration
  • fastapi - REST API framework
  • uvicorn - ASGI web server
  • openai - OpenAI API client
  • chromadb - Vector database

Core Pipeline

The main pattern in this repository is:

ingest → chunk → embed → index → retrieve → generate → act

This pipeline ensures the assistant is grounded in real data and able to answer domain-specific questions reliably.

What This Project Shows

This repo includes examples for:

  • document ingestion and preprocessing,
  • text chunking and metadata enrichment,
  • building embeddings and a Chroma vector store,
  • retrieval-based QA with LangChain,
  • memory-enabled conversational behavior,
  • safe tool integration,
  • basic FastAPI deployment.

Repository Files

  • scripts/ — executable workflows and demos.
  • utils/ — helper modules for ingestion, chunking, and retrieval.
  • common/ — reference documents and shared notes.
  • requirement.txt — Python dependencies.
  • data/ — sample domain documents.
  • db/ — persisted Chroma database storage.

Use scripts/ for runnable workflows and demos, and utils/ for reusable ingestion, chunking, and retrieval helpers.

Project Structure

Domain-Specific-AI-Assistant/
├── scripts/
│   ├── app.py
│   ├── generate_embeddings.py
│   ├── ingest_and_build_index.py
│   ├── qa_chain_demo.py
│   ├── memory_multiturn_demo.py
│   └── tools_safe_execution_demo.py
├── utils/
│   ├── ingestion_loaders.py
│   ├── preprocess_and_chunk.py
│   └── retriever_strategies.py
├── common/
│   ├── Document.docx
│   ├── Document.pdf
│   └── lang_chain_domain_specific_assistant.md
├── requirement.txt
├── README.md
├── data/
│   ├── product_specs.txt
│   ├── refund_policy.txt
│   ├── shipping_info.txt
│   └── subcontinent.txt
└── db/
    ├── chroma.sqlite3
    └── fc3b1578-a351-4bce-a39a-0f08a67ef73e/

Demo Examples

qa_chain_demo.py

A short demonstration of retrieval-based QA using LangChain and a Chroma retriever.

from langchain_openai import ChatOpenAI
from langchain_classic.chains import RetrievalQA

retriever = vectordb.as_retriever(search_kwargs={"k": 4})
llm = ChatOpenAI(temperature=0)
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
)
print(qa.run("How do I issue a refund for an international order?"))

memory_multiturn_demo.py

A simple memory-enabled conversation example that keeps recent chat history for better multi-turn behavior.

from langchain.memory import ConversationBufferMemory
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
llm = ChatOpenAI(temperature=0)
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    memory=memory,
)

Getting Started

1. Install dependencies

python -m pip install -r requirement.txt

2. Set your OpenAI key

For PowerShell:

$env:OPENAI_API_KEY = "your-api-key"

For Bash / WSL:

export OPENAI_API_KEY="your-api-key"

3. Ingest documents and build the index

python ingest_and_build_index.py

4. Run the FastAPI app

uvicorn app:app --reload

5. Query the assistant

POST to http://127.0.0.1:8000/query with a JSON body like:

{"q": "How can I get a refund for Product X?"}

Architecture Overview

Ingestion

Collect documents from internal sources such as:

  • product guides,
  • policies,
  • support logs,
  • knowledge bases,
  • spreadsheets and databases.

The ingestion layer converts all content into text and tracks metadata such as source, date, and author.

Preprocessing & Chunking

Documents are cleaned and split into smaller, model-friendly chunks. Good chunking helps retrieval and reduces hallucinations.

Example:

from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=150)
chunks = splitter.split_documents(docs)
for c in chunks:
    c.metadata.setdefault("source", "policy_manual")
    c.metadata.setdefault("version", "v1.2")

Embeddings & Vector Store

Chunks are converted into embeddings and stored in a vector database. This example uses Chroma, but you can swap in other stores like Pinecone, Redis, or Weaviate.

Example:

from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

emb = OpenAIEmbeddings()
vectordb = Chroma(
    persist_directory="./db",
    embedding_function=emb,
)

existing_count = len(vectordb.get()["ids"])
if existing_count == 0:
    vectordb.add_documents(chunks)

Metadata is important for filtering and access control. Store document source, version, category, and other attributes as metadata.

Retrieval

The retriever finds the most relevant chunks for each query. Common strategies:

  • flat k-NN retrieval,
  • cross-encoder reranking,
  • hybrid semantic + keyword search.

Example:

retriever = vectordb.as_retriever(search_kwargs={"k": 4})

QA Chain

A retrieval-based QA chain combines the retrieved context with the LLM prompt.

Example:

from langchain_openai import ChatOpenAI
from langchain_classic.chains import RetrievalQA

llm = ChatOpenAI(temperature=0)
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
)
print(qa.run("How do I issue a refund for an international order?"))

Prompt engineering best practices:

  • keep system instructions separate from user content,
  • be explicit about style and length,
  • add a "do not hallucinate" directive,
  • ask for citations when needed.

Memory & Multi-Turn Conversations

Memory enables dialogue continuity across multiple turns. LangChain supports different memory strategies:

  • ConversationBufferMemory for short-term context,
  • ConversationSummaryMemory for longer sessions,
  • custom persistent memory for cross-session state.

Example:

from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

Tools & Safe Execution

The assistant can be extended with tools for database queries, scheduling, or other actions. Use safe, read-only tools whenever possible and require confirmation for side-effect operations.

Example:

from langchain.tools import Tool
from langchain.agents import initialize_agent, AgentType


def query_orders(order_id: str):
    return f"Order {order_id} status: Delivered successfully."

order_tool = Tool(
    name="query_orders",
    func=query_orders,
    description="Get order status by ID.",
)

agent = initialize_agent(
    tools=[order_tool],
    llm=llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
)

Safety, Privacy & Compliance

Key safety practices:

  • protect API keys and restrict DB access,
  • redact or encrypt PII,
  • log query sources and generated answers,
  • set retention policies for logs and embeddings,
  • follow regulations like HIPAA or GDPR where applicable.

Evaluation & Testing

Monitor quality and reliability using:

  • accuracy on labeled test data,
  • hallucination rate,
  • latency and cost per query,
  • user satisfaction metrics.

Testing strategies:

  • unit tests for ingestion and chunking,
  • integration tests for end-to-end retrieval,
  • adversarial tests for robustness.

Deployment & Scaling

For prototypes, local Chroma and OpenAI are easy to use. For production, consider managed or scalable stores such as Pinecone, Redis, or Weaviate. Host LLM inference with managed services or self-hosted models.

Example Dockerfile:

FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt ./
RUN pip install -r requirements.txt
COPY . /app
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8080"]

Example FastAPI app:

import os
from fastapi import FastAPI
from pydantic import BaseModel
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_chroma import Chroma
from langchain_classic.chains import RetrievalQA

os.environ["OPENAI_API_KEY"] = "<Your-openai-api-key>"
app = FastAPI()

class Query(BaseModel):
    q: str

emb = OpenAIEmbeddings()
vectordb = Chroma(
    persist_directory="./db",
    embedding_function=emb,
)
retriever = vectordb.as_retriever(search_kwargs={"k": 4})
llm = ChatOpenAI(temperature=0)
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
)

@app.post("/query")
async def query(q: Query):
    response = qa.run(q.q)
    return {"answer": response}

Cost Optimization

To reduce costs:

  • cache embeddings and retrieval results,
  • use smaller models for routine queries,
  • reduce k and reranking frequency,
  • batch embedding requests.

Observability & Monitoring

Track:

  • query latency,
  • errors and exceptions,
  • hallucination indicators,
  • document source usage.

Sample Data

The data/ folder contains example documents such as:

  • refund_policy.txt
  • shipping_info.txt
  • product_specs.txt

Usage Summary

  1. python ingest_and_build_index.py
  2. uvicorn app:app --reload
  3. POST queries to /query

Example request:

{"q": "How can I get a refund for Product X?"}

Example answer:

{"answer": "To get a refund for Product X, contact support within 30 days of purchase. International orders may take longer to process."}

Notes

This README has been reformatted for clarity and structure. The repo is designed to help you explore domain-specific RAG systems with LangChain and a vector database.

About

"Python-based framework for domain-specific AI assistants with document ingestion, semantic search, and conversational memory."

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages