This repository demonstrates a domain-specific AI assistant built with LangChain, vector search, and a retrieval-augmented generation (RAG) pipeline.
The goal is to create an AI system that:
- understands industry-specific documents,
- retrieves the right knowledge from a private data store,
- generates grounded, accurate answers,
- and supports safe, auditable workflows.
General-purpose AI often falls short for specialized use cases because it lacks the insider context and rules of a specific domain.
Domain-specific assistants are built from real business data and designed around the language, workflows, and compliance requirements of a target industry.
Key benefits:
- better accuracy and relevance,
- lower hallucination risk,
- improved trust and auditability,
- stronger support for regulated environments.
| Component | Technology | Purpose |
|---|---|---|
| Language | Python 3.11+ | Core development language |
| LLM Framework | LangChain | Orchestration and RAG pipeline |
| Language Model | OpenAI GPT | Generative responses and embeddings |
| Vector Database | Chroma | Semantic search and embeddings storage |
| Embeddings | OpenAI Embeddings | Document and query vectorization |
| Text Processing | LangChain Text Splitters | Document chunking and preprocessing |
| Web Framework | FastAPI | REST API deployment |
| Server | Uvicorn | ASGI server for FastAPI |
| Memory Management | LangChain Memory | Multi-turn conversation context |
| Document Loading | LangChain Loaders | PDF, DOCX, and text ingestion |
| Tool Integration | LangChain Tools & Agents | Safe function execution |
langchain- LLM orchestration and RAG patternslangchain-openai- OpenAI integrationlangchain-chroma- Chroma vector store integrationfastapi- REST API frameworkuvicorn- ASGI web serveropenai- OpenAI API clientchromadb- Vector database
The main pattern in this repository is:
ingest → chunk → embed → index → retrieve → generate → act
This pipeline ensures the assistant is grounded in real data and able to answer domain-specific questions reliably.
This repo includes examples for:
- document ingestion and preprocessing,
- text chunking and metadata enrichment,
- building embeddings and a Chroma vector store,
- retrieval-based QA with LangChain,
- memory-enabled conversational behavior,
- safe tool integration,
- basic FastAPI deployment.
scripts/— executable workflows and demos.utils/— helper modules for ingestion, chunking, and retrieval.common/— reference documents and shared notes.requirement.txt— Python dependencies.data/— sample domain documents.db/— persisted Chroma database storage.
Use
scripts/for runnable workflows and demos, andutils/for reusable ingestion, chunking, and retrieval helpers.
Domain-Specific-AI-Assistant/
├── scripts/
│ ├── app.py
│ ├── generate_embeddings.py
│ ├── ingest_and_build_index.py
│ ├── qa_chain_demo.py
│ ├── memory_multiturn_demo.py
│ └── tools_safe_execution_demo.py
├── utils/
│ ├── ingestion_loaders.py
│ ├── preprocess_and_chunk.py
│ └── retriever_strategies.py
├── common/
│ ├── Document.docx
│ ├── Document.pdf
│ └── lang_chain_domain_specific_assistant.md
├── requirement.txt
├── README.md
├── data/
│ ├── product_specs.txt
│ ├── refund_policy.txt
│ ├── shipping_info.txt
│ └── subcontinent.txt
└── db/
├── chroma.sqlite3
└── fc3b1578-a351-4bce-a39a-0f08a67ef73e/
A short demonstration of retrieval-based QA using LangChain and a Chroma retriever.
from langchain_openai import ChatOpenAI
from langchain_classic.chains import RetrievalQA
retriever = vectordb.as_retriever(search_kwargs={"k": 4})
llm = ChatOpenAI(temperature=0)
qa = RetrievalQA.from_chain_type(
llm=llm,
chain_type="stuff",
retriever=retriever,
)
print(qa.run("How do I issue a refund for an international order?"))A simple memory-enabled conversation example that keeps recent chat history for better multi-turn behavior.
from langchain.memory import ConversationBufferMemory
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
llm = ChatOpenAI(temperature=0)
qa = RetrievalQA.from_chain_type(
llm=llm,
chain_type="stuff",
retriever=retriever,
memory=memory,
)python -m pip install -r requirement.txtFor PowerShell:
$env:OPENAI_API_KEY = "your-api-key"For Bash / WSL:
export OPENAI_API_KEY="your-api-key"python ingest_and_build_index.pyuvicorn app:app --reloadPOST to http://127.0.0.1:8000/query with a JSON body like:
{"q": "How can I get a refund for Product X?"}Collect documents from internal sources such as:
- product guides,
- policies,
- support logs,
- knowledge bases,
- spreadsheets and databases.
The ingestion layer converts all content into text and tracks metadata such as source, date, and author.
Documents are cleaned and split into smaller, model-friendly chunks. Good chunking helps retrieval and reduces hallucinations.
Example:
from langchain_text_splitters import RecursiveCharacterTextSplitter
splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=150)
chunks = splitter.split_documents(docs)
for c in chunks:
c.metadata.setdefault("source", "policy_manual")
c.metadata.setdefault("version", "v1.2")Chunks are converted into embeddings and stored in a vector database. This example uses Chroma, but you can swap in other stores like Pinecone, Redis, or Weaviate.
Example:
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
emb = OpenAIEmbeddings()
vectordb = Chroma(
persist_directory="./db",
embedding_function=emb,
)
existing_count = len(vectordb.get()["ids"])
if existing_count == 0:
vectordb.add_documents(chunks)Metadata is important for filtering and access control. Store document source, version, category, and other attributes as metadata.
The retriever finds the most relevant chunks for each query. Common strategies:
- flat k-NN retrieval,
- cross-encoder reranking,
- hybrid semantic + keyword search.
Example:
retriever = vectordb.as_retriever(search_kwargs={"k": 4})A retrieval-based QA chain combines the retrieved context with the LLM prompt.
Example:
from langchain_openai import ChatOpenAI
from langchain_classic.chains import RetrievalQA
llm = ChatOpenAI(temperature=0)
qa = RetrievalQA.from_chain_type(
llm=llm,
chain_type="stuff",
retriever=retriever,
)
print(qa.run("How do I issue a refund for an international order?"))Prompt engineering best practices:
- keep system instructions separate from user content,
- be explicit about style and length,
- add a "do not hallucinate" directive,
- ask for citations when needed.
Memory enables dialogue continuity across multiple turns. LangChain supports different memory strategies:
ConversationBufferMemoryfor short-term context,ConversationSummaryMemoryfor longer sessions,- custom persistent memory for cross-session state.
Example:
from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)The assistant can be extended with tools for database queries, scheduling, or other actions. Use safe, read-only tools whenever possible and require confirmation for side-effect operations.
Example:
from langchain.tools import Tool
from langchain.agents import initialize_agent, AgentType
def query_orders(order_id: str):
return f"Order {order_id} status: Delivered successfully."
order_tool = Tool(
name="query_orders",
func=query_orders,
description="Get order status by ID.",
)
agent = initialize_agent(
tools=[order_tool],
llm=llm,
agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
verbose=True,
)Key safety practices:
- protect API keys and restrict DB access,
- redact or encrypt PII,
- log query sources and generated answers,
- set retention policies for logs and embeddings,
- follow regulations like HIPAA or GDPR where applicable.
Monitor quality and reliability using:
- accuracy on labeled test data,
- hallucination rate,
- latency and cost per query,
- user satisfaction metrics.
Testing strategies:
- unit tests for ingestion and chunking,
- integration tests for end-to-end retrieval,
- adversarial tests for robustness.
For prototypes, local Chroma and OpenAI are easy to use. For production, consider managed or scalable stores such as Pinecone, Redis, or Weaviate. Host LLM inference with managed services or self-hosted models.
Example Dockerfile:
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt ./
RUN pip install -r requirements.txt
COPY . /app
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8080"]Example FastAPI app:
import os
from fastapi import FastAPI
from pydantic import BaseModel
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_chroma import Chroma
from langchain_classic.chains import RetrievalQA
os.environ["OPENAI_API_KEY"] = "<Your-openai-api-key>"
app = FastAPI()
class Query(BaseModel):
q: str
emb = OpenAIEmbeddings()
vectordb = Chroma(
persist_directory="./db",
embedding_function=emb,
)
retriever = vectordb.as_retriever(search_kwargs={"k": 4})
llm = ChatOpenAI(temperature=0)
qa = RetrievalQA.from_chain_type(
llm=llm,
chain_type="stuff",
retriever=retriever,
)
@app.post("/query")
async def query(q: Query):
response = qa.run(q.q)
return {"answer": response}To reduce costs:
- cache embeddings and retrieval results,
- use smaller models for routine queries,
- reduce
kand reranking frequency, - batch embedding requests.
Track:
- query latency,
- errors and exceptions,
- hallucination indicators,
- document source usage.
The data/ folder contains example documents such as:
refund_policy.txtshipping_info.txtproduct_specs.txt
python ingest_and_build_index.pyuvicorn app:app --reload- POST queries to
/query
Example request:
{"q": "How can I get a refund for Product X?"}Example answer:
{"answer": "To get a refund for Product X, contact support within 30 days of purchase. International orders may take longer to process."}This README has been reformatted for clarity and structure. The repo is designed to help you explore domain-specific RAG systems with LangChain and a vector database.