Domain-Specific AI Assistant

Overview

This repository demonstrates a domain-specific AI assistant built with LangChain, vector search, and a retrieval-augmented generation (RAG) pipeline.

The goal is to create an AI system that:

understands industry-specific documents,
retrieves the right knowledge from a private data store,
generates grounded, accurate answers,
and supports safe, auditable workflows.

Why Domain-Specific AI?

General-purpose AI often falls short for specialized use cases because it lacks the insider context and rules of a specific domain.

Domain-specific assistants are built from real business data and designed around the language, workflows, and compliance requirements of a target industry.

Key benefits:

better accuracy and relevance,
lower hallucination risk,
improved trust and auditability,
stronger support for regulated environments.

Tech Stack

Component	Technology	Purpose
Language	Python 3.11+	Core development language
LLM Framework	LangChain	Orchestration and RAG pipeline
Language Model	OpenAI GPT	Generative responses and embeddings
Vector Database	Chroma	Semantic search and embeddings storage
Embeddings	OpenAI Embeddings	Document and query vectorization
Text Processing	LangChain Text Splitters	Document chunking and preprocessing
Web Framework	FastAPI	REST API deployment
Server	Uvicorn	ASGI server for FastAPI
Memory Management	LangChain Memory	Multi-turn conversation context
Document Loading	LangChain Loaders	PDF, DOCX, and text ingestion
Tool Integration	LangChain Tools & Agents	Safe function execution

Key Libraries

langchain - LLM orchestration and RAG patterns
langchain-openai - OpenAI integration
langchain-chroma - Chroma vector store integration
fastapi - REST API framework
uvicorn - ASGI web server
openai - OpenAI API client
chromadb - Vector database

Core Pipeline

The main pattern in this repository is:

ingest → chunk → embed → index → retrieve → generate → act

This pipeline ensures the assistant is grounded in real data and able to answer domain-specific questions reliably.

What This Project Shows

This repo includes examples for:

document ingestion and preprocessing,
text chunking and metadata enrichment,
building embeddings and a Chroma vector store,
retrieval-based QA with LangChain,
memory-enabled conversational behavior,
safe tool integration,
basic FastAPI deployment.

Repository Files

scripts/ — executable workflows and demos.
utils/ — helper modules for ingestion, chunking, and retrieval.
common/ — reference documents and shared notes.
requirement.txt — Python dependencies.
data/ — sample domain documents.
db/ — persisted Chroma database storage.

Use scripts/ for runnable workflows and demos, and utils/ for reusable ingestion, chunking, and retrieval helpers.

Project Structure

Domain-Specific-AI-Assistant/
├── scripts/
│   ├── app.py
│   ├── generate_embeddings.py
│   ├── ingest_and_build_index.py
│   ├── qa_chain_demo.py
│   ├── memory_multiturn_demo.py
│   └── tools_safe_execution_demo.py
├── utils/
│   ├── ingestion_loaders.py
│   ├── preprocess_and_chunk.py
│   └── retriever_strategies.py
├── common/
│   ├── Document.docx
│   ├── Document.pdf
│   └── lang_chain_domain_specific_assistant.md
├── requirement.txt
├── README.md
├── data/
│   ├── product_specs.txt
│   ├── refund_policy.txt
│   ├── shipping_info.txt
│   └── subcontinent.txt
└── db/
    ├── chroma.sqlite3
    └── fc3b1578-a351-4bce-a39a-0f08a67ef73e/

Demo Examples

`qa_chain_demo.py`

A short demonstration of retrieval-based QA using LangChain and a Chroma retriever.

from langchain_openai import ChatOpenAI
from langchain_classic.chains import RetrievalQA

retriever = vectordb.as_retriever(search_kwargs={"k": 4})
llm = ChatOpenAI(temperature=0)
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
)
print(qa.run("How do I issue a refund for an international order?"))

`memory_multiturn_demo.py`

A simple memory-enabled conversation example that keeps recent chat history for better multi-turn behavior.

from langchain.memory import ConversationBufferMemory
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
llm = ChatOpenAI(temperature=0)
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    memory=memory,
)

Getting Started

1. Install dependencies

python -m pip install -r requirement.txt

2. Set your OpenAI key

For PowerShell:

$env:OPENAI_API_KEY = "your-api-key"

For Bash / WSL:

export OPENAI_API_KEY="your-api-key"

3. Ingest documents and build the index

python ingest_and_build_index.py

4. Run the FastAPI app

uvicorn app:app --reload

5. Query the assistant

POST to http://127.0.0.1:8000/query with a JSON body like:

{"q": "How can I get a refund for Product X?"}

Architecture Overview

Ingestion

Collect documents from internal sources such as:

product guides,
policies,
support logs,
knowledge bases,
spreadsheets and databases.

The ingestion layer converts all content into text and tracks metadata such as source, date, and author.

Preprocessing & Chunking

Documents are cleaned and split into smaller, model-friendly chunks. Good chunking helps retrieval and reduces hallucinations.

Example:

from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=150)
chunks = splitter.split_documents(docs)
for c in chunks:
    c.metadata.setdefault("source", "policy_manual")
    c.metadata.setdefault("version", "v1.2")

Embeddings & Vector Store

Chunks are converted into embeddings and stored in a vector database. This example uses Chroma, but you can swap in other stores like Pinecone, Redis, or Weaviate.

Example:

from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

emb = OpenAIEmbeddings()
vectordb = Chroma(
    persist_directory="./db",
    embedding_function=emb,
)

existing_count = len(vectordb.get()["ids"])
if existing_count == 0:
    vectordb.add_documents(chunks)

Metadata is important for filtering and access control. Store document source, version, category, and other attributes as metadata.

Retrieval

The retriever finds the most relevant chunks for each query. Common strategies:

flat k-NN retrieval,
cross-encoder reranking,
hybrid semantic + keyword search.

Example:

retriever = vectordb.as_retriever(search_kwargs={"k": 4})

QA Chain

A retrieval-based QA chain combines the retrieved context with the LLM prompt.

Example:

from langchain_openai import ChatOpenAI
from langchain_classic.chains import RetrievalQA

llm = ChatOpenAI(temperature=0)
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
)
print(qa.run("How do I issue a refund for an international order?"))

Prompt engineering best practices:

keep system instructions separate from user content,
be explicit about style and length,
add a "do not hallucinate" directive,
ask for citations when needed.

Memory & Multi-Turn Conversations

Memory enables dialogue continuity across multiple turns. LangChain supports different memory strategies:

ConversationBufferMemory for short-term context,
ConversationSummaryMemory for longer sessions,
custom persistent memory for cross-session state.

Example:

from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

Tools & Safe Execution

The assistant can be extended with tools for database queries, scheduling, or other actions. Use safe, read-only tools whenever possible and require confirmation for side-effect operations.

Example:

from langchain.tools import Tool
from langchain.agents import initialize_agent, AgentType


def query_orders(order_id: str):
    return f"Order {order_id} status: Delivered successfully."

order_tool = Tool(
    name="query_orders",
    func=query_orders,
    description="Get order status by ID.",
)

agent = initialize_agent(
    tools=[order_tool],
    llm=llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
)

Safety, Privacy & Compliance

Key safety practices:

protect API keys and restrict DB access,
redact or encrypt PII,
log query sources and generated answers,
set retention policies for logs and embeddings,
follow regulations like HIPAA or GDPR where applicable.

Evaluation & Testing

Monitor quality and reliability using:

accuracy on labeled test data,
hallucination rate,
latency and cost per query,
user satisfaction metrics.

Testing strategies:

unit tests for ingestion and chunking,
integration tests for end-to-end retrieval,
adversarial tests for robustness.

Deployment & Scaling

For prototypes, local Chroma and OpenAI are easy to use. For production, consider managed or scalable stores such as Pinecone, Redis, or Weaviate. Host LLM inference with managed services or self-hosted models.

Example Dockerfile:

FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt ./
RUN pip install -r requirements.txt
COPY . /app
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8080"]

Example FastAPI app:

import os
from fastapi import FastAPI
from pydantic import BaseModel
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_chroma import Chroma
from langchain_classic.chains import RetrievalQA

os.environ["OPENAI_API_KEY"] = "<Your-openai-api-key>"
app = FastAPI()

class Query(BaseModel):
    q: str

emb = OpenAIEmbeddings()
vectordb = Chroma(
    persist_directory="./db",
    embedding_function=emb,
)
retriever = vectordb.as_retriever(search_kwargs={"k": 4})
llm = ChatOpenAI(temperature=0)
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
)

@app.post("/query")
async def query(q: Query):
    response = qa.run(q.q)
    return {"answer": response}

Cost Optimization

To reduce costs:

cache embeddings and retrieval results,
use smaller models for routine queries,
reduce k and reranking frequency,
batch embedding requests.

Observability & Monitoring

Track:

query latency,
errors and exceptions,
hallucination indicators,
document source usage.

Sample Data

The data/ folder contains example documents such as:

refund_policy.txt
shipping_info.txt
product_specs.txt

Usage Summary

python ingest_and_build_index.py
uvicorn app:app --reload
POST queries to /query

Example request:

{"q": "How can I get a refund for Product X?"}

Example answer:

{"answer": "To get a refund for Product X, contact support within 30 days of purchase. International orders may take longer to process."}

Notes

This README has been reformatted for clarity and structure. The repo is designed to help you explore domain-specific RAG systems with LangChain and a vector database.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
common		common
data		data
db		db
scripts		scripts
utils		utils
README.md		README.md
requirement.txt		requirement.txt

Folders and files

Latest commit

History

Repository files navigation

Domain-Specific AI Assistant

Overview

Why Domain-Specific AI?

Tech Stack

Key Libraries

Core Pipeline

What This Project Shows

Repository Files

Project Structure

Demo Examples

qa_chain_demo.py

memory_multiturn_demo.py

Getting Started

1. Install dependencies

2. Set your OpenAI key

3. Ingest documents and build the index

4. Run the FastAPI app

5. Query the assistant

Architecture Overview

Ingestion

Preprocessing & Chunking

Embeddings & Vector Store

Retrieval

QA Chain

Memory & Multi-Turn Conversations

Tools & Safe Execution

Safety, Privacy & Compliance

Evaluation & Testing

Deployment & Scaling

Cost Optimization

Observability & Monitoring

Sample Data

Usage Summary

Notes

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`qa_chain_demo.py`

`memory_multiturn_demo.py`

Packages