CorpRisk-AI: Multi-Agent Corporate Due Diligence System 🏦🤖

📌 Project Overview

CorpRisk-AI is a cloud-native, multi-agent AI system designed to automate corporate banking due diligence and risk assessment. Built using LangGraph and LangChain, the system orchestrates multiple AI agents to retrieve company financial records, analyze compliance against Anti-Money Laundering (AML) policies, and synthesize structured risk reports.

This project was developed to demonstrate enterprise-grade AI engineering, specifically focusing on agentic workflows, RAG (Retrieval-Augmented Generation) architectures, and LLM observability for financial services.

🏗️ Architecture & Agentic Workflow

The application uses a stateful graph architecture (LangGraph) containing three distinct reasoning agents, backed by a PostgreSQL data quality layer:

Document Retriever Agent (RAG): Connects to a Vector Database (ChromaDB/FAISS) to fetch semantically relevant financial documents, unstructured data, and mock AML policy PDFs based on the target company.
Compliance & Risk Agent: Evaluates the retrieved context against established banking guardrails. Identifies high-risk indicators, compliance breaches, or missing financial data.
Report Synthesizer Agent: Aggregates findings from the previous agents into a structured JSON payload, providing a final decision (e.g., APPROVED, MANUAL_REVIEW, REJECTED).
Data Quality Layer (PostgreSQL): Every assessment is logged to a relational database, where automated validation checks monitor completeness, consistency, and timeliness.

graph TD;
    A[Client Request: Company Name] --> B[FastAPI Backend]
    B --> C{LangGraph Orchestrator}
    C -->|State: Query| D[Retriever Agent + Vector DB]
    D -->|State: Context| E[Compliance Agent]
    E -->|State: Risk Flags| F[Synthesizer Agent]
    F -->|State: JSON Report| B
    C -.->|Telemetry| G[(LangSmith Observability)]

(Note: GitHub renders Mermaid diagrams natively, so the graph above displays as a flowchart in the repository.)
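To make the state hand-off in the diagram concrete, here is a minimal sketch in plain Python. It is not the project's code and deliberately does not use LangGraph: each "agent" is an ordinary function that takes the shared state and returns an updated copy. The state fields, the `MOCK_DOCUMENTS` store, and the keyword rule in the compliance step are all assumptions made for illustration.

```python
from typing import Callable, TypedDict


class AssessmentState(TypedDict, total=False):
    """Shared state passed from node to node (field names are illustrative)."""
    company_name: str
    context: list[str]
    risk_flags: list[str]
    report: dict


# Hypothetical in-memory stand-in for the vector store.
MOCK_DOCUMENTS = {
    "TechCorp Innovations Ltd": [
        "FY2023 summary: revenue grew year over year.",
        "UBO documentation for Q3 is missing.",
    ],
}


def retrieve(state: AssessmentState) -> AssessmentState:
    """Retriever node: look up documents for the target company."""
    docs = MOCK_DOCUMENTS.get(state["company_name"], [])
    return {**state, "context": docs}


def assess_compliance(state: AssessmentState) -> AssessmentState:
    """Compliance node: toy keyword rule standing in for an LLM evaluation."""
    flags = [doc for doc in state["context"] if "missing" in doc.lower()]
    if not state["context"]:
        flags.append("No financial documents found.")
    return {**state, "risk_flags": flags}


def synthesize(state: AssessmentState) -> AssessmentState:
    """Synthesizer node: fold the findings into a report with a final status."""
    status = "MANUAL_REVIEW" if state["risk_flags"] else "APPROVED"
    report = {
        "company_name": state["company_name"],
        "status": status,
        "retrieved_documents": len(state["context"]),
        "risk_flags": state["risk_flags"],
    }
    return {**state, "report": report}


# Linear order matching the diagram: retriever, compliance, synthesizer.
PIPELINE: list[Callable[[AssessmentState], AssessmentState]] = [
    retrieve,
    assess_compliance,
    synthesize,
]


def run_assessment(company_name: str) -> dict:
    """Run every node in order and return the final report."""
    state: AssessmentState = {"company_name": company_name}
    for node in PIPELINE:
        state = node(state)
    return state["report"]
```

Calling `run_assessment("TechCorp Innovations Ltd")` returns a report dictionary with a `status` and a list of `risk_flags`. In a LangGraph build, the same three functions would typically be registered as graph nodes, with edges replacing the `PIPELINE` list.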

🚀 Key Features

Multi-Agent Orchestration: Uses LangGraph for stateful agent execution, with support for cyclic and conditional routing.
Enterprise RAG Pipeline: Employs optimized document chunking and vector embeddings for precise context retrieval.
LLM Observability & Evaluation: Integrates with LangSmith to monitor token usage, track execution latency, and evaluate prompt performance and AI safety guardrails.
Cloud-Native & Scalable: Wrapped in a FastAPI application and containerized via Docker, making it ready for deployment on Azure Kubernetes Service (AKS) or Azure Container Apps.
Azure Ecosystem Ready: Designed so that standard OpenAI endpoints can be swapped for secure Azure OpenAI enterprise endpoints.
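One way the OpenAI / Azure OpenAI swap could be driven by configuration is sketched below. This is an assumption about how such a switch might look, not the project's implementation: the variable names `AZURE_OPENAI_ENDPOINT` and `AZURE_OPENAI_API_KEY` are not documented in this README (only `OPENAI_API_KEY` is), and the function name is hypothetical.

```python
import os
from typing import Mapping


def select_llm_provider(env: Mapping[str, str] = os.environ) -> dict:
    """Pick Azure OpenAI when its endpoint and key are both set, else OpenAI.

    The Azure variable names here are assumed, not taken from the project.
    """
    endpoint = env.get("AZURE_OPENAI_ENDPOINT")
    azure_key = env.get("AZURE_OPENAI_API_KEY")
    if endpoint and azure_key:
        return {"provider": "azure_openai", "endpoint": endpoint}
    if env.get("OPENAI_API_KEY"):
        return {"provider": "openai", "endpoint": None}
    raise RuntimeError("No OpenAI or Azure OpenAI credentials configured.")
```

Keeping the choice in one function means the agents never need to know which backend they are talking to.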

🛠️ Tech Stack

Core AI: Python, LangChain, LangGraph, OpenAI / Azure OpenAI
Data & RAG: ChromaDB / FAISS (Vector Store), PyPDFLoader
Data Quality & Storage: PostgreSQL, SQLAlchemy
Backend: FastAPI, Uvicorn, Pydantic
DevOps & Observability: Docker, LangSmith
Domain: FinTech, Banking Compliance, Risk Analysis

⚙️ Local Installation & Setup

Prerequisites

Python 3.10+
PostgreSQL 17+ (on macOS, install locally with Homebrew: brew install postgresql@17)
Docker (Optional, for containerized run)
OpenAI API Key (or Azure OpenAI credentials)
LangSmith API Key (for observability)

1. Clone the repository

git clone https://github.com/NeelM47/CorpRisk-AI.git
cd CorpRisk-AI

2. Environment Variables

Create a .env file in the root directory:

OPENAI_API_KEY=your_openai_key
LANGCHAIN_TRACING_V2=true
LANGCHAIN_ENDPOINT="https://api.smith.langchain.com"
LANGCHAIN_API_KEY=your_langsmith_key
LANGCHAIN_PROJECT="CorpRisk-DueDiligence"
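A quick pre-flight check can catch a missing key before the server starts. The sketch below assumes all five variables listed above are required and that they have already been loaded into the process environment; the helper name `missing_settings` is hypothetical and not part of the project.

```python
import os
from typing import Mapping

# Variable names taken from the .env example above; treating every one
# of them as mandatory is an assumption.
REQUIRED_SETTINGS = (
    "OPENAI_API_KEY",
    "LANGCHAIN_TRACING_V2",
    "LANGCHAIN_ENDPOINT",
    "LANGCHAIN_API_KEY",
    "LANGCHAIN_PROJECT",
)


def missing_settings(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required settings that are unset or blank."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        raise SystemExit(f"Missing settings: {', '.join(missing)}")
    print("All required settings are present.")
```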

3. Setup Virtual Environment & Install Dependencies

uv venv
source .venv/bin/activate
uv sync

4. Setup PostgreSQL Database

createdb corp_risk_db
# Tables are created automatically on first startup

5. Ingest Mock Data (RAG Setup)

Run the ingestion script to populate the Vector DB with mock corporate financial reports and AML policies.

python src/ingest_data.py
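The ingestion script's chunking strategy is not shown here, so the following is only a sketch of one common approach: fixed-size character chunks with a small overlap, so that a sentence cut at a boundary still appears whole in one of the two neighboring chunks. The function name and default sizes are assumptions.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters. Sizes are illustrative.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text, so the
        # tail is not emitted again as a redundant smaller chunk.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks
```

Each chunk would then be embedded and written to the vector store along with metadata such as the source file and company name.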

6. Run the Application

Via Python:

uv run uvicorn src.main:app --reload --host 0.0.0.0 --port 8000

Via Docker:

docker build -t corprisk-ai .
docker run -p 8000:8000 --env-file .env corprisk-ai

📡 API Usage

Once the server is running, access the automatic Swagger documentation at http://localhost:8000/docs.

Endpoint: POST /api/v1/assess-company

Request Payload:

{
  "company_name": "TechCorp Innovations Ltd",
  "assessment_type": "standard_due_diligence"
}

Response Payload:

{
  "company_name": "TechCorp Innovations Ltd",
  "status": "MANUAL_REVIEW",
  "retrieved_documents": 4,
  "risk_flags": [
    "Incomplete UBO (Ultimate Beneficial Owner) documentation for Q3.",
    "Unusual cross-border transaction volume detected in mock financial summary."
  ],
  "summary": "TechCorp Innovations Ltd shows healthy revenue streams, but flagged transactions require secondary compliance review according to AML Policy section 3.2."
}
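For reference, the request above can be sent from Python using only the standard library. This is a minimal client sketch: the endpoint path and payload fields come from this README, while the helper names, the base URL constant, and the 60-second timeout are assumptions.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # local server address from this README


def build_assessment_request(
    company_name: str,
    assessment_type: str = "standard_due_diligence",
) -> urllib.request.Request:
    """Build (but do not send) the POST request for an assessment."""
    payload = {"company_name": company_name, "assessment_type": assessment_type}
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/assess-company",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def assess_company(company_name: str) -> dict:
    """Send the request to a running server and return the parsed JSON body."""
    request = build_assessment_request(company_name)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `assess_company("TechCorp Innovations Ltd")["status"]` gives the final decision for that company.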

Data Quality Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| `/api/v1/quality/metrics` | GET | Returns latest 50 quality check results |
| `/api/v1/quality/run` | POST | Triggers data quality validation checks manually |

Quality checks run automatically on each assessment and track:

Completeness: null/missing field rates per table
Consistency: duplicate records and orphaned references
Timeliness: records not assessed within 90 days
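The three checks can be illustrated on in-memory records. The sketch below is not the project's SQL or SQLAlchemy code: the record fields (`company_name`, `status`, `assessed_at`) and function names are assumptions, the orphaned-reference check is omitted, and records that were never assessed are counted as stale.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=90)  # timeliness window from this README


def completeness(records: list[dict], fields: list[str]) -> dict[str, float]:
    """Completeness: fraction of records where each field is null or missing."""
    total = len(records)
    return {
        field: (sum(1 for r in records if r.get(field) is None) / total if total else 0.0)
        for field in fields
    }


def duplicate_keys(records: list[dict], key: str) -> list[str]:
    """Consistency: values of `key` that appear on more than one record."""
    seen: set[str] = set()
    dupes: set[str] = set()
    for record in records:
        value = record.get(key)
        if value is None:
            continue  # missing keys are a completeness issue, not a duplicate
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return sorted(dupes)


def stale_records(records: list[dict], now: datetime, field: str = "assessed_at") -> list[dict]:
    """Timeliness: records never assessed, or last assessed over 90 days ago."""
    cutoff = now - STALE_AFTER
    return [r for r in records if r.get(field) is None or r[field] < cutoff]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [
        {"company_name": "A", "status": "APPROVED", "assessed_at": now - timedelta(days=10)},
        {"company_name": "A", "status": None, "assessed_at": now - timedelta(days=120)},
    ]
    print(completeness(sample, ["status"]))
    print(duplicate_keys(sample, "company_name"))
    print(len(stale_records(sample, now)))
```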

📊 LLM Observability & Evaluation

To ensure AI safety and reliable performance, this system relies on LangSmith. Every API call generates a trace that tracks:

Agent Trajectories: Step-by-step reasoning logs of the Compliance Agent.
Retrieval Effectiveness: Exactly which chunks were retrieved from the Vector DB.
Cost & Latency: Token usage metrics for cost optimization.
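To give a feel for the per-step data such a trace holds, here is a standard-library sketch that records a name and latency for each step. It is a stand-in for illustration only: it does not use or imitate the LangSmith SDK, and the `record_step` helper and the trace layout are assumptions.

```python
import time
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def record_step(trace: list[dict], name: str) -> Iterator[None]:
    """Append the step name and its wall-clock latency to `trace`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the step raises, so failed steps stay visible.
        trace.append({"step": name, "latency_s": time.perf_counter() - start})


if __name__ == "__main__":
    trace: list[dict] = []
    with record_step(trace, "retriever"):
        time.sleep(0.01)  # placeholder for the real retrieval call
    with record_step(trace, "compliance"):
        time.sleep(0.01)  # placeholder for the real LLM call
    for entry in trace:
        print(f"{entry['step']}: {entry['latency_s']:.3f}s")
```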

(Feel free to check the assets/ folder for screenshots of the LangSmith dashboard tracking this application).

🤝 Next Steps & Future Work

Migrate Vector Store from local ChromaDB to Azure AI Search.
Implement robust CI/CD pipelines via Azure DevOps.
Add Human-in-the-Loop (HITL) approval nodes in the LangGraph workflow.

Designed and developed by Neel More as a demonstration of scalable AI engineering in the financial sector.
