apoorvaeerly/insights-engine

🧠 Eerly AI Insights Engine (v3)

Universal, Multi-Agent Business Intelligence Platform

Eerly AI is a production-grade backend that transforms natural language questions into professional data analysis. Unlike simple Text-to-SQL bots, Eerly acts as a Senior Data Scientist, providing:

  • 📝 Structured Reports (Observations, Insights, Recommendations)
  • 📊 Smart Dashboarding (Auto-generates chart layouts for frontend rendering)
  • 🛡️ Self-Healing SQL (Auto-corrects invalid queries via retry loops, up to 3 attempts)
  • ⚡ Low-Latency Pipeline (Schema caching, async LLM calls, HTTP keep-alive)
  • 🔍 Full Debug Logging (Structured logs at every pipeline stage with elapsed ms)
  • 🐳 Fully Containerised (Runs end-to-end via Docker Compose)

🚀 Key Capabilities

🎓 Senior Analyst Persona

Does not just return data rows — generates a comprehensive report containing Observations, Key Insights, and Strategic Recommendations.

📊 Smart Dashboarding

Automatically determines when a visualisation is needed and generates a JSON layout (Bar, Line, Pie, Scatter) for frontend rendering.
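The exact JSON contract is defined in agents/dashboard_designer.py; purely as an illustration, a spec of this general shape is what a frontend might render (all field names here are hypothetical, not the project's actual schema):

```json
{
  "needs_chart": true,
  "widgets": [
    {
      "type": "bar",
      "title": "Top 5 States by Sales",
      "x": "state",
      "y": "total_sales"
    }
  ]
}
```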

🛡️ Self-Healing Architecture

If the AI generates invalid SQL, the system catches the error, analyses the schema, and self-corrects the query (up to 3 retries). It also features a robust LLM Fallback Mechanism that automatically switches to a high-speed Groq model (Llama 3.3 70B) if the primary Azure endpoint experiences rate limits or network issues.
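The retry loop described above can be sketched as follows. This is an illustrative sketch, not the project's actual code: `generate_sql` and `run_sql` stand in for the real LLM and database calls, and the key idea is that each retry feeds the previous error message back into the prompt.

```python
# Illustrative self-healing SQL loop (sketch; the real logic lives in the
# sql_generator/executor agents). generate_sql and run_sql are placeholders.
MAX_RETRIES = 3

def self_healing_query(question, generate_sql, run_sql):
    """Ask the LLM for SQL; on DB errors, feed the error text back and retry."""
    error = None
    for attempt in range(1, MAX_RETRIES + 1):
        sql = generate_sql(question, previous_error=error)
        try:
            return run_sql(sql)
        except Exception as exc:  # in practice, catch the driver's error class
            error = str(exc)      # the next prompt includes this error text
    raise RuntimeError(f"SQL still failing after {MAX_RETRIES} attempts: {error}")
```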

⚡ Latency Optimisations (v3)

  • Schema TTL Cache – DB table-info fetched once per 10 minutes, not on every query (~200–800 ms saving per request)
  • Async LLM Calls – Router, SQL Generator, and Dashboard Designer use ainvoke so FastAPI's event loop is never blocked
  • HTTP Keep-Alive – Streamlit frontend reuses a persistent connection to the backend
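The schema TTL cache can be sketched in a few lines. This is a minimal illustration under stated assumptions (the real implementation is in core/database.py and may differ); `fetch_schema` stands in for the actual DB round-trip:

```python
# Minimal TTL cache sketch: serve cached table info until the TTL expires.
import time

TTL_SECONDS = 600  # 10 minutes, matching the README
_cache = {}        # db_name -> (schema, fetched_at)

def get_schema(db_name, fetch_schema, now=time.monotonic):
    """Return cached schema info, refetching only after the TTL expires."""
    entry = _cache.get(db_name)
    if entry and now() - entry[1] < TTL_SECONDS:
        return entry[0]                 # cache HIT: no DB call
    schema = fetch_schema(db_name)      # cache MISS: DB round-trip
    _cache[db_name] = (schema, now())
    return schema
```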

🔍 Structured Debug Logging (v3)

Every pipeline stage emits timestamped [DEBUG] / [INFO] / [ERROR] log lines.
Control verbosity with LOG_LEVEL=DEBUG (development) or LOG_LEVEL=INFO (production) — no code change needed.
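A logger factory driven by LOG_LEVEL might look like the sketch below. This is an assumption-laden illustration of the pattern, not a copy of core/logger.py; the project's exact log format may differ:

```python
# Sketch of a LOG_LEVEL-driven logger factory (illustrative only).
import logging
import os
import sys

def get_logger(name):
    """Return a named logger whose level comes from the LOG_LEVEL env var."""
    level = os.getenv("LOG_LEVEL", "DEBUG").upper()
    logger = logging.getLogger(name)
    if not logger.handlers:  # attach the handler only once per logger
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(logging.Formatter(
            "%(asctime)s [%(levelname)s] [%(name)s] %(message)s"))
        logger.addHandler(handler)
    logger.setLevel(getattr(logging, level, logging.DEBUG))
    return logger
```

Usage would then be `log = get_logger("SQLGen")` followed by `log.debug(...)` or `log.info(...)` at each pipeline stage.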


🧱 Tech Stack

Layer             Technology
Infrastructure    Docker & Docker Compose
Database          Microsoft SQL Server (Eastman, via Docker) · Aiven PostgreSQL (NK Proteins, ACME Corp) · Aiven MySQL (BIDCO Africa)
Backend           FastAPI + Uvicorn
Frontend          Streamlit
Orchestration     LangGraph
LLM Integration   LangChain (Azure OpenAI primary, Groq Llama 3.3 70B fallback)

📂 Project Structure

insights-docker/
├── docker/
│   ├── docker-compose.yml   ← Orchestrates all services
│   ├── Dockerfile           ← Shared image for backend + frontends
│   └── init_db.sh           ← DB init helper
│
├── front-end/
│   ├── app.py               ← General Streamlit frontend (port 8501)
│   ├── app_eastman.py       ← Eastman Executive Dashboard (port 8502)
│   └── utils.py             ← SSE streaming + chart rendering helpers
│
├── core/
│   ├── config.py            ← Loads .env settings
│   ├── database.py          ← DB connection + schema TTL cache
│   ├── llm.py               ← Azure OpenAI LLM instance
│   ├── logger.py            ← Centralised logging factory (v3)
│   ├── prompts.py           ← All system prompts
│   ├── state.py             ← LangGraph state schema
│   └── streaming.py         ← Token stream queue registry
│
├── agents/
│   ├── router.py            ← Intent classifier (async, v3)
│   ├── sql_generator.py     ← SQL generation (async + cached schema, v3)
│   ├── executor.py          ← SQL execution + report synthesis
│   ├── dashboard_designer.py ← Dashboard JSON generation (async, v3)
│   └── chat.py              ← General chat responder
│
├── workflows/               ← LangGraph graph definition
├── database_source/         ← Data files & ingestion scripts
├── main.py                  ← FastAPI backend entry point
├── requirements.txt
└── .env                     ← Secrets & config (never commit this)

🏗️ PART 1: One-Time Setup

Prerequisites

  • Docker Desktop installed and running (enable WSL2 backend on Windows)
  • Azure OpenAI Endpoint, API Key, and a deployed model (e.g. gpt-5)
  • Git (to clone the repo)

Step 1 — Clone the Repository

git clone <your-repo-url>
cd insights-docker

Step 2 — Create the .env File

A template is provided at .env.example. Copy it and fill in your values:

cp .env.example .env

Key variables:

Variable                  Purpose
AZURE_OPENAI_API_KEY      Primary LLM — Azure OpenAI key
AZURE_OPENAI_ENDPOINT     Azure OpenAI resource URL
AZURE_OPENAI_DEPLOYMENT   Model deployment name (e.g. gpt-5)
AZURE_OPENAI_VERSION      API version (e.g. 2025-01-01-preview)
LITELLM_BASE_URL          (Optional) LiteLLM proxy URL — use instead of direct Azure endpoint
LITELLM_API_KEY           (Optional) LiteLLM proxy API key
GROQ_API_KEY              Fallback LLM — auto-activated on Azure rate limits
GEMINI_API_KEY            Reserved for future use
AIVEN_SERVICE_URI         Aiven PostgreSQL — NK Proteins & ACME Corp
AIVEN_SERVICE_URI_MySQL   Aiven MySQL — BIDCO Africa
SQLALCHEMY_URL_EASTMAN    Eastman Sales DB (SQL Server connection string)
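A filled-in .env might have roughly this shape. All values below are placeholders (angle brackets mark what you must supply), and the connection-string formats are the standard Aiven/SQLAlchemy shapes, not values copied from this project:

```
AZURE_OPENAI_API_KEY=<your-azure-key>
AZURE_OPENAI_ENDPOINT=https://<your-resource>.openai.azure.com/
AZURE_OPENAI_DEPLOYMENT=gpt-5
AZURE_OPENAI_VERSION=2025-01-01-preview
GROQ_API_KEY=<your-groq-key>
AIVEN_SERVICE_URI=postgres://<user>:<password>@<host>:<port>/defaultdb?sslmode=require
AIVEN_SERVICE_URI_MySQL=mysql://<user>:<password>@<host>:<port>/defaultdb?ssl-mode=REQUIRED
SQLALCHEMY_URL_EASTMAN=mssql+pyodbc://sa:<password>@<host>:<port>/Eastman_Sales_DB?driver=ODBC+Driver+18+for+SQL+Server
```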

⚠️ Never commit .env to Git. It is already listed in .gitignore.


Step 3 — Build & Start All Containers

All docker compose commands must be run from inside the docker/ subfolder.

cd docker
docker compose up --build -d

This builds one shared Docker image and starts 4 services:

Container                Role                          Port
eerly_sql_server         Microsoft SQL Server 2022     1434 (host)
eerly_backend            FastAPI + LangGraph           8000
eerly_frontend           Streamlit (general)           8501
eerly_frontend_eastman   Eastman Executive Dashboard   8502

Wait ~30 seconds, then verify all containers are running:

docker compose ps

All services should show Up.


Step 4 — Ingest Eastman Sales Data (For Port 8502)

python database_source/Eastman_Auto/ingestion_eastman_data.py

This populates Eastman_Sales_DB used by the Eastman Executive Dashboard on port 8502.


Step 5 — Seed Aiven PostgreSQL Data (NK Proteins & ACME Corp)

This script drops and recreates the nk_proteins and acme_corp schemas on your Aiven PostgreSQL instance, populated with synthetic ERP data.

cd database_source/Avien
python generate_erp_data.py nk_proteins
python generate_erp_data.py acme_corp

You can verify the connection later using python test_aiven_connection.py.


Step 6 — Seed Aiven MySQL Data (BIDCO Africa)

This script creates and populates the BIDCO Africa tables in your Aiven MySQL instance with synthetic sales and inventory data.

cd database_source/BIDCO_Africa
python update_avien_mysql.py

You can verify the connection and data relationships using python test_avien-mysql-bda-connection.py.


🔄 PART 2: Daily Usage

Starting the App

cd docker
docker compose up -d

Only use --build again if you change requirements.txt or Dockerfile.

Accessing the App

Frontend            URL                          Description
General Dashboard   http://localhost:8501        Conversational UI — NK Proteins, ACME Corp, Eastman, BIDCO Africa
Eastman Dashboard   http://localhost:8502        Eastman Executive Dashboard (dedicated view)
Backend API Docs    http://localhost:8000/docs   FastAPI Swagger UI

Stopping the App

docker compose down

Database data persists in a Docker volume. You do not need to restore again unless you run docker compose down -v (which wipes volumes).


🔍 PART 3: Viewing Logs (v3 Feature)

The v3 pipeline emits structured logs at every stage. Open two terminal windows, both pointing to the docker/ folder:

Terminal A — Backend (main pipeline):

cd docker
docker compose logs -f backend

Terminal B — Frontend Eastman:

cd docker
docker compose logs -f frontend_eastman

What to Look For

A successful SQL query produces this sequence in backend logs:

[API] POST /stream – db='Eastman_Sales_DB' | query='Show top 5 states...'
[STREAM][START] stream_id=abc-123 ...
[Router] Decision='DATA_QUERY' | LLM latency=312 ms
[SQLGen] Schema cache HIT for 'Eastman_Sales_DB' (expires in 580s)
[SQLGen] SQL generated in 1843 ms
[Executor] DB query completed in 87 ms
[Executor] Synthesis done – 4231 ms, 48 chunks
[Dashboard] Dashboard spec created – 1 widget(s)
[STREAM][DONE] elapsed=7312 ms | tokens=48 | streamed=True

Schema cache in action:

  • 1st query: Schema cache MISS – fetching from DB … (Xms) ← DB round-trip
  • 2nd query: Schema cache HIT (expires in Xs) ← instant, no DB call ✅

Switching Log Verbosity

Edit docker/docker-compose.yml and change LOG_LEVEL:

- LOG_LEVEL=INFO    # production: key events, warnings, errors
- LOG_LEVEL=DEBUG   # development: full pipeline trace (default)

Then restart: docker compose up -d --no-deps backend


🧪 Example Queries

The full executive demo script is in c_suite_queries.md.

NK Proteins / ACME Corporation

  1. "What are our top 3 most profitable products based on standard gross margin?"
  2. "Predict our total expected cash inflow over the next 30 days based on open invoices."
  3. "Identify our dead stock: which materials haven't been sold in over 90 days, and what is the total working capital blocked?"

Eastman Sales DB — Query Tiers

Micro (instant fact retrieval):

  • "Total Sales" · "Sales Trends" · "Top Products" · "Worst Performing Regions"

Operational:

  • "Show me sales for Fiscal Year 2024."
  • "Which region has the highest returns?"
  • "Compare sales between ENEPL and EAPL subsidiaries."

Strategic (tests reasoning + synthesis):

  • "Analyze the sales performance of the North region vs the South for the last 6 months and tell me which product category is driving the difference."
  • "Identify the bottom 3 performing states in terms of sales growth and list the top-selling products in those states to see if there is a mismatch."

Multi-turn conversational flow (tests context retention):

Turn   Query
1      "Show me the monthly sales trend for the current fiscal year."
2      "Okay, now filter this for just the North region."
3      "Why is there a dip in November?"
4      "List the top customers who bought during that dip."

Executive Summary (triggers 3-section Observations → Insights → Recommendations format):

  • "Give me an executive summary of our sales performance in Karnataka."
  • "Summarize the performance of our Solar Solutions business segment."

BIDCO Africa DB

  1. "What are the top-selling product categories by revenue this quarter?"
  2. "Which regions have the highest inventory turnover rate?"
  3. "Show me a breakdown of sales performance by country."

🔧 Troubleshooting

"no configuration file provided: not found"

You are running docker compose from the wrong directory.
Fix: Always cd into the docker/ subfolder first:

cd insights-docker/docker
docker compose <command>

Database Connection Errors

pyodbc.OperationalError: Login failed for user 'sa'
Check that DB_PASSWORD in docker-compose.yml matches MSSQL_SA_PASSWORD in the same file. Special characters like @ must be URL-encoded as %40 in all SQLALCHEMY_URL_* connection strings.
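A quick way to URL-encode a password is Python's standard-library `urllib.parse.quote_plus` (the password and URL below are examples only, not real credentials):

```python
# URL-encode a DB password before pasting it into a SQLAlchemy URL.
# quote_plus turns '@' into '%40', '#' into '%23', and so on.
from urllib.parse import quote_plus

password = "P@ssw0rd#1"         # example only
encoded = quote_plus(password)  # "P%40ssw0rd%231"
url = (
    "mssql+pyodbc://sa:" + encoded +
    "@<host>:1433/Eastman_Sales_DB?driver=ODBC+Driver+18+for+SQL+Server"
)
```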

Container Starts But Backend Crashes

Run docker compose logs backend to see the full error.
Common cause: missing or incorrect .env values (esp. AZURE_OPENAI_API_KEY).

Schema Not Updating After DB Changes

The schema is cached for 10 minutes. To force a refresh, restart the backend:

docker compose restart backend

📋 Changelog

Version         Changes
v3 (Mar 2026)   Schema TTL caching, async LLM agents (router, sql_generator, dashboard_designer), HTTP keep-alive, full structured debug logging across all pipeline stages, LOG_LEVEL env var
v2              Streaming SSE responses, Docker multi-service setup, self-healing SQL retries, executive summary prompt
v1              Initial LangGraph pipeline, single database support
