# Finora
Production-grade AI equities intelligence for NRI & global investors.
RAG · LangGraph · LangSmith · Groq · Recharts · MCP · Real-time · RAGAS
Finora is a production-grade AI equities intelligence platform built for NRI and global investors. Not a demo. Not a notebook. A real system built to production RAG standards — typed, traced, tested, and deployed.
Two products in one:
- Dashboard — Premium dark-mode stock intelligence UI covering 555+ stocks (S&P 500 + NIFTY 50). Real-time price, fundamentals, analyst consensus, 20-year historical patterns, live news RAG, and candlestick charts with 9 timeframes.
- Finora AI — Bottom-right floating chatbot with two user modes. Routes every query through a five-layer retrieval system: live market data, 20yr OHLCV patterns, news RAG, SEC filings, and structured fundamentals — fused, reranked, then sent to an intent-routed response engine. Every run traced in LangSmith.
The chatbot exposes two distinct behavioral modes, selectable in the chat panel:
For everyday investors. Provides analysis, context, and market intelligence. Buy/sell queries are redirected with a clear guardrail — the system surfaces what analysts say and key metrics instead of giving personal financial advice. Every guardrail response ends with the exact phrase: "Consider consulting a financial advisor before making investment decisions."
For active traders. Provides explicit directional signals — bullish / bearish / neutral — grounded in volume, momentum, and technical context. Buy/sell queries receive signal-grounded analysis with risk context. No absolute directives ("you should buy") are ever given, but the system answers: "momentum is bullish, volume confirms, key risk is..."
Mode is sent with every request as user_mode: "insight" | "trader" and drives both the system prompt behavior and the guardrail routing — different blocked intent sets per mode.
Every query → llama-3.1-8b-instant (~200ms) → one or more intents → parallel LangGraph branches:
REAL_TIME | NEWS | HISTORICAL | FUNDAMENTAL
Summary bypass: Queries matching "summarize", "overview", "what's happening", "explain this stock", etc. skip the LLM classifier entirely and return all 4 intents deterministically — no wasted LLM call, no mis-classification.
Multi-intent: "Compare AAPL and MSFT earnings history" → [HISTORICAL, FUNDAMENTAL] → 2 parallel branches run concurrently.
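A minimal sketch of this routing (function and trigger names are illustrative assumptions, not Finora's actual identifiers):

```python
# Hypothetical sketch of the intent router: the summary bypass is a pure
# string check that runs before any LLM call.
SUMMARY_TRIGGERS = ("summarize", "overview", "what's happening", "explain this stock")
ALL_INTENTS = ["real_time", "news", "historical", "fundamental"]

def classify_intents(query: str, llm_classify=None) -> list[str]:
    """Return RAG intents for a query; skip the LLM for summary-style queries."""
    q = query.lower()
    if any(trigger in q for trigger in SUMMARY_TRIGGERS):
        # Deterministic bypass: all 4 intents, no LLM call, no mis-classification.
        return list(ALL_INTENTS)
    # Otherwise defer to the fast LLM classifier (llama-3.1-8b-instant),
    # which may return multiple intents for comparative queries.
    return llm_classify(q) if llm_classify else list(ALL_INTENTS)
```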
Before sending anything to the LLM, the fusion node deterministically computes:
- `narrative_hint` — dominant signal label: `sharp_downward_move | sharp_upward_move | high_volume_move | near_52w_high | near_52w_low | analyst_strongly_bullish | analyst_bearish | mild_upward_move | mild_downward_move | consolidating` — priority-ranked, not guessed
- `confidence_level` — `high` (3+ strong signals, 3+ data sources) / `medium` / `low`
- `conflict` — detects 4 conflict patterns: price down + analysts bullish, price up + analysts cautious, near 52W low + analysts bullish, near 52W high + target below price
- `uncertainty_flag` — true when confidence is low or an unresolved conflict exists
These values drive the LLM prompt behavior — the model is told what signals dominate, not asked to figure it out. This eliminates hedging on high-confidence data and prevents fabrication on low-signal days.
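As a rough illustration of how such deterministic pre-computation might look (all thresholds, and the exact rule set, are assumptions; the README does not specify them, and only a subset of the hints is shown):

```python
# Illustrative sketch of a fusion node's deterministic signal computation.
# Thresholds and field names are assumptions, not Finora's actual values.
def fuse_signals(pct_change: float, volume_ratio: float,
                 pct_from_52w_high: float, analyst_buy_pct: float) -> dict:
    # Priority-ranked narrative hint: the first matching rule wins.
    if pct_change <= -5:
        hint = "sharp_downward_move"
    elif pct_change >= 5:
        hint = "sharp_upward_move"
    elif volume_ratio >= 2:
        hint = "high_volume_move"
    elif pct_from_52w_high >= -2:
        hint = "near_52w_high"
    elif analyst_buy_pct >= 80:
        hint = "analyst_strongly_bullish"
    elif pct_change > 1:
        hint = "mild_upward_move"
    elif pct_change < -1:
        hint = "mild_downward_move"
    else:
        hint = "consolidating"
    # One example conflict pattern: price falling while analysts stay bullish.
    conflict = pct_change < -1 and analyst_buy_pct >= 80
    # Confidence from strong-signal count (the data-source count check is omitted here).
    strong = sum([abs(pct_change) >= 5, volume_ratio >= 2, analyst_buy_pct >= 80])
    confidence = "high" if strong >= 3 else "medium" if strong == 2 else "low"
    return {"narrative_hint": hint, "conflict": conflict,
            "confidence_level": confidence,
            "uncertainty_flag": confidence == "low" or conflict}
```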
Dense vectors (MiniLM-L6-v2) catch semantic similarity. BM25 catches exact ticker/date matches. Reciprocal Rank Fusion merges both. BM25 catches "AAPL on 2022-01-14"; dense catches "when did Apple last see a similar drawdown."
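Reciprocal Rank Fusion itself is simple; a minimal sketch (k=60 is the constant from the original RRF paper, and Finora's value is not stated here):

```python
# Minimal Reciprocal Rank Fusion: each ranking contributes 1/(k + rank) per
# document; documents appearing in both rankings accumulate both scores.
def rrf_merge(dense_ids: list[str], bm25_ids: list[str], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in (dense_ids, bm25_ids):
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

A document ranked moderately by both retrievers beats one ranked highly by only one, which is exactly the behavior wanted when merging lexical and semantic hits.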
Over-retrieve 2× TOP_K → rerank with Cohere rerank-english-v3.0 (primary), falling back to BAAI/bge-reranker-base locally on rate limit or failure. Cross-encoders score the full query–passage pair jointly — typically the largest single quality improvement available in a RAG pipeline.
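The fallback pattern might look like this sketch, where `cohere_rerank` and `local_rerank` are hypothetical stand-ins for the real client calls:

```python
# Rerank-with-fallback sketch: try the hosted cross-encoder, fall back to a
# local one on any failure (rate limit, timeout, etc.).
def rerank_with_fallback(query, passages, top_k, cohere_rerank, local_rerank):
    candidates = passages[: top_k * 2]  # over-retrieve 2x TOP_K
    try:
        ranked = cohere_rerank(query, candidates)   # e.g. Cohere rerank-english-v3.0
    except Exception:
        ranked = local_rerank(query, candidates)    # e.g. BAAI/bge-reranker-base
    return ranked[:top_k]
```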
Maximal Marginal Relevance (λ=0.6) ensures final TOP_K chunks are both relevant and diverse. Prevents the LLM from receiving 8 near-identical articles about the same catalyst.
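A greedy MMR selection with λ=0.6 can be sketched as follows; `sim` is any pairwise similarity over chunks (a real system would use embedding cosine similarity, which is an assumption here):

```python
# Greedy Maximal Marginal Relevance: at each step pick the chunk that best
# trades off query relevance against similarity to already-selected chunks.
def mmr_select(query_scores: dict[str, float], sim, top_k: int, lam: float = 0.6):
    selected: list[str] = []
    remaining = set(query_scores)
    while remaining and len(selected) < top_k:
        best = max(
            remaining,
            key=lambda d: lam * query_scores[d]
            - (1 - lam) * max((sim(d, s) for s in selected), default=0.0),
        )
        selected.append(best)
        remaining.remove(best)
    return selected
```

With λ=0.6, a slightly less relevant but novel chunk beats a near-duplicate of something already selected, which is what keeps 8 copies of the same catalyst story out of the context.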
Every query is not only routed through RAG layers — it is also classified into a response intent that selects the system prompt, token budget, and post-processing pipeline. This is independent of the RAG intent classification.
| Intent | Triggers | Token Budget | Behavior |
|---|---|---|---|
| `metric` | Single financial fact — PE, EPS, price, yield | 400 | Returns the value with units. No narrative filler. Falls back gracefully when data is missing. |
| `explain` | "What is...", "How does...work", "Define..." | 600 | Clean educational explanation. Bypasses directional mode reasoning — concept definitions don't need a bullish/bearish stance. |
| `trade` | "Should I buy/sell", "entry point", "short this" | 800 | Mode-aware directional signal with risk context. Insight → redirects. Trader → explicit signal framing. |
| `summarize` | Everything else — analysis, overview, section queries | 1200–2500 | Full structured narrative with relevant sections. Section-focused queries trigger sub-intent narrowing (see below). |
Within summarize, Finora narrows the response to exactly the sections the user asked about:
"Apple risks" → ## Key Risks only
"Apple valuation" → ## Valuation only
"Apple valuation and risks"→ ## Valuation + ## Key Risks
"Apple cash flow" → ## Cash Flow Generation only
"Apple latest news" → ## Recent Drivers & News only
"Apple historical pattern" → ## Historical Parallels only
Section keywords are matched across 9 topic areas: risks · business · cashflow · balance sheet · valuation · analyst · catalysts · historical · news
Semantic expansion handles paraphrased intent without extra LLM calls:
- "downside", "go wrong", "threat" → risks
- "cheap", "expensive", "priced" → valuation
Full-analysis override: When the query contains "analyze", "analysis", "deep dive", or "full analysis", section narrowing is suppressed even if section keywords appear — "Apple looks expensive but growing fast — analyze" produces a full multi-section summary, not just the valuation section.
The LLM is instructed to generate only the requested sections via a CRITICAL FOCUS block injected before the output template. A post-processing filter (_enforce_sections) then strips any ## headers that fall outside the requested set — LLM compliance is not trusted for this constraint.
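A sketch of what such a post-filter might look like (the real `_enforce_sections` is not shown in this README, so the header-matching details are assumptions):

```python
import re

# Strip any "## Section" blocks whose title is not in the allowed set.
# Text before the first ## header is kept as-is.
def enforce_sections(markdown: str, allowed: set[str]) -> str:
    out, keep = [], True
    for line in markdown.splitlines():
        m = re.match(r"##\s+(.*)", line)
        if m:
            keep = m.group(1).strip() in allowed
        if keep:
            out.append(line)
    return "\n".join(out)
```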
Both mode system prompts include explicit DATA LAYERS routing tables that tell the LLM which RAG layer maps to which output section:
[REAL-TIME] → Current Price & Movement, Today's Story
[NEWS] → Recent Drivers & News, Catalysts
[HISTORICAL] → Historical Parallels (MUST include if present)
[FUNDAMENTALS] → Valuation, Business Model, Cash Flow, Balance Sheet
This prevents the LLM from inventing data for a section when that layer returned nothing — it knows to skip the section rather than hallucinate.
All token budgets include a 20% buffer applied at runtime: max_tokens = int(max_tokens * 1.2). This prevents mid-sentence truncation on responses that slightly exceed the base limit. Multi-section queries use the maximum token budget across all requested sections.
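The budget arithmetic, roughly (the per-section budget values here are illustrative placeholders):

```python
# Hypothetical sketch: take the max budget across requested sections,
# then apply the 20% runtime buffer described above.
SECTION_BUDGETS = {"metric": 400, "explain": 600, "trade": 800, "summarize": 1200}

def effective_max_tokens(requested: list[str]) -> int:
    base = max(SECTION_BUDGETS[name] for name in requested)
    return int(base * 1.2)  # 20% buffer prevents mid-sentence truncation
```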
When the output guardrail or section enforcement reduces the response below the minimum threshold, the system extracts readable lines from fused_context and returns a partial-data answer rather than a dead-end error message.
Every response (all intents including metric) includes 3 follow-up question chips after the answer — max 8 words each, covering different angles. Suggestions are extracted from a ---SUGGESTIONS--- delimiter in the LLM output and passed as a separate SSE event to the frontend.
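Parsing that delimiter might look like this sketch (the exact handling of the delimiter and chip formatting is an assumption):

```python
# Split the raw LLM output into the answer body and up to 3 follow-up chips,
# using the ---SUGGESTIONS--- delimiter described above.
def split_suggestions(raw: str) -> tuple[str, list[str]]:
    answer, _, tail = raw.partition("---SUGGESTIONS---")
    questions = [line.strip("- ").strip()
                 for line in tail.strip().splitlines() if line.strip()]
    return answer.strip(), questions[:3]  # max 3 chips
```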
- INSIGHT / TRADER mode toggle — in chat panel header; drives entire response behavior
- Streaming SSE — token-by-token with intent badges shown during retrieval
- Intent-routed response engine — metric / explain / trade / summarize with per-intent prompts and token budgets
- Multi-section sub-intent detection — "Apple valuation and risks" → two sections, not one
- Full-analysis override — explicit "analyze" / "deep dive" queries always produce full summaries
- "Summarize this stock" chip — prominent primary chip; triggers full structured narrative across all 4 RAG branches
- Context-aware suggestion chips — 3 follow-up questions on every response, auto-generated, max 8 words
- Narrative-first structure — every response leads with the dominant signal, not a generic opening
- Guardrail redirects — INSIGHT mode buy/sell queries get signal context + exact disclaimer phrase, never a refusal
- Conversation memory — last 4 turns injected into fusion prompt for contextual follow-ups
- Dynamic price charts — embedded Recharts AreaChart matched to the query timeframe (1D intraday through ALL 20yr)
- Finance bar charts — Revenue vs Net Income over 5 fiscal years, shown on financial history queries
- Smart citations — news sources shown only on queries with news/movement intent
- SEBI/SEC disclaimers — auto-injected, locale-aware, only on directional language
- Early response cache — TTL-based cache bypasses the full graph on repeated queries
- LangSmith trace link — every response includes a clickable ↗ trace link in the UI
| Layer | Technology |
|---|---|
| Frontend | Next.js 14 App Router · TypeScript · Tailwind CSS · shadcn/ui |
| Charts | Recharts — candlestick OHLCV + dynamic chat area charts + finance bar charts |
| Backend | FastAPI · Python 3.11 · Pydantic v2 · Uvicorn |
| Agent Graph | LangGraph 0.2 StateGraph — parallel branches, typed state |
| RAG | LangChain v0.3 · Hybrid BM25+Dense · HyDE · Cohere rerank · BAAI fallback |
| Response Engine | Intent-routed prompts · Multi-section detection · Hard section enforcement · Token buffering |
| Observability | LangSmith — every graph run traced |
| RAG Evaluation | RAGAS 0.2.5 — faithfulness, answer relevancy, context recall, context precision, noise sensitivity |
| LLM Primary | Groq llama-3.3-70b-versatile — response generation |
| LLM Fast | Groq llama-3.1-8b-instant — intent classification, guardrails, HyDE |
| Embeddings | HuggingFace all-MiniLM-L6-v2 — local, free, no API key |
| Vector Store | Qdrant Cloud (free 1GB cluster) |
| MCP | FastMCP server — 6 tools: quote, historical RAG, news RAG, fundamentals, screener, universe |
| Real-time Data | Yahoo Finance via curl_cffi Chrome TLS impersonation — no API key needed |
| News | Google News RSS — live, locale-aware (NSE/BSE for Indian stocks) |
| Historical | 20yr weekly OHLCV via Yahoo Finance → FinancialEventChunker → Qdrant |
| Scheduling | APScheduler — news every 15min, historical daily |
| Guardrails | llama-3.1-8b input classifier + mode-aware blocked intents + hallucination check + PII scrub |
| Deploy | Vercel (frontend) · HuggingFace Spaces Docker (backend) |
```
finora/
├── CLAUDE.md
├── docker-compose.yml
├── finora-backend
│   ├── backend
│   │   ├── api
│   │   │   ├── middleware
│   │   │   │   ├── guardrails.py
│   │   │   │   └── rate_limit.py
│   │   │   └── routes
│   │   │       ├── chat.py              ← SSE streaming, chart data, citation gating
│   │   │       ├── health.py
│   │   │       └── stocks.py
│   │   ├── data
│   │   │   ├── eval_results/
│   │   │   └── universe/stocks.json     ← 555 stocks, committed
│   │   ├── finora_mcp
│   │   │   ├── server.py
│   │   │   └── tools/                   ← 6 MCP tools
│   │   ├── graph
│   │   │   ├── finora_graph.py          ← LangGraph StateGraph master definition
│   │   │   ├── state.py
│   │   │   └── nodes
│   │   │       ├── intent_classifier.py
│   │   │       ├── realtime_node.py
│   │   │       ├── news_rag_node.py
│   │   │       ├── historical_rag_node.py
│   │   │       ├── fundamentals_node.py
│   │   │       ├── fusion_node.py       ← Pre-computed signals, conflict detection
│   │   │       ├── response_cache.py    ← TTL-based early response cache
│   │   │       └── response_node.py     ← Intent-routed engine, section detection
│   │   ├── guardrails
│   │   │   ├── classifier.py            ← llama-3.1-8b safety classifier, mode-aware
│   │   │   ├── output_filter.py         ← Hallucination check, PII scrub, disclaimers
│   │   │   └── disclaimers.py
│   │   ├── observability
│   │   │   ├── langsmith_client.py
│   │   │   └── langsmith_url.py
│   │   └── rag
│   │       ├── chunking/                ← SlidingWindow, Semantic, FinancialEvent
│   │       ├── evaluation/              ← RAGAS runner + synthetic QA generator
│   │       ├── ingestion/               ← Historical, news, filings, universe
│   │       ├── retrieval/               ← Hybrid, HyDE, reranker, MMR dedup
│   │       └── yahoo_client.py          ← curl_cffi Chrome TLS impersonation
│   ├── scripts/                         ← build_universe, ingest_*, eval_rag
│   ├── Dockerfile
│   └── requirements.txt
├── finora-frontend
│   ├── app
│   │   ├── dashboard/[ticker]/page.tsx
│   │   └── eval/page.tsx                ← RAGAS results UI
│   ├── components
│   │   ├── chat
│   │   │   ├── ChatWidget.tsx           ← FAB + slide-up panel + mode toggle
│   │   │   ├── ChatMessage.tsx          ← Markdown · AreaChart · BarChart · citations
│   │   │   ├── ChatInput.tsx
│   │   │   └── SuggestionChips.tsx      ← Dynamic follow-up chips
│   │   ├── dashboard/                   ← StockHeader, FundamentalsGrid, PriceChart, ...
│   │   └── ui/                          ← shadcn/ui + StockLogo + TickerTape
│   ├── lib
│   │   ├── api.ts                       ← Typed fetch client
│   │   └── streaming.ts                 ← useSSE hook, ChatMessage type
│   └── Dockerfile
└── tests
    ├── backend
    │   ├── unit/                        ← fusion signals, guardrails, intent classifier
    │   ├── integration/                 ← full pipeline
    │   └── stress/                      ← 33 queries × 2 modes
    └── frontend/__tests__/
```
Finora ships a comprehensive test suite covering unit tests, integration tests, and stress testing. For details on running tests, see tests/README.md.
Test coverage:
- Backend unit — Deterministic logic (fusion signals, intent classifier, guardrails) — runs offline, < 1s
- Backend integration — Full pipeline with Groq mocked
- Frontend unit — 16 tests (ChatMessage, SSE parsing, UserMode type)
- Stress tests — 33 queries × 2 modes → 12 behavioral categories against live backend
Finora's RAG pipeline is evaluated offline using RAGAS. Results are visible on the /eval page in the production UI.
Current status: All 4 metrics PASS ✓
| Metric | Score | Target | Status |
|---|---|---|---|
| Faithfulness | 0.94 | 0.85 | ✓ |
| Answer Relevancy | 0.86 | 0.80 | ✓ |
| Context Recall | 0.98 | 0.75 | ✓ |
| Context Precision | 0.99 | 0.70 | ✓ |
Evaluated across: AAPL, RELIANCE, INFY, META — 12 synthetic QA pairs generated from Qdrant chunks.
For how to run custom RAGAS evals, see tests/README.md.
Python 3.11+, Node.js 20+
```shell
git clone https://github.com/charan-s108/Finora.git
cd Finora
cp finora-backend/.env.example finora-backend/.env
# Fill in: GROQ_API_KEY, QDRANT_URL, QDRANT_API_KEY, LANGCHAIN_API_KEY
```

Backend:

```shell
cd finora-backend
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# Build stock universe (required first — ~555 stocks → stocks.json)
python backend/scripts/build_universe.py

# Seed historical RAG data (~30 min for full list)
python backend/scripts/ingest_historical.py --tickers AAPL MSFT NVDA RELIANCE TCS INFY --years 20

# Seed news corpus
python backend/scripts/ingest_news.py --tickers AAPL MSFT NVDA RELIANCE TCS INFY

# Seed filings
python backend/scripts/ingest_filings.py --tickers AAPL MSFT NVDA RELIANCE TCS INFY

# Start backend
uvicorn backend.main:app --reload --port 7860
```

Frontend:

```shell
cd finora-frontend
npm install
echo "NEXT_PUBLIC_BACKEND_URL=http://localhost:7860" > .env.local
npm run dev
```

Or run both with Docker:

```shell
docker-compose up --build
```

To deploy the backend on Hugging Face Spaces:

- Create a new Docker Space for `finora-backend`.
- Add this to the top of the backend `README.md` in the Space repo:
```yaml
---
title: finora-backend
emoji: 🚀
colorFrom: blue
colorTo: gray
sdk: docker
app_port: 7860
---
```

- Make sure your backend container starts on `0.0.0.0:7860`:
```shell
uvicorn backend.main:app --host 0.0.0.0 --port 7860
```

- Set Hugging Face Space variables/secrets for:
  - `GROQ_API_KEY`, `GROQ_MODEL_PRIMARY`, `GROQ_MODEL_FAST`
  - `LANGCHAIN_API_KEY`, `LANGCHAIN_PROJECT`, `LANGCHAIN_TRACING_V2`
  - `QDRANT_URL`, `QDRANT_API_KEY`
  - `GUARDRAILS_ENABLED`, `DISCLAIMER_LOCALE`, `CORS_ORIGINS`, `ENV`
- After deployment, your Space URL will be something like `https://finora-backend.hf.space`.

Deploy the frontend with Vercel:

```shell
npm install -g vercel
cd finora-frontend && vercel
vercel env add NEXT_PUBLIC_BACKEND_URL
vercel --prod
```

Verify the backend is healthy:

```shell
curl https://<your-space-name>.hf.space/api/health
# → {"status":"ok","qdrant":"connected","groq":"connected","langsmith":"configured","universe_size":553}
```

All LangGraph runs are automatically traced when `LANGCHAIN_TRACING_V2=true`.
- View all traces: `https://smith.langchain.com/projects/finora-prod`
- Each chat response includes a "LangSmith trace ↗" link in the UI
- Traces show: intent classification → retrieval latency → reranking → fusion → generation
```shell
cd finora-backend
python backend/scripts/eval_rag.py --tickers AAPL MSFT RELIANCE.NS --n 5
```

| Metric | Target |
|---|---|
| Faithfulness | > 0.85 |
| Answer Relevance | > 0.80 |
| Context Recall | > 0.75 |
| Context Precision | > 0.70 |
| Noise Sensitivity | < 0.15 |
```json
{
  "query": "Why did AAPL drop today?",
  "ticker": "AAPL",
  "conversation_history": [],
  "session_id": "uuid",
  "user_mode": "insight"
}
```

Stream events:
guardrail → { "status": "allowed" | "blocked" }
intent → { "intents": ["real_time", "news"] }
retrieving → { "news_chunks": 8, "historical_chunks": 3, "realtime": true }
token → { "content": "Apple fell..." } (streamed line-by-line)
chart_data → { "ticker", "currency", "label", "bars": [...] }
finance_chart → { "ticker", "currency", "bars": [...] }
citation → { "sources": [{ "url", "title", "source", "time" }] }
suggestions → { "questions": ["What's Apple's PE ratio?", ...] }
disclaimer → { "text": "⚠ For informational purposes only..." }
done → { "trace_id", "confidence", "langsmith_url", "cached" }
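A client could consume this stream along these lines (this sketch assumes standard `event:` / `data:` SSE framing; Finora's exact wire format may differ):

```python
import json

# Parse SSE lines into (event_name, payload) pairs. Each "event:" line names
# the next "data:" line's JSON payload; blank keep-alive lines are ignored.
def parse_sse(lines):
    event, events = None, []
    for line in lines:
        if line.startswith("event:"):
            event = line.split(":", 1)[1].strip()
        elif line.startswith("data:") and event:
            events.append((event, json.loads(line.split(":", 1)[1])))
            event = None
    return events
```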
Fuzzy search across 555 stocks. Returns ticker, name, exchange, sector, country.
Full snapshot: price, fundamentals, analyst consensus, 7-day OHLCV, live news, historical RAG signals.
OHLCV bars for any timeframe: 1D | 1W | 1M | 3M | 6M | 1Y | 3Y | 5Y | ALL
Real connectivity checks — Groq (1-token ping), Qdrant (list collections).
| Intent | INSIGHT | TRADER |
|---|---|---|
| `direct_buy_sell_recommendation` | Blocked → redirect with signal context | Allowed → signals + risk framing |
| `personal_financial_planning` | Blocked → redirect | Allowed |
| `insider_trading_context` | Blocked | Blocked |
| `market_manipulation` | Blocked | Blocked |
| `tax_evasion_advice` | Blocked | Blocked |
| `specific_options_strategy` | Blocked | Blocked |
The input classifier (llama-3.1-8b-instant) is calibrated to avoid over-blocking. "Should I invest in X?" and "Is X a good long-term investment?" are classified as fundamental_analysis (allowed), not as direct_buy_sell_recommendation. Only queries that combine a personal position with a decision request are blocked.
- Hallucination check — every number/% in the response is extracted and verified against `fused_context`. Tolerances: 3 percentage points for rates/yields; 15% relative for dollar amounts. Zero values are always treated as valid financial facts. Unverified sentences are stripped. If stripping would remove > 70% of the response, the original is returned instead of an empty answer.
- PII scrub — Aadhaar (XXXX XXXX XXXX), PAN card (ABCDE1234F), and 10–18-digit account numbers are redacted before the response reaches the client.
- Directional language detection — if the response contains phrases like "will rise", "you should buy", or "guaranteed", the SEBI/SEC disclaimer is appended. On non-directional responses, the disclaimer fires once per session via a separate SSE event.
- Units enforcement — a `UNITS MANDATORY` rule in every system prompt requires `$B`, `%`, `x` (for multiples) notation on all numerical values. Bare numbers without context are flagged.
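A toy version of the number-verification idea (the real filter strips individual unverified sentences; this sketch only reports whether all numbers verify, and the regex details are assumptions):

```python
import re

# Extract every number in the response and check that a close-enough number
# appears in the fused context. Tolerances mirror the text above:
# 3 percentage points for %-values, 15% relative otherwise.
def verify_numbers(response: str, context: str) -> bool:
    ctx_nums = [float(n) for n in re.findall(r"-?\d+(?:\.\d+)?", context)]
    for m in re.finditer(r"(-?\d+(?:\.\d+)?)(%?)", response):
        val, is_pct = float(m.group(1)), m.group(2) == "%"
        tol = 3.0 if is_pct else max(abs(val) * 0.15, 1e-9)
        if not any(abs(val - c) <= tol for c in ctx_nums):
            return False  # unverified number: its sentence would be stripped
    return True
```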
```shell
cd finora-backend && python -m mcp.server
```

| Tool | Description |
|---|---|
| `get_realtime_quote(ticker)` | Live price, volume, intraday OHLC |
| `search_historical_rag(ticker, query, years)` | 20yr OHLCV event chunks, reranked |
| `search_news_rag(ticker, query, days)` | News + filings, reranked, deduplicated |
| `get_fundamentals(ticker)` | PE, EPS, margins, analyst consensus |
| `screen_stocks(sector, min_pe, max_pe, country)` | Filter 555-stock universe |
| `get_stock_universe(query, limit)` | Fuzzy search by name or ticker |
MIT — see LICENSE
Built by Charan · Powered by Groq · Traced by LangSmith
"AI isn't a feature — it's the product."
