🎧 ShopNow Voice Agent

AI-Powered Real-Time Customer Support Voice Agent

Real-time Voice · Multilingual · Sentiment-Aware · Smart Escalation

Features · Architecture · Setup · Demo · API · Dashboard

🎯 The Problem

ShopNow is a D2C brand processing 40,000+ orders/month across India with 35 human agents available only 9 AM–9 PM.

Metric	Before AI
⏱ Average wait time	8 minutes
✅ First-contact resolution	52%
⭐ CSAT score	3.1 / 5
🌙 After-hours support	❌ None
🌐 Multilingual support	❌ None

Most incoming calls are repetitive Tier-1 queries — order status, returns, payments, delivery — that require no human judgment but consume the bulk of agent time.

💡 The Solution

Meet Priya — ShopNow's AI voice support agent.

A real-time voice system that listens, understands context across multiple turns, detects when a customer is frustrated, and knows exactly when to step aside for a human.

Metric	With ShopNow Voice Agent
⏱ Wait time	< 2 seconds
✅ FCR (Tier-1)	75%+ projected
⭐ CSAT	4.0+ projected
🕐 Availability	24/7
🌐 Languages	English, Hindi, Hinglish, Urdu, Urdilish
💰 Cost reduction	60–70% projected

✨ Features

Feature	Description
🎙️ Real-Time Voice	Bidirectional audio over WebSocket — customer speaks, Priya responds in < 2 seconds
🌐 Multilingual	Auto-detects language — no "press 1 for Hindi"
🧠 5-Intent NLU	OpenAI function calling classifies intent + extracts entities in one API call
💾 Multi-Turn Memory	In-memory session tracks full conversation, intent, sentiment, order context
📚 RAG Knowledge Base	LangChain + FAISS over 5 policy documents — grounded, not hallucinated
😤 Hybrid Sentiment	Text (GPT-4o-mini) + Voice tone (librosa acoustic features) combined
🚨 Smart Escalation	Multi-signal escalation engine with structured human handoff brief
📊 Live Dashboard	Streamlit ops dashboard — FCR, escalations, sentiment trends, intent breakdown
📝 Call Summaries	LLM-generated call summaries logged after every session

🏗 Architecture

Customer Browser
      │
      │  WebSocket (real-time audio)
      ▼
┌──────────────────────────────────────────────┐
│               FastAPI Backend                │
│                                              │
│  ┌────────────┐      ┌──────────────────┐    │
│  │ Sarvam AI  │      │  Session Memory  │    │
│  │  STT       │─────▶│  (call_id keyed) │    │
│  └────────────┘      └────────┬─────────┘    │
│                               │              │
│                  ┌────────────▼───────────┐  │
│                  │   Intent Classifier    │  │
│                  │  (OpenAI func calling) │  │
│                  └────────────┬───────────┘  │
│                               │              │
│              ┌────────────────┴───────────┐  │
│              ▼                            ▼  │
│     ┌──────────────┐         ┌──────────────┐│
│     │  SQLite DB   │         │  FAISS RAG   ││
│     │  (Orders)    │         │  (5 docs)    ││
│     └──────┬───────┘         └──────┬───────┘│
│            └──────────┬─────────────┘        │
│                       ▼                      │
│        ┌──────────────────────────────┐      │
│        │   Hybrid Sentiment Analysis  │      │
│        │  Text (GPT) + Voice (librosa)│      │
│        └──────────────┬───────────────┘      │
│                       │                      │
│             ┌─────────┴──────────┐           │
│             ▼                    ▼           │
│       ┌──────────┐      ┌──────────────┐     │
│       │ Resolve  │      │  Escalate +  │     │
│       │ + Log    │      │    Brief     │     │
│       └────┬─────┘      └──────┬───────┘     │
│            └──────────┬────────┘             │
│                       ▼                      │
│            ┌────────────────────┐            │
│            │  Sarvam AI TTS     │            │
│            └────────────────────┘            │
└──────────────────────────────────────────────┘
      │
      │  Audio response
      ▼
Customer hears Priya

🛠 Tech Stack

Layer	Technology	Purpose
Backend	FastAPI + Uvicorn	REST API + WebSocket server
STT	Sarvam AI Bulbul	Indian language speech-to-text
TTS	Sarvam AI Bulbul v2	Natural Indian voice synthesis
LLM	OpenAI GPT-4o-mini	Intent classification + response generation
Embeddings	OpenAI text-embedding-3-small	Document vectorization for RAG
Voice Sentiment	librosa	Acoustic emotion detection
Text Sentiment	GPT-4o-mini	Utterance-level sentiment scoring
Vector Store	FAISS (CPU)	Policy document semantic retrieval
Database	SQLite + SQLAlchemy async	Orders, call logs, escalation records
Frontend	Streamlit + Plotly	Live dashboard + voice call interface
Voice UI	Embedded HTML/JS	Real-time browser mic + audio playback
Session	Python in-memory dict	Per-call multi-turn conversation state
Logging	Loguru	Structured application logging

📁 Project Structure

shopNow-voice-agent/
│
├── backend/
│   ├── main.py                     # App entry point + lifespan
│   ├── config.py                   # Settings + env vars
│   ├── routes/
│   │   ├── call.py                 # POST /call/start /turn /end
│   │   ├── transcribe.py           # POST /transcribe  (STT)
│   │   ├── speak.py                # POST /speak       (TTS)
│   │   ├── report.py               # GET  /report/daily /escalation
│   │   └── websocket.py            # WS   /ws/{call_id}
│   ├── services/
│   │   ├── stt.py                  # Sarvam AI STT integration
│   │   ├── tts.py                  # Sarvam AI TTS integration
│   │   ├── intent.py               # OpenAI function calling classifier
│   │   ├── llm.py                  # LangChain conversation chain
│   │   ├── rag.py                  # FAISS retrieval logic
│   │   ├── sentiment.py            # Text sentiment (GPT-4o-mini)
│   │   ├── voice_sentiment.py      # Voice sentiment (librosa)
│   │   ├── sentiment_detection.py  # Hybrid fusion logic
│   │   └── escalation.py          # Multi-signal escalation engine
│   ├── handlers/
│   │   ├── order_status.py         # Order status DB handler
│   │   ├── returns.py              # Return/refund DB handler
│   │   ├── payment.py              # Payment issue DB handler
│   │   ├── delivery.py             # Delivery complaint handler
│   │   └── product.py              # Product query handler
│   ├── memory/
│   │   └── session.py              # In-memory session manager
│   ├── db/
│   │   ├── database.py             # Async SQLite connection
│   │   ├── models.py               # ORM models
│   │   └── seed.py                 # DB initialization + data load
│   └── utils/                      # Utility helpers
│
├── frontend/
│   ├── app.py                      # Streamlit entry point + navigation
│   ├── index.html                  # Embedded HTML/JS voice call UI
│   └── pages/
│       ├── dashboard.py            # Live ops dashboard
│       ├── escalations.py          # Escalation brief viewer
│       ├── report.py               # Daily ops report
│       └── test_agent.py           # Real-time voice call interface
│
├── rag_store/
│   ├── documents/
│   │   ├── cancellation.txt        # Cancellation policy
│   │   ├── return_policy.txt       # Return + refund policy
│   │   ├── shipping_faq.txt        # Delivery + tracking FAQ
│   │   ├── payment_faq.txt         # Payment methods + issues
│   │   └── product_info.txt        # Product catalogue info
│   └── index/                      # FAISS vector index (auto-generated)
│
├── data/
│   └── Orderlist.csv               # Order dataset
│
├── scripts/
│   └── build_rag.py                # One-time FAISS index builder
│
├── .env.example                    # Environment variables template
└── requirements.txt

🚀 Setup

Prerequisites

Python 3.10+
OpenAI API key — get one here
Sarvam AI API key — get one here

Step 1 — Clone

git clone https://github.com/DeveloperVishal004/shopNow-voice-agent.git
cd shopNow-voice-agent

Step 2 — Virtual environment

# macOS / Linux
python3 -m venv .venv
source .venv/bin/activate

# Windows
python -m venv .venv
.venv\Scripts\activate

Step 3 — Install dependencies

pip install -r requirements.txt

Step 4 — Configure environment

cp .env.example .env

Open .env and fill in your keys:

openai_api_key=your_openai_api_key_here
sarvam_api_key=your_sarvam_api_key_here
DATABASE_URL=sqlite:///./shopnow.db
FAISS_INDEX_PATH=./rag_store/index/faiss.index
LLM_MODEL=gpt-4o-mini
EMBEDDING_MODEL=text-embedding-3-small
MAX_TOKENS=200
TTS_MODEL=tts-1
TTS_VOICE=nova
ESCALATION_NEGATIVE_TURNS=4
ESCALATION_SENTIMENT_THRESHOLD=-0.7
ESCALATION_MIN_TURNS=3

Step 5 — Seed the database

python backend/db/seed.py

Step 6 — Build RAG index

python scripts/build_rag.py

Step 7 — Run

Terminal 1 — Backend:

uvicorn backend.main:app --reload

Terminal 2 — Frontend:

streamlit run frontend/app.py

Open http://localhost:8501 in your browser.

📖 Interactive API docs at http://localhost:8000/docs

🎬 Demo

Try these scenarios on the Test Agent page:

Scenario 1 — English (order status):

"Hi, where is my order ORD-1001?"

Scenario 2 — Hindi (return request):

"Mera order wapas karna hai, item damage ho gaya tha"

Scenario 3 — Hinglish (payment issue):

"Mera payment do baar cut gaya, ek refund karo"

Scenario 4 — Escalation trigger:

"This is absolutely ridiculous, I want to speak to a manager right now"

📡 API Endpoints

Method	Endpoint	Description
`GET`	`/health`	Server health check
`POST`	`/call/start`	Create new call session → returns `call_id`
`POST`	`/call/turn`	Process one customer utterance → agent response
`POST`	`/call/end`	End call, log to database, clear session
`GET`	`/call/session/{id}`	Debug — view full live session state
`POST`	`/transcribe/`	STT — upload audio → transcribed text
`POST`	`/speak/`	TTS — text + language → mp3 audio
`GET`	`/report/daily`	Aggregated call stats for dashboard
`GET`	`/report/escalation/{id}`	Full handoff brief for human agent
`WS`	`/ws/{call_id}`	Real-time bidirectional voice stream

Example Requests

Start a call:

POST /call/start
{ "customer_phone": "+919876543210" }

Process a turn:

POST /call/turn
{
  "call_id": "your-call-id",
  "text": "Where is my order ORD-1001?",
  "language": "en"
}

📊 Dashboard

The Streamlit dashboard at http://localhost:8501 has 4 pages:

Live Dashboard

Total calls handled
AI resolution rate + FCR gauge vs 52% baseline
Escalation count
Average sentiment score
Calls by intent (bar chart)
Language breakdown (pie chart)

Escalations

Lookup any escalated call by ID and view:

Customer info + detected intent
Recommended tone for human agent
Sentiment history trend chart
Last 6 turns of conversation
Order context from database

Daily Report

Summary stats for support leadership
Calls by intent table
Resolution vs escalation breakdown

Test Agent

Real-time voice call interface
Type or speak to Priya
See live transcript with sentiment + intent labels

🚨 Escalation Logic

The escalation engine evaluates multiple signals every turn:

Rule 1: Explicit human request    → immediate escalation
        "manager", "agent", "human", "manav bulao"

Rule 2: Consecutive negative      → N consecutive negative/angry turns
        turns                       (configured via ESCALATION_NEGATIVE_TURNS)

Rule 3: Sentiment threshold       → avg score ≤ ESCALATION_SENTIMENT_THRESHOLD
                                    evaluated after ESCALATION_MIN_TURNS

Rule 4: Unresolved call length    → 8+ customer turns without resolution

Handoff brief includes:

✓ Customer name + phone
✓ Detected intent + issue summary
✓ Full sentiment history with label counts
✓ Last 6 turns of conversation
✓ Recommended tone (empathetic / professional / urgent)
✓ Order context from database

🧬 Hybrid Sentiment Detection

Standard text sentiment misses frustrated customers who use polite words with an angry tone.

Every utterance
      │
      ▼
 Text Sentiment (GPT-4o-mini)
      │
 ┌────┴────┐
 │         │
Non-neutral  Neutral
 │         │
 │         ▼
 │   Voice Sentiment (librosa)
 │   - RMS Energy     (loudness)
 │   - ZCR            (vocal tension)
 │   - Pitch Variance (emotional stability)
 │   - Tempo          (speaking rate)
 │         │
 └────┬────┘
      ▼
 Final Sentiment
 (positive / neutral / negative / angry)

Voice analysis only runs when text is neutral — optimizing latency while catching tone-based frustration.

🌐 Language Support

Code	Language	Script
`en`	English	Latin
`hi`	Hindi	Devanagari (U+0900–U+097F)
`hinglish`	Hinglish	Roman (Hindi words)
`ur`	Urdu	Perso-Arabic (U+0600–U+06FF)
`urdilish`	Urdilish	Roman (Urdu words)

📋 Supported Customer Intents

Intent	Description
`order_status`	Where is my order, when will it arrive
`return_refund`	I want to return an item, refund status
`payment_issue`	Payment failed, double charge, missing refund
`delivery_complaint`	Late delivery, damaged in transit, wrong address
`product_query`	Product details, availability, warranty, authenticity

⚠️ Current Limitations

Sessions are stored in memory — not suitable for multi-server deployment
No authentication or authorization layer
Multilingual behavior relies on prompt design and can be extended
External API retry handling can be improved
Reporting can be extended with SLA metrics and trend analysis

🔮 What's Next

Tamil, Telugu, Kannada, Bengali full support
Redis-based durable session storage
Twilio / Exotel integration for real phone numbers
CRM and ticketing platform integrations
Post-call satisfaction capture
Proactive outbound calls for at-risk orders
Skill-based escalation routing by language and issue type

🤝 Contributing

# Create your feature branch
git checkout -b feature/your-feature

# Commit with a clear prefix
git commit -m "[FEATURE] Add Tamil language support"

# Push and open a PR against main
git push origin feature/your-feature

Commit prefixes: [FEATURE] [BUGFIX] [PERF] [DOCS]

📄 License

MIT License — see LICENSE for details.

Built with ❤️

⭐ Star this repo if you found it useful!

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
backend		backend
data		data
frontend		frontend
rag_store		rag_store
scripts		scripts
.env.example		.env.example
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt
test.py		test.py

Folders and files

Latest commit

History

Repository files navigation

🎧 ShopNow Voice Agent

AI-Powered Real-Time Customer Support Voice Agent

🎯 The Problem

💡 The Solution

✨ Features

🏗 Architecture

🛠 Tech Stack

📁 Project Structure

🚀 Setup

Prerequisites

Step 1 — Clone

Step 2 — Virtual environment

Step 3 — Install dependencies

Step 4 — Configure environment

Step 5 — Seed the database

Step 6 — Build RAG index

Step 7 — Run

🎬 Demo

Try these scenarios on the Test Agent page:

📡 API Endpoints

Example Requests

📊 Dashboard

Live Dashboard

Escalations

Daily Report

Test Agent

🚨 Escalation Logic

🧬 Hybrid Sentiment Detection

🌐 Language Support

📋 Supported Customer Intents

⚠️ Current Limitations

🔮 What's Next

🤝 Contributing

📄 License

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages