
---
title: TranscriptAI
emoji: 🎙️
colorFrom: pink
colorTo: yellow
sdk: streamlit
sdk_version: 1.55.0
app_file: app.py
pinned: true
license: mit
short_description: Speech & Meeting Intelligence — English · Hindi · Japanese
---

# TranscriptAI

Meeting intelligence that understands not just what was said — but what was meant.


Trilingual · English · Hindi · Japanese


## The Problem

Most meeting tools extract what was said. They miss everything underneath.

Every language and culture has indirect communication patterns — polite rejections, soft commitments, face-saving agreements — that a generic summarizer will log as action items that never get done.

TranscriptAI is built to catch exactly those signals.


## Live Demo

→ Try it on Hugging Face

No setup. No API key. Paste any transcript and get structured intelligence in seconds.

Example — what a generic tool misses:

| What was said | Generic AI output | TranscriptAI output |
| --- | --- | --- |
| Indirect verbal agreement | ✅ Action item logged | ⚠️ Soft commitment — low follow-through probability |
| Japanese polite consideration phrase | ✅ Action item logged | 🔴 72% rejection confidence — request written confirmation |
| Corporate hedge — "we'll circle back" | 📝 Meeting note | 🌀 No concrete next step — escalation recommended |
| Enthusiastic but hierarchical yes | ✅ Agreement confirmed | 🟠 Agreeing to please, not necessarily to act |

## Output

For every transcript, TranscriptAI produces:

- **Summary** — concise narrative paragraph plus key bullet points scaled to meeting length
- **Action items** — extracted with owner, deadline, and commitment strength rating
- **Communication risk signals** — indirect rejections, hedging language, power imbalance markers
- **Speaker tone profile** — 6-level colour-coded scale with intensity score per speaker
- **Meeting health score** — 0 to 100 composite across sentiment, action clarity, risk, and AI confidence
- **Session trends** — risk drift, hallucination rate, and workload patterns across meetings
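
The meeting health score above can be sketched as a weighted composite. This is an illustration only: the weights, the 0..1 normalization of each component, and the function name are assumptions, not TranscriptAI's actual formula.

```python
# Hypothetical sketch of the 0-100 composite health score. Assumes the
# four component scores are already normalized to the 0..1 range; the
# weights below are illustrative, not the project's actual values.
def health_score(sentiment, action_clarity, risk, ai_confidence,
                 weights=(0.3, 0.3, 0.2, 0.2)):
    # Risk counts against meeting health, so invert it before weighting.
    components = (sentiment, action_clarity, 1.0 - risk, ai_confidence)
    return round(100 * sum(w * c for w, c in zip(weights, components)))

print(health_score(sentiment=0.8, action_clarity=0.9, risk=0.2,
                   ai_confidence=0.95))  # 86
```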

## Language Engines

Three independent NLP modules, auto-detected from transcript content.

### English

Commitment strength grading distinguishes "I will deliver" from "I will try" from "we will see." Detects escalation signals, power imbalance language, passive aggression, and corporate hedging. Over 40 patterns across 4 categories.
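
Commitment grading of this kind can be sketched as ordered pattern matching. The three regexes and grade labels below are hypothetical stand-ins for the engine's 40+ patterns, not its actual rules.

```python
import re

# Illustrative commitment-strength grader. Patterns are checked from
# firmest to weakest; the first match wins. All rules here are invented
# stand-ins for the real English engine.
COMMITMENT_PATTERNS = [
    (r"\bI will (?:deliver|send|finish)\b",        "FIRM"),
    (r"\bI(?:'ll| will) try\b|\bshould be able\b", "SOFT"),
    (r"\bwe(?:'ll| will) see\b|\bcircle back\b",   "HEDGE"),
]

def grade_commitment(utterance):
    for pattern, grade in COMMITMENT_PATTERNS:
        if re.search(pattern, utterance, re.IGNORECASE):
            return grade
    return "NEUTRAL"

print(grade_commitment("We will see what we can do."))  # HEDGE
```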

### Hindi

Identifies indirect refusals, hierarchical agreement (saying yes to please rather than commit), face-saving exits, and vague reassurances. Handles both Roman script and Devanagari. Over 30 patterns.

### Japanese

16 nemawashi soft-rejection patterns with per-pattern confidence scores. Keigo formality detection via MeCab morphological analysis. Cross-script speaker normalization — the same person written in kanji and in romanization resolves to a single speaker identity.
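
Cross-script resolution can be sketched as a lookup against a surname table mapping kanji to romanization. The two-entry table and function name below are hypothetical; the real module ships 500+ surname entries.

```python
import unicodedata

# Hypothetical sketch: a kanji surname and its romanization resolve to
# one canonical speaker ID. The tiny table stands in for the project's
# 500+ surname entries.
SURNAMES = {"田中": "tanaka", "佐藤": "sato"}

def canonical_speaker(name):
    # Normalize width/compatibility forms, then look up kanji surnames;
    # romanized names fall through to simple lowercasing.
    name = unicodedata.normalize("NFKC", name).strip()
    return SURNAMES.get(name, name.lower())

print(canonical_speaker("田中") == canonical_speaker("Tanaka"))  # True
```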


## Architecture

```
transcription/
  pii_masker.py           Local anonymization — runs before any LLM call
  speaker_normalizer.py   Cross-script speaker identity resolution
  audio_processor.py      Whisper transcription pipeline

analysis/
  analyzer.py             LLM orchestration — Groq → Ollama → Mock fallback
  english_analyzer.py     English NLP engine
  hindi_analyzer.py       Hindi NLP engine
  soft_rejection.py       Japanese nemawashi detector
  hallucination_guard.py  Rule-based output verification
  japanese_tokenizer.py   MeCab morphological analysis

utils/
  evaluator.py            ROUGE-L + F1 + semantic similarity scoring
  cache.py                MD5 result caching — 24h TTL
  logger.py               JSONL observability and trend analysis

app.py                    Streamlit UI — 7 tabs, health score, trend dashboard
api.py                    FastAPI REST endpoints
```

Processing pipeline — order is strict:

```
1. PII Mask       local, before LLM          (privacy compliance)
2. LLM Analysis   Groq / Ollama / Mock
3. PII Restore    local, before normalization
4. Normalize      cross-script speaker deduplication
5. Tone Classify  per-speaker 6-level scoring
6. NLP Layer      language-specific signal detection
7. Cache + Log    MD5 cache write, JSONL append
```
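
The strict ordering can be made concrete with stand-in stages. Every name below is hypothetical; each stage simply records its position, to show the order in which the real modules run.

```python
# Minimal sketch of the strict pipeline ordering with trivial stand-in
# stages. Each stage appends its name to TRACE so the order is visible.
TRACE = []

def stage(name):
    def run(data):
        TRACE.append(name)
        return data
    return run

mask_pii       = stage("mask")       # 1. local, before any LLM call
llm_analyze    = stage("llm")        # 2. Groq / Ollama / Mock
restore_pii    = stage("restore")    # 3. local, before normalization
normalize      = stage("normalize")  # 4. cross-script speaker dedup
classify_tone  = stage("tone")       # 5. per-speaker 6-level scoring
detect_signals = stage("nlp")        # 6. language-specific signals
cache_and_log  = stage("cache")      # 7. MD5 cache write, JSONL append

def process(transcript):
    for step in (mask_pii, llm_analyze, restore_pii, normalize,
                 classify_tone, detect_signals, cache_and_log):
        transcript = step(transcript)
    return transcript

process("Alex: can we ship by Friday?")
print(TRACE)
```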

## Evaluation

Standard NLP metrics carry Western assumptions. Formal neutral speech in Japanese or indirect communication in South Asian business contexts scores poorly on metrics calibrated for direct English. This project uses a custom evaluation framework with cultural corrections applied at each version iteration.

| Version | Score | Primary Change |
| --- | --- | --- |
| v1 | 30% | Baseline — exact string matching |
| v2 | 55% | Fuzzy matching, semantic similarity |
| v3 | 75% | Cultural ground truth, Japanese tokenization |
| v4 | 83% | Hallucination guard, soft rejection filter |
| v5 | 93% | Tone intelligence, optimal bullet assignment |
| Metric | Result |
| --- | --- |
| Action Item F1 | 1.0 — Excellent |
| Sentiment (cultural) | 1.0 — Excellent |
| Hallucination Risk | Low |
| Overall | 93% |
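
The ROUGE-L component of the evaluator scores the longest common subsequence shared by reference and candidate. The sketch below is the standard LCS F-measure, not necessarily the project's exact variant.

```python
# Standard ROUGE-L: F-measure over the longest common subsequence (LCS)
# of whitespace-tokenized reference and candidate strings.
def rouge_l(reference, candidate):
    ref, cand = reference.split(), candidate.split()
    # LCS length via dynamic programming.
    dp = [[0] * (len(cand) + 1) for _ in range(len(ref) + 1)]
    for i, r in enumerate(ref, 1):
        for j, c in enumerate(cand, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if r == c else max(dp[i-1][j], dp[i][j-1])
    lcs = dp[-1][-1]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

print(round(rouge_l("the deadline is friday", "deadline is friday"), 2))  # 0.86
```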

## Production Features

**Privacy** — PII anonymization runs locally before any transcript reaches an LLM. Names, phone numbers, and email addresses are masked on input and restored on output. No personal data is transmitted.
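
Reversible masking of this kind can be sketched with regex placeholders. The two patterns below are simplified assumptions covering only emails and phone numbers; the real `pii_masker.py` also handles names.

```python
import re

# Illustrative local PII masking with reversible placeholder tokens.
# These regexes are deliberately simplified stand-ins.
PII_PATTERNS = {
    "EMAIL": r"[\w.+-]+@[\w-]+\.[\w.]+",
    "PHONE": r"\+?\d[\d\s-]{7,}\d",
}

def mask_pii(text):
    pii_map = {}
    for label, pattern in PII_PATTERNS.items():
        for i, match in enumerate(re.findall(pattern, text)):
            token = f"<{label}_{i}>"
            pii_map[token] = match           # remember original for restore
            text = text.replace(match, token)
    return text, pii_map

def restore_pii(text, pii_map):
    for token, original in pii_map.items():
        text = text.replace(token, original)
    return text

masked, pii_map = mask_pii("Reach Alex at alex@corp.com")
print(masked)  # Reach Alex at <EMAIL_0>
```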

**Reliability** — Three-tier LLM fallback: Groq (1–2 s, free tier) → Ollama (local, zero cost) → Mock (always available). MD5 result caching with 24-hour TTL means repeat queries return in under one second.
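
The fallback-plus-cache behaviour can be sketched as below; the provider functions and cache layout are hypothetical stand-ins for the real Groq / Ollama / Mock clients.

```python
import hashlib
import time

# Sketch of three-tier fallback with MD5-keyed result caching and a
# 24-hour TTL. Providers are tried in order; the first that succeeds wins.
CACHE = {}
TTL_SECONDS = 24 * 3600

def analyze(transcript, providers):
    key = hashlib.md5(transcript.encode("utf-8")).hexdigest()
    hit = CACHE.get(key)
    if hit is not None and time.time() - hit[0] < TTL_SECONDS:
        return hit[1]                       # repeat query: served from cache
    for provider in providers:              # e.g. Groq, then Ollama, then Mock
        try:
            result = provider(transcript)
            break
        except Exception:
            continue                        # fall through to the next tier
    CACHE[key] = (time.time(), result)
    return result

def mock_provider(_transcript):             # last tier: always available
    return {"summary": "mock analysis"}

def offline_provider(_transcript):          # simulates an unreachable API
    raise RuntimeError("offline")

print(analyze("hello", [offline_provider, mock_provider]))
```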

**Observability** — Every analysis is written to a local JSONL log. A built-in trends dashboard tracks soft rejection rates, hallucination drift, and workload distribution across sessions.

**Integration** — FastAPI REST endpoint at `/analyze` for direct integration with CRM systems, Slack bots, or downstream pipelines.


## Quick Start

```bash
git clone https://github.com/aiKunalBisht/Transcript-ai.git
cd Transcript-ai
pip install -r requirements.txt
```

**Cloud — Groq (recommended, free tier)**

```bash
export GROQ_API_KEY=your_key_here    # console.groq.com
python -m streamlit run app.py
```

**Local — fully offline, zero data leaves your machine**

```bash
ollama pull qwen3:8b
python -m streamlit run app.py
```

**Optional dependencies**

```bash
pip install fugashi unidic-lite        # MeCab Japanese tokenizer
pip install scikit-learn               # TF-IDF semantic similarity
pip install sentence-transformers      # Neural semantic scoring
```

## REST API

```bash
python api.py
# Interactive docs at http://localhost:8000/docs
```

```python
import requests

response = requests.post("http://localhost:8000/analyze", json={
    "transcript": "Alex: Can we get this delivered by Friday?\nJordan: We will see what we can do.",
    "language": "en",
    "mask_pii": True
})

result = response.json()["result"]
print(result["soft_rejections"]["risk_level"])    # HIGH
print(result["soft_rejections"]["risk_summary"])  # Commitment unlikely to be followed through
```

## Known Limitations

| Limitation | Planned Improvement |
| --- | --- |
| Speaker diarization ~70% accuracy | pyannote.audio integration |
| Audio upload unavailable on HF Spaces | Groq Whisper API — next release |
| Confidence scores are heuristic | Labeled dataset and calibration |
| Demo uses synthetic test cases | Real-world transcript validation ongoing |

## Project Scale

19 Python files · 6,000+ lines · 90+ functions
86 linguistic patterns across 3 languages · 500+ Japanese surname entries
Supported formats: TXT · VTT · JSON · MP4 · MP3 · WAV · M4A


Built by Kunal Bisht — Pithoragarh, India

Hugging Face · LinkedIn · GitHub

## About

In modern international business, conversations often shift between languages. TranscriptAI bridges this gap, using Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) to ensure no context is lost in translation. It doesn't just transcribe; it understands business intent.
