---
title: TranscriptAI
emoji: 🎙️
colorFrom: pink
colorTo: yellow
sdk: streamlit
sdk_version: 1.55.0
app_file: app.py
pinned: true
license: mit
short_description: Speech & Meeting Intelligence – English · Hindi · Japanese
---
Meeting intelligence that understands not just what was said, but what was meant.
Trilingual: English · Hindi · Japanese
Most meeting tools extract what was said. They miss everything underneath.
Every language and culture has indirect communication patterns (polite rejections, soft commitments, face-saving agreements) that a generic summarizer logs as action items that never get done.
TranscriptAI is built to catch exactly those signals.
No setup. No API key. Paste any transcript and get structured intelligence in seconds.
Example: what a generic tool misses
| What was said | Generic AI output | TranscriptAI output |
|---|---|---|
| Indirect verbal agreement | ✅ Action item logged | |
| Japanese polite consideration phrase | ✅ Action item logged | 🔴 72% rejection confidence – request written confirmation |
| Corporate hedge, "we'll circle back" | 📝 Meeting note | 🟡 No concrete next step – escalation recommended |
| Enthusiastic but hierarchical yes | ✅ Agreement confirmed | 🟡 Agreeing to please, not necessarily to act |
For every transcript, TranscriptAI produces:
- Summary – concise narrative paragraph plus key bullet points scaled to meeting length
- Action items – extracted with owner, deadline, and commitment strength rating
- Communication risk signals – indirect rejections, hedging language, power imbalance markers
- Speaker tone profile – 6-level colour-coded scale with intensity score per speaker
- Meeting health score – 0 to 100 composite across sentiment, action clarity, risk, and AI confidence
- Session trends – risk drift, hallucination rate, and workload patterns across meetings
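To illustrate the composite, here is a minimal sketch of a 0–100 health score over the four inputs named above. The equal weighting and the normalization to [0, 1] are assumptions for illustration, not the weights TranscriptAI actually uses.

```python
def health_score(sentiment: float, action_clarity: float,
                 risk: float, ai_confidence: float) -> int:
    """Composite 0-100 meeting health score.

    All inputs are assumed normalized to [0, 1]; risk counts
    against the score. Equal weights are an illustrative assumption.
    """
    score = (0.25 * sentiment
             + 0.25 * action_clarity
             + 0.25 * (1.0 - risk)
             + 0.25 * ai_confidence)
    return round(score * 100)

print(health_score(0.8, 0.9, 0.2, 0.75))  # 81
```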
Three independent NPL modules, auto-detected from transcript content.

**English** – Commitment strength grading distinguishes "I will deliver" from "I will try" from "we will see." Detects escalation signals, power imbalance language, passive aggression, and corporate hedging. Over 40 patterns across 4 categories.

**Hindi** – Identifies indirect refusals, hierarchical agreement (saying yes to please rather than to commit), face-saving exits, and vague reassurances. Handles both Roman script and Devanagari. Over 30 patterns.

**Japanese** – 16 nemawashi soft-rejection patterns with per-pattern confidence scores. Keigo formality detection via MeCab morphological analysis. Cross-script speaker normalization: the same person written in kanji and in romanization resolves to a single speaker identity.
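Commitment strength grading of this kind can be sketched as an ordered pattern lookup. The patterns and grade labels below are an illustrative subset, not the module's actual tables, which cover 40+ patterns across 4 categories.

```python
import re

# Illustrative subset of commitment-strength patterns; labels are
# assumptions, not TranscriptAI's internal taxonomy.
COMMITMENT_PATTERNS = [
    (r"\bI will (deliver|send|finish|do)\b", "FIRM"),
    (r"\bI(?:'ll| will) try\b", "WEAK"),
    (r"\bwe(?:'ll| will) see\b", "NON_COMMITTAL"),
    (r"\b(?:circle back|touch base|revisit)\b", "HEDGE"),
]

def grade_commitment(utterance: str) -> str:
    """Return the grade of the first matching pattern."""
    for pattern, grade in COMMITMENT_PATTERNS:
        if re.search(pattern, utterance, re.IGNORECASE):
            return grade
    return "UNCLASSIFIED"

print(grade_commitment("I will deliver the report by Friday"))  # FIRM
print(grade_commitment("We will see what we can do"))           # NON_COMMITTAL
```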
```text
transcription/
  pii_masker.py           Local anonymization – runs before any LLM call
  speaker_normalizer.py   Cross-script speaker identity resolution
  audio_processor.py      Whisper transcription pipeline
analysis/
  analyzer.py             LLM orchestration – Groq → Ollama → Mock fallback
  english_analyzer.py     English NLP engine
  hindi_analyzer.py       Hindi NLP engine
  soft_rejection.py       Japanese nemawashi detector
  hallucination_guard.py  Rule-based output verification
  japanese_tokenizer.py   MeCab morphological analysis
utils/
  evaluator.py            ROUGE-L + F1 + semantic similarity scoring
  cache.py                MD5 result caching – 24h TTL
  logger.py               JSONL observability and trend analysis
app.py                    Streamlit UI – 7 tabs, health score, trend dashboard
api.py                    FastAPI REST endpoints
```
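The evaluator combines ROUGE-L, F1, and semantic similarity. As a reference point, here is a minimal ROUGE-L F-measure over whitespace tokens; this is a textbook sketch, not the code in `utils/evaluator.py`.

```python
def rouge_l(reference: str, candidate: str) -> float:
    """ROUGE-L F-measure via longest common subsequence over tokens."""
    ref, cand = reference.split(), candidate.split()
    # Dynamic-programming LCS length table
    dp = [[0] * (len(cand) + 1) for _ in range(len(ref) + 1)]
    for i, r in enumerate(ref):
        for j, c in enumerate(cand):
            dp[i + 1][j + 1] = dp[i][j] + 1 if r == c else max(dp[i][j + 1], dp[i + 1][j])
    lcs = dp[-1][-1]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

print(round(rouge_l("ship the report by friday", "ship report by friday"), 2))  # 0.89
```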
Processing pipeline (order is strict):

1. **PII Mask** – local, before the LLM (privacy compliance)
2. **LLM Analysis** – Groq / Ollama / Mock
3. **PII Restore** – local, before normalization
4. **Normalize** – cross-script speaker deduplication
5. **Tone Classify** – per-speaker 6-level scoring
6. **NLP Layer** – language-specific signal detection
7. **Cache + Log** – MD5 cache write, JSONL append
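The strict ordering can be sketched as one orchestration function. Every helper below is a stand-in stub for the module it names; none of this is TranscriptAI's actual internal API.

```python
import hashlib

# Stand-in stubs so the sketch runs end to end.
def mask_pii(text):
    return text.replace("alex@corp.com", "<EMAIL_0>"), {"<EMAIL_0>": "alex@corp.com"}

def run_llm(text):
    return {"summary": text}          # really Groq -> Ollama -> Mock

def restore_pii(result, pii_map):
    for token, original in pii_map.items():
        result["summary"] = result["summary"].replace(token, original)
    return result

def normalize_speakers(result):
    return result                     # cross-script dedup stub

def classify_tone(result):
    result["tone"] = "neutral"        # 6-level scoring stub
    return result

def detect_signals(result):
    result["soft_rejections"] = []    # language-specific NLP stub
    return result

def analyze(transcript: str) -> dict:
    masked, pii_map = mask_pii(transcript)     # 1. PII mask (local)
    result = run_llm(masked)                   # 2. LLM analysis
    result = restore_pii(result, pii_map)      # 3. PII restore (local)
    result = normalize_speakers(result)        # 4. Normalize speakers
    result = classify_tone(result)             # 5. Tone classification
    result = detect_signals(result)            # 6. NLP signal layer
    key = hashlib.md5(transcript.encode("utf-8")).hexdigest()  # 7. Cache + log
    return {"cache_key": key, **result}

out = analyze("alex@corp.com: can we ship Friday?")
print(out["summary"])   # alex@corp.com: can we ship Friday?
```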
Standard NLP metrics carry Western assumptions. Formal neutral speech in Japanese or indirect communication in South Asian business contexts scores poorly on metrics calibrated for direct English. This project uses a custom evaluation framework with cultural corrections applied at each version iteration.
| Version | Score | Primary Change |
|---|---|---|
| v1 | 30% | Baseline: exact string matching |
| v2 | 55% | Fuzzy matching, semantic similarity |
| v3 | 75% | Cultural ground truth, Japanese tokenization |
| v4 | 83% | Hallucination guard, soft rejection filter |
| v5 | 93% | Tone intelligence, optimal bullet assignment |
| Metric | Result |
|---|---|
| Action Item F1 | 1.0 (Excellent) |
| Sentiment (cultural) | 1.0 (Excellent) |
| Hallucination Risk | Low |
| Overall | 93% |
**Privacy.** PII anonymization runs locally before any transcript reaches an LLM. Names, phone numbers, and email addresses are masked on input and restored on output. No personal data is transmitted.
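Mask-and-restore of this kind can be sketched with regex substitution and a token map. The regexes below are illustrative for emails and phone numbers only; the real masker also handles names.

```python
import re

def mask_pii(text: str):
    """Replace emails and phone-like numbers with placeholder tokens.
    Minimal sketch, not pii_masker.py's actual rules."""
    mapping = {}

    def sub(pattern, label, s):
        def repl(m):
            token = f"<{label}_{len(mapping)}>"
            mapping[token] = m.group(0)
            return token
        return re.sub(pattern, repl, s)

    text = sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "EMAIL", text)
    text = sub(r"\+?\d[\d\s-]{7,}\d", "PHONE", text)
    return text, mapping

def restore_pii(text: str, mapping: dict) -> str:
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text

masked, mapping = mask_pii("Reach Alex at alex@corp.com or +91 98765 43210.")
print(masked)  # Reach Alex at <EMAIL_0> or <PHONE_1>.
print(restore_pii(masked, mapping))
```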
**Reliability.** Three-tier LLM fallback: Groq (1–2 s, free tier) → Ollama (local, zero cost) → Mock (always available). MD5 result caching with a 24-hour TTL means repeat queries return in under one second.
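The 24-hour MD5 cache can be sketched as a keyed lookup in front of the analyzer. This in-memory version is an assumption for illustration; the actual cache in `utils/cache.py` presumably persists results between sessions.

```python
import hashlib
import time

CACHE: dict = {}           # key -> (stored_at, result)
TTL_SECONDS = 24 * 60 * 60

def cached_analyze(transcript: str, analyze_fn) -> dict:
    """Return a cached result if this exact transcript was analyzed
    within the last 24 hours; otherwise compute and store it."""
    key = hashlib.md5(transcript.encode("utf-8")).hexdigest()
    entry = CACHE.get(key)
    if entry and time.time() - entry[0] < TTL_SECONDS:
        return entry[1]
    result = analyze_fn(transcript)
    CACHE[key] = (time.time(), result)
    return result
```

A second call with the same transcript skips the analyzer entirely, which is what makes repeat queries sub-second.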
**Observability.** Every analysis is written to a local JSONL log. A built-in trends dashboard tracks soft rejection rates, hallucination drift, and workload distribution across sessions.
**Integration.** FastAPI REST endpoint at `/analyze` for direct integration with CRM systems, Slack bots, or downstream pipelines.
Install:

```shell
git clone https://github.com/aiKunalBisht/Transcript-ai.git
cd Transcript-ai
pip install -r requirements.txt
```

Cloud – Groq (recommended, free tier):

```shell
export GROQ_API_KEY=your_key_here   # console.groq.com
python -m streamlit run app.py
```

Local – fully offline, zero data leaves your machine:

```shell
ollama pull qwen3:8b
python -m streamlit run app.py
```

Optional dependencies:

```shell
pip install fugashi unidic-lite      # MeCab Japanese tokenizer
pip install scikit-learn             # TF-IDF semantic similarity
pip install sentence-transformers    # Neural semantic scoring
```

REST API:

```shell
python api.py
# Interactive docs at http://localhost:8000/docs
```

```python
import requests

response = requests.post("http://localhost:8000/analyze", json={
    "transcript": "Alex: Can we get this delivered by Friday?\nJordan: We will see what we can do.",
    "language": "en",
    "mask_pii": True,
})
result = response.json()["result"]
print(result["soft_rejections"]["risk_level"])    # HIGH
print(result["soft_rejections"]["risk_summary"])  # Commitment unlikely to be followed through
```

| Limitation | Planned Improvement |
|---|---|
| Speaker diarization ~70% accuracy | pyannote.audio integration |
| Audio upload unavailable on HF Spaces | Groq Whisper API (next release) |
| Confidence scores are heuristic | Labeled dataset and calibration |
| Demo uses synthetic test cases | Real-world transcript validation ongoing |
19 Python files · 6,000+ lines · 90+ functions
86 linguistic patterns across 3 languages · 500+ Japanese surname entries
Supported formats: TXT · VTT · JSON · MP4 · MP3 · WAV · M4A
Built by Kunal Bisht – Pithoragarh, India
Hugging Face · LinkedIn · GitHub