Nion9/Insightflow_ai

🎥 InsightFlow AI

Transform YouTube videos into searchable knowledge with AI-powered transcription and semantic search

InsightFlow AI is an intelligent video processing system that extracts audio from YouTube videos, transcribes content using OpenAI's Whisper, and creates a searchable question-answering system using LangChain's RAG (Retrieval-Augmented Generation) capabilities.

InsightFlow AI Demo

✨ Features

  • 🎯 AI-Powered Transcription: Automatically transcribe YouTube videos using OpenAI's Whisper model
  • 🧠 Semantic Search: Query video content using natural language questions
  • 💬 Interactive Q&A: Ask specific questions and get relevant answers from the video content
  • 📊 Full Transcript Access: View and download complete transcriptions
  • 🔍 Vector Database: Leverages ChromaDB for efficient semantic search
  • 🚀 Modern UI: Clean, responsive interface built with Streamlit

🛠️ Tech Stack

  • Frontend: Streamlit
  • Transcription: OpenAI Whisper
  • Vector Database: ChromaDB
  • Embeddings: Sentence Transformers (all-MiniLM-L6-v2)
  • LLM Framework: LangChain
  • Video Processing: yt-dlp, FFmpeg
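
To make the retrieval step concrete, here is a minimal, dependency-free sketch of the idea behind semantic search over a transcript: split the text into overlapping chunks, turn each chunk into a vector, and rank chunks by cosine similarity to the question. This is not the project's implementation. It swaps in plain word-count vectors for the Sentence Transformers embeddings and an in-memory list for ChromaDB, and the function names and chunk sizes are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, chunk_size=40, overlap=10):
    """Split text into overlapping word windows (sizes are illustrative)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]


def embed(text):
    """Toy embedding: lowercase word counts, standing in for a neural model."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_chunks(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    query_vector = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(query_vector, embed(c)), reverse=True)
    return ranked[:k]
```

In a real pipeline the retrieved chunks would be the context for answering the question; word counts only match on shared vocabulary, which is the limitation that learned embeddings address.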

📋 Prerequisites

Before you begin, ensure you have the following installed:

  • Python 3.8 or higher
  • FFmpeg (download from ffmpeg.org)
  • pip (Python package manager)
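
If you want to verify these prerequisites from Python before installing anything else, a small check along the following lines may help. It uses only the standard library; the function name `check_prerequisites` is hypothetical and not part of the repository.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in the prerequisites


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append("Python 3.8 or higher is required")
    if which("ffmpeg") is None:
        problems.append("FFmpeg was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Missing prerequisite:", problem)
```

The `version_info` and `which` parameters exist only so the check can be exercised without changing the real environment.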

🚀 Installation

1. Clone the Repository

git clone https://github.com/yourusername/insightflow-ai.git
cd insightflow-ai

2. Create Virtual Environment

# Windows
python -m venv venv
venv\Scripts\activate

# macOS/Linux
python3 -m venv venv
source venv/bin/activate

3. Install Dependencies

pip install -r requirements.txt

4. Install FFmpeg

Windows:

  1. Download FFmpeg from ffmpeg.org
  2. Extract to a directory (e.g., D:\ffmpeg\bin)
  3. Add to System PATH or update the path in processor.py

macOS:

brew install ffmpeg

Linux:

sudo apt update
sudo apt install ffmpeg

💻 Usage

Running the Application

streamlit run app.py

The application will open in your default browser at http://localhost:8501

Using InsightFlow AI

  1. Paste YouTube URL: Enter the URL of the YouTube video you want to analyze
  2. Click "Analyze Video": Wait for the processing to complete (1-3 minutes depending on video length)
  3. Ask Questions: Once processing is complete, ask questions about the video content
  4. View Transcript: Expand the transcript section to see the full text
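
Step 1 accepts any pasted URL, so it can be useful to reject obviously invalid input before starting a download. The sketch below is an assumption about how such a check could look, not code from app.py: `extract_video_id` is a hypothetical helper that recognizes a few common YouTube URL shapes using only the standard library.

```python
from urllib.parse import parse_qs, urlparse


def extract_video_id(url):
    """Return the video ID from a common YouTube URL shape, or None."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        video_id = parsed.path.lstrip("/").split("/")[0]
        return video_id or None
    if host in ("youtube.com", "www.youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            # Standard links carry the ID in the "v" query parameter.
            return parse_qs(parsed.query).get("v", [None])[0]
        parts = parsed.path.strip("/").split("/")
        if len(parts) == 2 and parts[0] in ("shorts", "embed"):
            return parts[1]
    return None
```

A `None` result can be turned into a friendly error message in the UI instead of letting the download step fail. The list of URL shapes is deliberately short and may not cover every link format.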

Example Questions

  • "What is the main topic of this video?"
  • "Can you summarize the key points discussed?"
  • "What does the speaker say about [specific topic]?"
  • "What are the recommendations mentioned?"

📁 Project Structure

insightflow-ai/
│
├── app.py                 # Main Streamlit application
├── processor.py           # Video download and transcription logic
├── brain.py              # Vector database and RAG implementation
├── requirements.txt      # Python dependencies
├── README.md            # Project documentation
│
├── .vscode/
│   └── launch.json      # VS Code debug configuration
│
├── venv/                # Virtual environment (not tracked)
├── chroma_db/           # Vector database storage (generated)
└── temp_audio.mp3       # Temporary audio files (generated)
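
For orientation, the download-and-transcribe role described for processor.py could be sketched roughly as follows. This is an illustration under assumptions, not the file's actual contents: the function names are hypothetical, and only the `temp_audio.mp3` output name and the "base" model come from this README. It relies on yt-dlp's `YoutubeDL` class with the `FFmpegExtractAudio` postprocessor and on Whisper's `load_model` and `transcribe` calls.

```python
def build_ydl_opts(output_stem="temp_audio"):
    """Options asking yt-dlp for the best audio stream, converted to MP3 by FFmpeg."""
    return {
        "format": "bestaudio/best",
        "outtmpl": output_stem + ".%(ext)s",
        "postprocessors": [{
            "key": "FFmpegExtractAudio",
            "preferredcodec": "mp3",
            "preferredquality": "192",
        }],
        "quiet": True,
    }


def download_audio(url, output_stem="temp_audio"):
    """Download the audio track and return the path of the resulting MP3."""
    import yt_dlp  # imported lazily so this module loads without the dependency

    with yt_dlp.YoutubeDL(build_ydl_opts(output_stem)) as ydl:
        ydl.download([url])
    return output_stem + ".mp3"


def transcribe(audio_path, model_name="base"):
    """Transcribe an audio file with Whisper and return the plain text."""
    import whisper  # imported lazily for the same reason

    model = whisper.load_model(model_name)
    return model.transcribe(audio_path)["text"]
```

Calling `transcribe(download_audio(url))` would then yield the transcript text that the vector database step consumes.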

🔧 Configuration

FFmpeg Path (Windows Users)

If FFmpeg is not in your system PATH, update the path in processor.py:

os.environ["PATH"] += os.pathsep + r"YOUR_FFMPEG_PATH\bin"
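
The line above appends the directory every time it runs. If that matters in your setup, a slightly more defensive variant is sketched here; the helper names are hypothetical and not taken from processor.py. It only touches PATH when `ffmpeg` cannot already be found, and it avoids adding the same directory twice.

```python
import os
import shutil


def append_to_path(path_value, new_dir, sep=os.pathsep):
    """Return path_value with new_dir appended, unless it is already listed."""
    entries = [entry for entry in path_value.split(sep) if entry]
    if new_dir not in entries:
        entries.append(new_dir)
    return sep.join(entries)


def ensure_ffmpeg_on_path(ffmpeg_bin_dir):
    """Add ffmpeg_bin_dir to PATH if needed; return True when ffmpeg resolves."""
    if shutil.which("ffmpeg") is None:
        os.environ["PATH"] = append_to_path(os.environ.get("PATH", ""), ffmpeg_bin_dir)
    return shutil.which("ffmpeg") is not None
```

For example, `ensure_ffmpeg_on_path(r"D:\ffmpeg\bin")` would match the extraction directory suggested in the installation steps.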

Whisper Model Selection

You can change the Whisper model for different accuracy/speed tradeoffs in processor.py:

# Options: tiny, base, small, medium, large
model = whisper.load_model("base")  # Change "base" to your preferred model
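
| Model  | Speed | Accuracy   | Use Case          |
|--------|-------|------------|-------------------|
| tiny   | ⚡⚡⚡   | ⭐⭐         | Quick testing     |
| base   | ⚡⚡    | ⭐⭐⭐        | Default, balanced |
| small  | ⚡     | ⭐⭐⭐⭐       | Better accuracy   |
| medium | 🐌    | ⭐⭐⭐⭐⭐      | High accuracy     |
| large  | 🐌🐌  | ⭐⭐⭐⭐⭐      | Best accuracy     |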
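
Instead of editing processor.py each time, the model name could be read from an environment variable and validated against the options listed above. The following is a sketch under assumptions: the `WHISPER_MODEL` variable and both function names are hypothetical and not part of the project.

```python
import os

VALID_MODELS = ("tiny", "base", "small", "medium", "large")


def choose_model_name(env=os.environ, default="base"):
    """Pick a Whisper model name from the environment, falling back to the default."""
    name = env.get("WHISPER_MODEL", default).strip().lower()
    if name not in VALID_MODELS:
        raise ValueError(
            "Unknown Whisper model %r; expected one of %s" % (name, ", ".join(VALID_MODELS))
        )
    return name


def load_configured_model():
    """Load the selected model; requires the openai-whisper package."""
    import whisper  # imported lazily so the name check works without the dependency

    return whisper.load_model(choose_model_name())
```

Failing early on an unknown name avoids discovering a typo only after the audio download has finished.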

🧪 Development

Running in Debug Mode (VS Code)

  1. Open app.py in VS Code
  2. Press F5 or click "Run and Debug"
  3. Select "Python: Streamlit" configuration

Testing Individual Components

Test Processor:

python processor.py

Test Brain (Vector DB):

python brain.py

📦 Dependencies

yt-dlp                    # YouTube video downloader
openai-whisper           # Audio transcription
langchain-text-splitters # Text chunking
langchain-community      # LangChain integrations
langchain-core           # LangChain core functionality
chromadb                 # Vector database
sentence-transformers    # Text embeddings
torch                    # PyTorch for ML models
streamlit                # Web interface
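
After installing, you can confirm that each package is importable without actually loading the heavy libraries. The sketch below maps each package name listed above to the module name it is imported under and reports the ones that cannot be found; the helper name `missing_packages` is hypothetical.

```python
from importlib.util import find_spec

# Package name (as installed with pip) -> top-level module name (as imported)
REQUIRED_MODULES = {
    "yt-dlp": "yt_dlp",
    "openai-whisper": "whisper",
    "langchain-text-splitters": "langchain_text_splitters",
    "langchain-community": "langchain_community",
    "langchain-core": "langchain_core",
    "chromadb": "chromadb",
    "sentence-transformers": "sentence_transformers",
    "torch": "torch",
    "streamlit": "streamlit",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the package names whose modules cannot be located."""
    return [pkg for pkg, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("All dependencies found" if not missing else "Missing: " + ", ".join(missing))
```

`find_spec` only locates a module, so this check stays fast even though packages such as torch are slow to import.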

🎯 Use Cases

  • πŸ“š Educational Content: Extract key information from lectures and tutorials
  • πŸŽ™οΈ Podcast Analysis: Search through podcast episodes for specific topics
  • πŸ“Ί Video Research: Quickly find relevant sections in long-form content
  • πŸ“ Meeting Recordings: Create searchable transcripts of recorded meetings
  • 🎬 Content Creation: Analyze competitor videos or research topics

🛣️ Roadmap

  • Support for multiple video sources (Vimeo, local files)
  • Multi-language support
  • Export functionality (PDF, DOCX)
  • Timestamp-based search results
  • Video player integration with auto-jump to relevant sections
  • Batch processing for multiple videos
  • Advanced analytics dashboard

🀝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

👤 Author

Minhajul Islam Nion

📧 Contact

For questions or feedback, please reach out via email or open an issue on GitHub.


Built for recruiters and AI enthusiasts

About

AI-powered video intelligence platform that transcribes YouTube videos using Whisper, performs sentiment analysis, and enables semantic search through RAG-based Q&A system built with LangChain and ChromaDB.
