RoyFerguson0/deep-learning-audio

Deep Learning Audio Project

A practical audio processing system that uses AI to understand and work with speech and sounds.

What This Project Does

This project brings together several audio AI capabilities:

  • Speaker Recognition — Identify who's speaking by analyzing their voice characteristics
  • Speech Transcription — Convert spoken words into text
  • Language Detection — Identify which language is being spoken
  • Voice Activity Detection — Detect when someone is actually talking (vs. silence or noise)
  • Sound Classification — Identify different types of sounds and audio events
  • Text-to-Speech — Generate natural-sounding speech from text
  • Speaker Verification — Check if a voice matches a known speaker (like voice biometrics)
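To make voice activity detection concrete, here is a deliberately simple energy-threshold sketch. It is not how this project does it (the project relies on pre-trained models); it only illustrates the idea of flagging frames that contain sound versus silence. The function names, the 160-sample frame size, and the 0.1 threshold are illustrative assumptions.

```python
import math

def frame_rms(samples, frame_size):
    """Split samples into non-overlapping frames and return each frame's RMS energy."""
    energies = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_size))
    return energies

def detect_speech_frames(samples, frame_size=160, threshold=0.1):
    """Return one boolean per frame: True where RMS energy exceeds the threshold.

    Both defaults are hypothetical values chosen for this toy example.
    """
    return [rms > threshold for rms in frame_rms(samples, frame_size)]

# Synthetic signal: one frame of silence followed by one frame of a sine tone.
silence = [0.0] * 160
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(160)]
flags = detect_speech_frames(silence + tone)
print(flags)
```

A heuristic like this reacts to any loud sound, not just speech, which is one reason a trained model is the better choice for real recordings.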

How It Works

The project uses pre-trained AI models from SpeechBrain and Whisper. It can:

  1. Record audio from your microphone
  2. Analyze the audio to extract information about the speaker and speech
  3. Store speaker profiles by creating "embeddings" (mathematical representations of a person's voice)
  4. Compare new audio against stored profiles to verify identity
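Steps 3 and 4 can be sketched as a similarity check between two embeddings. The example below is a minimal illustration, assuming embeddings are plain lists of floats and that cosine similarity is the comparison metric; the README does not state which metric or threshold the project uses, so `verify_speaker` and the 0.7 threshold are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)

def verify_speaker(new_embedding, stored_embedding, threshold=0.7):
    """Return (is_match, score). The threshold is an assumed value, not the project's."""
    score = cosine_similarity(new_embedding, stored_embedding)
    return score >= threshold, score

# Toy 3-dimensional embeddings; real speaker embeddings have far more dimensions.
stored = [0.9, 0.1, 0.4]
same_speaker = [0.8, 0.2, 0.5]
other_speaker = [-0.3, 0.9, -0.2]

print(verify_speaker(same_speaker, stored))
print(verify_speaker(other_speaker, stored))
```

In practice the threshold would be tuned on real recordings: a higher value rejects more impostors but also rejects more genuine speakers.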

Key Files

  • speech_verification_demo.py — Interactive demo to verify who's speaking
  • speech_full_system_optimized.py — Complete audio analysis pipeline
  • speech_vad.py — Detects when speech is present
  • speech_language_identification.py — Identifies spoken language
  • speech_sound_classification.py — Classifies different sounds
  • speech_tts.py — Generates speech from text

Setup

The project is written in Python and uses PyTorch for deep learning. Pre-trained models are stored in pretrained_models/. Audio samples and embeddings are organized by speaker in the embeddings/ and transcriptions/ folders.
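As a rough picture of what "organized by speaker" could look like on disk, the sketch below saves and reloads one embedding per speaker under an embeddings/ folder. The JSON format, the one-file-per-speaker naming, and the helper names are assumptions for illustration; the repository's actual file layout is not described here.

```python
import json
import tempfile
from pathlib import Path

def save_embedding(root, speaker, embedding):
    """Write a speaker's embedding to <root>/embeddings/<speaker>.json (assumed layout)."""
    folder = Path(root) / "embeddings"
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{speaker}.json"
    path.write_text(json.dumps(embedding))
    return path

def load_embedding(root, speaker):
    """Read a speaker's embedding back as a list of floats."""
    path = Path(root) / "embeddings" / f"{speaker}.json"
    return json.loads(path.read_text())

# Use a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    saved_path = save_embedding(tmp, "alice", [0.12, -0.5, 0.33])
    restored = load_embedding(tmp, "alice")

print(saved_path.name, restored)
```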

Use Cases

  • Voice authentication systems
  • Automatic speech recognition
  • Audio analysis and categorization
  • Voice biometrics applications
