Code to train a custom time-domain autoencoder to dereverb audio
Dual-model speech AI toolkit for speaker verification and speaker-aware diarization, with streaming inference, meeting analysis, long-audio monitoring, and speaker-bank integration.
Real-time speech enhancement pipeline — custom-trained U-Net denoising model, ONNX inference, Overlap-Add synthesis, and virtual audio routing for Teams, Zoom, and DAW use. CPU-only, no cloud dependency.
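Overlap-Add synthesis in such a streaming pipeline can be sketched with numpy; the frame length, hop size, and periodic-Hann window below are illustrative choices, not the repo's actual settings:

```python
import numpy as np

def overlap_add(frames: np.ndarray, hop: int) -> np.ndarray:
    """Sum windowed frames back into a signal (synthesis half of Overlap-Add).

    frames: (n_frames, frame_len) array of already-windowed frames
    hop:    hop size in samples between successive frames
    """
    n_frames, frame_len = frames.shape
    out = np.zeros((n_frames - 1) * hop + frame_len)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + frame_len] += frame
    return out

# Demo: a periodic Hann window at 50% overlap satisfies the COLA condition
# (overlapping window values sum to exactly 1), so framing followed by OLA
# reconstructs the signal losslessly away from the unpadded edges.
frame_len, hop = 512, 256
window = 0.5 * (1 - np.cos(2 * np.pi * np.arange(frame_len) / frame_len))
sig = np.sin(2 * np.pi * np.arange(2048) / 64)
frames = np.stack([sig[s:s + frame_len] * window
                   for s in range(0, len(sig) - frame_len + 1, hop)])
rec = overlap_add(frames, hop)
```

In a real-time denoiser each frame would pass through the model between analysis windowing and this synthesis step.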
A custom MCP server that separates a YouTube track into stems (vocals, drums, bass) and extracts a sonic signature: BPM, musical key, stereo width, transient punch, and a 512-dim CLAP semantic embedding. Runs locally on CPU via Demucs and librosa.
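The "stereo width" component of such a sonic signature can be approximated with a classic mid/side energy ratio; a minimal numpy sketch (the `stereo_width` helper is illustrative, not the repo's actual metric):

```python
import numpy as np

def stereo_width(left: np.ndarray, right: np.ndarray) -> float:
    """Toy stereo-width metric: ratio of side RMS to mid RMS.
    0.0 means pure mono; larger values mean a wider stereo image."""
    mid = 0.5 * (left + right)    # content common to both channels
    side = 0.5 * (left - right)   # content that differs between channels
    eps = 1e-12                   # avoid division by zero for silent mid
    return float(np.sqrt(np.mean(side ** 2)) / (np.sqrt(np.mean(mid ** 2)) + eps))

# Demo: identical channels are fully mono; phase-inverted channels are maximally wide.
t = np.arange(4800) / 48000.0
mono = np.sin(2 * np.pi * 220 * t)
width_mono = stereo_width(mono, mono)
width_wide = stereo_width(mono, -mono)
```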
Engine identification using acoustic signal analysis and machine learning to classify 8 vehicle types. Audio signals are processed using FFT and feature extraction, and a multi-class model predicts vehicle categories based on their unique sound patterns.
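The FFT-plus-feature-extraction front end for such a classifier can be sketched in numpy; the `fft_features` helper, band count, and spectral-centroid feature are assumptions for illustration, not the repo's exact feature set:

```python
import numpy as np

def fft_features(x: np.ndarray, sr: int, n_bands: int = 8) -> np.ndarray:
    """Toy feature vector: log band energies plus spectral centroid.

    x:  mono audio frame
    sr: sample rate in Hz
    Returns an (n_bands + 1,) vector suitable for a multi-class model.
    """
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x))))     # windowed magnitude spectrum
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    centroid = np.sum(freqs * spec) / (np.sum(spec) + 1e-12)  # "brightness" in Hz
    bands = np.array_split(spec, n_bands)                  # coarse linear bands
    band_energy = np.log1p([np.sum(b ** 2) for b in bands])
    return np.concatenate([band_energy, [centroid]])

# Demo: a 440 Hz tone should yield a centroid near 440 Hz.
feats = fft_features(np.sin(2 * np.pi * 440 * np.arange(8000) / 8000), sr=8000)
```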
Machine learning system for music genre classification using feature engineering, stratified evaluation, SVC/XGBoost modeling, and reproducible prediction export.
Automated audio/video ML pipeline for detecting and transcribing jazz solos from live recordings. Runs nightly against Smalls Jazz Club archives: uses CLAP (instrument detection), Demucs (source separation), CLIP (performer identification), and basic-pitch (MIDI transcription). Results served via REST API.
ML-based speech emotion recognition system that analyzes audio features to classify emotions with a simple interface for testing.
Key features: simple VAE architecture with encoder/decoder; synthetic music data generation for training; interactive training with progress tracking; music generation from latent-space sampling; audio conversion and playback; downloadable audio files.
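Generation from the latent space of a VAE rests on the reparameterization trick, z = mu + sigma * eps with eps ~ N(0, I); a toy numpy sketch with a made-up linear decoder (latent size, audio length, and the decoder itself are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu: np.ndarray, log_var: np.ndarray, rng) -> np.ndarray:
    """Sample z = mu + sigma * eps so gradients can flow through mu and log_var."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

# Toy "decoder": a fixed random linear map from latent space to waveform space,
# squashed with tanh so samples land in [-1, 1] like normalized audio.
latent_dim, audio_len = 16, 1024
W = rng.standard_normal((latent_dim, audio_len)) * 0.1

def decode(z: np.ndarray) -> np.ndarray:
    return np.tanh(z @ W)

# Sampling from the prior N(0, I): mu = 0, log_var = 0.
z = reparameterize(np.zeros(latent_dim), np.zeros(latent_dim), rng)
audio = decode(z)
```

A trained model would replace `decode` with the learned decoder network; sampling the prior then yields novel audio.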
Neural TTS and voice-cloning application using XTTS/VITS. Supports 3–30 s reference audio for speaker adaptation, real-time pitch/speed control, and WAV/MP3 export.
AI-generated audio summarisation pipeline — Whisper transcription, LLM key-insight extraction, and structured spoken summaries with TTS playback and Streamlit interface.
Audio file processing pipeline with GPT-4-powered error diagnosis — detects codec issues, sample rate mismatches, and corruption artefacts with automated remediation suggestions.
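A sample-rate-mismatch pre-check ahead of any LLM diagnosis can be done with the stdlib alone; a sketch where the `check_wav` helper and the 48 kHz target are hypothetical, not the repo's API:

```python
import os
import tempfile
import wave

def check_wav(path: str, expected_sr: int = 48000) -> list:
    """Flag basic header problems (sample rate, channel count) in a WAV file."""
    issues = []
    with wave.open(path, "rb") as wf:
        if wf.getframerate() != expected_sr:
            issues.append(f"sample rate {wf.getframerate()} != expected {expected_sr}")
        if wf.getnchannels() > 2:
            issues.append(f"unexpected channel count {wf.getnchannels()}")
    return issues

# Demo: write a 44.1 kHz mono file and flag the mismatch against a 48 kHz target.
tmp = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(tmp, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)       # 16-bit samples
    wf.setframerate(44100)
    wf.writeframes(b"\x00\x00" * 100)
issues = check_wav(tmp)
```

Findings like these would then be passed to the model as structured context for remediation suggestions.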
Music harmony AI — chord progression analysis with Roman numeral labelling, voice leading checker, style-conditioned progression generation (Baroque/Jazz/Pop), and MIDI export via music21.
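For diatonic triads, Roman-numeral labelling reduces to a scale-degree lookup; a toy pure-Python sketch restricted to C major (the repo itself uses music21, which handles arbitrary keys, inversions, and chromatic chords):

```python
# Diatonic triads of C major and their conventional Roman numerals:
# uppercase = major quality, lowercase = minor, ° = diminished.
MAJOR_SCALE = ["C", "D", "E", "F", "G", "A", "B"]
ROMAN = ["I", "ii", "iii", "IV", "V", "vi", "vii°"]

def roman_numeral(chord_root: str) -> str:
    """Label a diatonic triad in C major by its root's scale degree."""
    return ROMAN[MAJOR_SCALE.index(chord_root)]

# Demo: the classic I-vi-IV-V progression.
progression = [roman_numeral(root) for root in ["C", "A", "F", "G"]]
```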
Audio analysis in JavaScript/TypeScript.