Search Β· Play Β· Feel Β· Discover β all through natural conversation
49 tools Β· 12 moods Β· 1 intelligent agent Β· β songs
Built with β€οΈ by Team StarkMind
Jitin Kumar Sengar Β· Sonia Β· Vishakha Gaur Β· Yashwant Giri
- Overview
- Key Features
- System Architecture
- Project Structure
- Tech Stack
- Getting Started
- API Reference
- Tool Reference (49 Tools)
- Mood Engine
- Agent Configuration
- Example Conversations
- How It Works
- Team
- License
SoniQue is a conversational AI music assistant that lets you search, play, organise, discover, and feel music through natural language. Unlike traditional music players, SoniQue understands context, detects your mood, fetches lyrics, manages queues, tracks listening habits, and learns your taste β all powered by a single ReAct agent with 49 tools.
No buttons to click. No menus to navigate. Just tell SoniQue what you want.
You: "I'm feeling sad, play something soothing"
SoniQue: Detects mood β Finds matching music β Plays it β Logs to history
One agent. One LLM call. Forty-nine tools. Zero friction.
|
Detects your emotion from text using keyword-weighted scoring across 12 mood categories. Automatically selects genre-matched music and plays it. Fetches plain-text and time-synced karaoke lyrics from lrclib.net and lyrics.ovh. Includes automatic mojibake repair for multilingual support (Hindi, Japanese, etc.). Full music queue with shuffle, repeat one/all, play next, reorder, and queue history. Thread-safe with persistent state. Beautiful analytics showing top artists, top songs, genre distribution, listening streaks, peak hours, and mood patterns. 50-turn sliding window memory persisted to disk. Remembers context across messages β "play that song again" just works. |
Natural language search across all of YouTube. Downloads audio, converts to MP3, and plays through a high-quality audio engine. Create unlimited playlists, add/remove songs, browse collections. All stored persistently in JSON. Tracks favourite genres, favourite artists, liked songs, and dislikes. Uses your taste profile to power personalised recommendations. Downloaded songs are cached with a JSON index tracking play counts and last-played timestamps. Replay = instant. Zero re-downloads. Live playback status with animated sound bars, volume control, skip/seek, pause/resume β all controllable via natural language. |
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β USER (Browser) β
β Premium SPA Β· DM Sans + Space Grotesk β
β Particles Β· Ambient Orbs Β· Frosted Glass UI β
ββββββββββββββββββββββββββββββ¬ββββββββββββββββββββββββββββββββββββββ
β HTTP (REST API)
βΌ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β FLASK BACKEND (app.py) β
β β
β /api/chat ββββββΊ Lazy-loaded Agent (thread-safe singleton) β
β /api/status ββββΊ Now Playing + Queue Summary + Message Count β
β /api/queue βββββΊ Full Queue State β
β /api/analytics βΊ Computed from Listening History β
β /api/playlists βΊ Playlist Data β
β /api/preferences βΊ User Preferences β
β /api/clear βββββΊ Clear Conversation Memory β
β β
ββββββββββββββββββββββββββββββ¬ββββββββββββββββββββββββββββββββββββββ
β
βΌ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β SONIQUE REACT AGENT (agent.py) β
β β
β Framework: jentis ReAct Agent β
β LLM: Google Gemini (gemini-3-flash-preview) β
β Memory: ConversationMemory (50-turn sliding window) β
β Strategy: Multi-step tool chaining with mandatory workflows β
β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β 49 REGISTERED TOOLS β β
β β β β
β β ββββββββββββ ββββββββββββ ββββββββββββ β β
β β β YouTube β β Playback β β Playlist β β β
β β β Search β β Controls β β Manager β β β
β β β (2) β β (9) β β (6) β β β
β β ββββββββββββ ββββββββββββ ββββββββββββ β β
β β ββββββββββββ ββββββββββββ ββββββββββββ β β
β β βPreferenceβ β Mood β β Lyrics β β β
β β β Tracker β β Engine β β Fetcher β β β
β β β (11) β β (3) β β (2) β β β
β β ββββββββββββ ββββββββββββ ββββββββββββ β β
β β ββββββββββββ ββββββββββββ β β
β β β Queue β β History β β β
β β β Manager β βAnalytics β β β
β β β (10) β β (6) β β β
β β ββββββββββββ ββββββββββββ β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
ββββββββββββββββββββββββββββββ¬ββββββββββββββββββββββββββββββββββββββ
β
βΌ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β EXTERNAL SERVICES β
β β
β YouTube (via youtubesearchpython) β No API key needed β
β yt-dlp β Audio download from YouTube β
β FFmpeg / imageio-ffmpeg β Audio conversion to MP3 β
β pygame.mixer β 44100 Hz, 16-bit stereo playback engine β
β lrclib.net β Primary lyrics API (plain + synced) β
β lyrics.ovh β Fallback lyrics API β
β β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
SoniQue/
β
βββ app.py # Flask REST API + SPA server
βββ agent.py # ReAct Agent (49 tools, Gemini LLM)
βββ README.md # This file
β
βββ Tools/ # All tool modules
β βββ YouTubeSearchTool.py # YouTube search + video metadata (2 tools)
β βββ PlaybackTool.py # Audio download, play, pause, seek (9 tools)
β βββ PlaylistTool.py # CRUD playlist management (6 tools)
β βββ PreferenceTool.py # Favourites, likes, dislikes (11 tools)
β βββ MoodTool.py # Mood detection + mood playlists (3 tools)
β βββ LyricsTool.py # Plain + synced lyrics fetcher (2 tools)
β βββ QueueTool.py # Queue with shuffle/repeat (10 tools)
β βββ HistoryTool.py # Listening history + analytics (6 tools)
β
βββ templates/
β βββ index.html # SPA HTML template
β
βββ static/
β βββ css/
β β βββ style.css # Complete UI design system
β βββ js/
β βββ app.js # SPA interactive logic
β
βββ data/ # Persistent JSON storage
βββ conversation_memory.json # Chat history (50-turn window)
βββ playlists.json # User playlists
βββ preferences.json # Taste profile (genres, artists, likes)
βββ queue.json # Queue state (songs, shuffle, repeat)
βββ listening_history.json # Full play history with timestamps
βββ cache/
βββ songs/
βββ _cache_index.json # Download cache index
βββ *.mp3 # Cached audio files
| Layer | Technology | Purpose |
|---|---|---|
| Frontend | Vanilla JS SPA | Interactive single-page application |
| Styling | Custom CSS Design System | Indigo-emerald palette, ambient orbs, particles |
| Fonts | Space Grotesk + DM Sans | Modern display + body typography |
| Icons | Lucide Icons | 200+ clean SVG icons |
| Markdown | marked.js | Render agent responses as rich HTML |
| Backend | Flask 3.x | REST API server |
| AI Agent | jentis ReAct Framework | Reasoning + Acting agent loop |
| LLM | Google Gemini (gemini-3-flash-preview) | Language understanding & generation |
| Search | youtubesearchpython | YouTube video search (no API key) |
| Download | yt-dlp | Audio extraction from YouTube |
| Conversion | FFmpeg / imageio-ffmpeg | Audio format conversion to MP3 |
| Playback | pygame.mixer | High-quality local audio playback |
| Lyrics | lrclib.net + lyrics.ovh | Plain + time-synced lyrics APIs |
| Storage | JSON files | Lightweight persistent data layer |
- Python 3.12+
- Google API Key with Gemini API enabled (Get one here)
- FFmpeg (auto-detected; falls back to
imageio-ffmpegif not installed)
git clone <repo-url> SoniQue
cd SoniQue
# Create virtual environment
python -m venv .venv
# Activate (Windows)
.\.venv\Scripts\Activate.ps1
# Activate (macOS/Linux)
source .venv/bin/activatepip install flask jentis pygame-ce yt-dlp imageio-ffmpeg google-generativeai youtubesearchpython# Required β your Google Gemini API key
set GOOGLE_API_KEY=your-google-api-key # Windows
export GOOGLE_API_KEY=your-google-api-key # macOS/Linux
# Optional β defaults shown
set GEMINI_MODEL=gemini-3-flash-preview
set PYTHONUTF8=1python app.pyOpen http://127.0.0.1:5000 in your browser. That's it!
from agent import chat, get_memory, clear_memory
# Chat with SoniQue
response = chat("Play Bohemian Rhapsody")
print(response)
# Check memory
memory = get_memory()
print(f"Messages: {memory.message_count}")
# Clear conversation
clear_memory()All endpoints return JSON. The server runs at http://127.0.0.1:5000.
| Endpoint | Method | Description | Request Body | Response |
|---|---|---|---|---|
/ |
GET |
Serves the SPA UI | β | HTML |
/api/chat |
POST |
Send message to AI agent | {"message": "..."} |
{"reply": "..."} |
/api/status |
GET |
Playback status + queue summary | β | {now_playing, queue_count, ...} |
/api/queue |
GET |
Full queue state | β | {queue, current_index, shuffle, repeat_mode} |
/api/analytics |
GET |
Listening statistics | β | {total_plays, top_artists, ...} |
/api/playlists |
GET |
All playlists | β | {playlist_name: [songs]} |
/api/preferences |
GET |
User taste profile | β | {favourite_genres, liked_songs, ...} |
/api/clear |
POST |
Clear chat history | β | {"ok": true} |
curl -X POST http://127.0.0.1:5000/api/chat \
-H "Content-Type: application/json" \
-d '{"message": "Play some chill lo-fi music"}'{
"reply": "π΅ Now playing: **Lofi Hip Hop Radio** by ChilledCow..."
}| Tool | Parameters | Description |
|---|---|---|
youtube_search |
query, max_results=5 |
Searches YouTube using youtubesearchpython. Returns video_id, title, URL, duration, views, channel, thumbnail. No API key required. |
youtube_video_details |
video_url |
Gets detailed metadata β duration_seconds, views, category, description, keywords, is_live, is_family_safe. Accepts URL or bare video ID. |
| Tool | Parameters | Description |
|---|---|---|
download_and_play_song |
video_url, video_id, title, channel |
Downloads audio via yt-dlp, converts to MP3, plays through pygame.mixer. Uses song cache to skip re-downloads. |
pause_playback |
β | Pauses the currently playing song |
resume_playback |
β | Resumes a paused song |
stop_playback |
β | Stops playback and clears current song state |
set_volume |
level (0-100) |
Sets playback volume |
get_now_playing |
β | Returns current song title, channel, status, and volume |
show_cached_songs |
β | Lists all cached songs with play counts |
forward_skip |
seconds=10 |
Skips forward in the current song |
backward_skip |
seconds=10 |
Skips backward in the current song |
Implementation notes:
- Song cache (
SongCache) maintains a JSON index atdata/cache/songs/_cache_index.jsonwith download metadata, play counts, and timestamps - FFmpeg resolution: tries
shutil.which("ffmpeg")β falls back toimageio_ffmpeg.get_ffmpeg_exe() - Playback engine: pygame.mixer at 44100 Hz, 16-bit, stereo, 4096 buffer
- Thread-safe
PlaybackStatewiththreading.Lock
| Tool | Parameters | Description |
|---|---|---|
create_playlist |
name |
Creates a new empty playlist |
delete_playlist |
name |
Permanently deletes a playlist |
list_playlists |
β | Lists all playlists with song counts |
view_playlist |
name |
Shows all songs in a playlist |
add_song_to_playlist |
name, title, artist, video_id, video_url |
Adds a song to an existing playlist |
remove_song_from_playlist |
name, index |
Removes a song by 0-based index |
| Tool | Parameters | Description |
|---|---|---|
add_favourite_genres |
genres (comma-separated) |
Add genres to favourites |
remove_favourite_genres |
genres |
Remove genres from favourites |
add_favourite_artists |
artists (comma-separated) |
Add artists to favourites |
remove_favourite_artists |
artists |
Remove artists from favourites |
like_song |
title, artist |
Save a song to liked collection |
unlike_song |
title |
Remove from liked songs |
add_disliked_artists |
artists |
Mark artists as disliked |
remove_disliked_artists |
artists |
Un-dislike artists |
add_disliked_genres |
genres |
Mark genres as disliked |
remove_disliked_genres |
genres |
Un-dislike genres |
get_preferences |
β | Returns full taste profile |
| Tool | Parameters | Description |
|---|---|---|
detect_mood |
text |
Detects user's mood from text using keyword-weighted scoring. Returns mood, confidence, recommended genres, energy level, and search queries. |
get_mood_playlist |
mood |
Returns curated search queries and genre recommendations for any of the 12 supported moods. |
list_available_moods |
β | Lists all 12 mood categories with emojis, descriptions, genres, and energy levels. |
| Tool | Parameters | Description |
|---|---|---|
get_lyrics |
title, artist |
Fetches plain-text lyrics. Tries lrclib.net first, lyrics.ovh as fallback. Truncated to 3000 chars for context. Includes mojibake auto-repair. |
get_synced_lyrics |
title, artist |
Fetches time-synced LRC-format lyrics (karaoke-style) from lrclib.net. |
| Tool | Parameters | Description |
|---|---|---|
add_to_queue |
title, artist, video_id, video_url |
Adds a song to the end of the queue |
add_to_queue_next |
title, artist, video_id, video_url |
Inserts a song right after current position |
remove_from_queue |
index |
Removes a song by 0-based index |
view_queue |
β | Shows all queued songs with current position, shuffle/repeat status |
next_in_queue |
β | Gets next song (respects shuffle/repeat). Returns song info for playback. |
previous_in_queue |
β | Gets previous song in queue |
set_queue_shuffle |
enabled |
Enable/disable shuffle mode |
set_queue_repeat |
mode |
Set repeat: "off", "one", or "all" |
clear_queue |
β | Clears all songs from queue |
move_in_queue |
from_index, to_index |
Reorder a song's position |
| Tool | Parameters | Description |
|---|---|---|
log_song_play |
title, artist, video_id, genre, mood |
Logs play event with timestamp, hour, day-of-week |
get_listening_history |
limit=20 |
Returns recent plays, newest first |
get_listening_stats |
β | Full analytics: top artists/songs, genre/mood distribution, hourly/daily patterns, streaks (current + longest), total hours |
get_recently_played |
limit=5 |
Most recent unique songs (deduplicated) |
clear_listening_history |
β | Clears all history data |
get_music_taste_summary |
β | AI-readable taste profile summary for powering recommendations |
SoniQue detects emotion from your text using keyword-weighted scoring and maps it to one of 12 mood categories, each with curated genres and search queries.
| Mood | Emoji | Energy | Genres |
|---|---|---|---|
| Happy | π | High | Pop, Dance, Funk, Disco |
| Sad | π’ | Low | Ballad, Acoustic, Indie Folk |
| Energetic | β‘ | Very High | EDM, Hip-Hop, Rock, Drum & Bass |
| Relaxed | π§ | Low | Lo-fi, Ambient, Jazz |
| Romantic | π | Medium | R&B, Soul, Soft Pop |
| Angry | π€ | Very High | Metal, Punk, Hard Rock |
| Nostalgic | πΌ | Medium | 80s, 90s, Classic Rock |
| Focused | π§ | Low-Med | Lo-fi, Classical, Ambient |
| Party | π | Very High | Dance, EDM, Latin |
| Sleepy | π΄ | Very Low | Ambient, Sleep, Piano |
| Motivated | π₯ | High | Hip-Hop, Rock, Anthems |
| Melancholic | π§οΈ | Low | Post-Rock, Shoegaze, Dream Pop |
How mood detection works:
- User text is scanned against keyword dictionaries for each mood
- Keyword matches are scored by word length (longer = more specific = higher weight)
- Confidence =
best_score / total_score - The detected mood maps to curated YouTube search queries
- Agent automatically searches, plays, and logs with mood metadata
| Parameter | Value |
|---|---|
| Framework | jentis ReAct (Reasoning + Acting) |
| LLM | Google Gemini gemini-3-flash-preview |
| Temperature | 0.4 (balanced creativity + accuracy) |
| Max Tokens | 8,192 |
| Memory | 50-turn sliding window (100 messages) |
| Tools | 49 registered across 8 modules |
| Strategy | Multi-step tool chaining with mandatory workflows |
The agent follows strict multi-step workflows to ensure complete task execution:
| User Request | Agent Workflow |
|---|---|
| "Play <song>" | youtube_search β download_and_play_song β log_song_play |
| "I feel sad" | detect_mood β get_mood_playlist β youtube_search β download_and_play_song β log_song_play |
| "Show lyrics" | get_now_playing β get_lyrics |
| "Recommend songs" | get_preferences + get_music_taste_summary β personalised suggestions |
| "Play next" | next_in_queue β download_and_play_song β log_song_play |
| "My stats" | get_listening_stats β formatted analytics |
You: "Play Bohemian Rhapsody"
SoniQue: π΅ Now playing: Bohemian Rhapsody by Queen! Rock on! πΈ
You: "I'm feeling really stressed and need to relax"
SoniQue: I can sense you need some calm vibes π Let me play something relaxing...
π΅ Now playing: Weightless by Marconi Union β the most relaxing song ever!
You: "Show me the lyrics for this song"
SoniQue: Here are the lyrics for "Bohemian Rhapsody" by Queen:
Is this the real life? Is this just fantasy?
Caught in a landslide, no escape from reality...
You: "Queue up some party songs"
SoniQue: π I've added 3 party bangers to your queue:
1. Uptown Funk β Bruno Mars
2. Don't Stop Me Now β Queen
3. Levitating β Dua Lipa
Want me to start playing?
You: "What are my listening stats?"
SoniQue: π Here's your music journey:
β’ Total Plays: 47 songs across 12 artists
β’ Top Artist: Queen (8 plays)
β’ Top Genre: Rock πΈ
β’ Peak Hour: 6:00 PM
β’ Listening Streak: 5 days! π₯
You: "I love jazz and lo-fi but hate heavy metal"
SoniQue: Got it! Added jazz and lo-fi to your favourites β€οΈ
and heavy metal to your dislikes. I'll keep this in mind
for recommendations!
1. User types message in the chat UI
2. Frontend sends POST to /api/chat
3. Flask lazy-loads the agent (first request only)
4. Agent receives message + conversation context
5. Gemini LLM reasons about which tools to call
6. Agent executes tools sequentially (multi-step chaining)
7. Agent formats a natural language response
8. Response sent back to frontend as JSON
9. Frontend renders markdown response in chat bubble
10. Status polling updates now-playing and queue sidebar
| Decision | Rationale |
|---|---|
| Single agent, no sub-agents | Faster response times, simpler architecture, one LLM call per request |
| Lazy agent loading | Heavy imports (pygame, yt-dlp) only loaded on first chat request |
| Thread-safe singleton | Multiple browser tabs won't create duplicate agents |
| JSON file storage | Zero-config persistence, easy to debug, no database needed |
| UTF-8 mojibake repair | Windows' default cp1252 encoding corrupts emojis β auto-fix in agent.py |
| Multi-step tool chaining | "Play X" requires search β download β log β agent enforces this |
| youtubesearchpython | No YouTube API key needed β works out of the box |
| Member | Role | |
|---|---|---|
| J | Jitin Kumar Sengar | Team Lead |
| S | Sonia | Developer |
| V | Vishakha Gaur | Developer |
| Y | Yashwant Giri | Developer |
This project is licensed under the MIT License. See LICENSE for details.
Built with β€οΈ by Team StarkMind Β· Hack KRMU 5.0
jentis Β· Google Gemini Β· Flask Β· pygame Β· yt-dlp