A real-time, multimodal, AI-powered music tutor built as a mobile-first Progressive Web App. It is designed to provide personalized coaching for students who cannot afford private lessons.
- Conversational onboarding - warm, personalized greeting and goal-setting
- Melody recognition - sing a melody and the AI identifies the song
- Sheet music search - finds legal, public domain sheet music PDFs
- Vocal coaching - feedback on pitch, rhythm, and diction across multiple rounds
- Performance coaching - posture, breathing, and presentation analysis via camera
- Final evaluation - stage readiness scoring with a 0-100 breakdown
- Progress tracking - historical scores, trends, and session comparison
- Animated dog avatar - state-driven SVG coach with real-time expressions
- Real-time interaction - streaming responses, live status indicators, interruption support
| Layer | Technology |
|---|---|
| Frontend | Next.js 15 (App Router), TypeScript, Tailwind CSS |
| Animation | Framer Motion |
| AI (planned) | Gemini Live API, Google ADK |
| Backend (planned) | Firebase Auth, Firestore, Cloud Storage, Cloud Functions |
| PDF Viewer (planned) | PDF.js |
| PWA | Service Worker, Web App Manifest |

    npm install
    npm run dev

Open http://localhost:3000 on your phone or in a mobile-width browser.

    src/
      app/                    # Next.js App Router pages
        page.tsx              # Home (splash + onboarding + returning user)
        studio/page.tsx       # Live coaching studio
        feedback/page.tsx     # Session results and scoring
        progress/page.tsx     # Historical progress tracking
      components/
        avatar/               # Animated dog coach avatar (SVG)
        studio/               # Transcript, media controls
        sheet-music/          # Sheet music panel + PDF viewer
        ui/                   # Shared UI (ScoreRing, StatusPill, FeedbackCard, etc.)
      lib/
        state/                # Session context (useReducer) + coaching state machine
        services/             # Melody recognition, sheet music search, scoring, history
        gemini/               # Gemini Live API client abstraction
        firebase/             # Firebase configuration
      types/                  # TypeScript interfaces for all data models

idle -> onboarding -> melody_recognition -> sheet_music_search
-> vocal_coaching -> physical_coaching -> final_evaluation -> session_complete
Each stage supports interruptions and can be navigated via voice or text.
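
The stage flow above could be modeled as a small reducer, for example one passed to `useReducer`. This is a minimal sketch, not the project's actual state machine: the stage names come from the flow above, while the event names (`ADVANCE`, `GO_TO`, `INTERRUPT`, `RESUME`) and the `interrupted` flag are hypothetical.

```typescript
// Stage names are taken from the documented flow; everything else is illustrative.
type Stage =
  | "idle"
  | "onboarding"
  | "melody_recognition"
  | "sheet_music_search"
  | "vocal_coaching"
  | "physical_coaching"
  | "final_evaluation"
  | "session_complete";

const STAGE_ORDER: readonly Stage[] = [
  "idle",
  "onboarding",
  "melody_recognition",
  "sheet_music_search",
  "vocal_coaching",
  "physical_coaching",
  "final_evaluation",
  "session_complete",
];

// Hypothetical events: a voice or text command would be mapped to one of these.
type CoachingEvent =
  | { type: "ADVANCE" }
  | { type: "GO_TO"; stage: Stage }
  | { type: "INTERRUPT" }
  | { type: "RESUME" };

interface CoachingState {
  stage: Stage;
  interrupted: boolean;
}

const initialState: CoachingState = { stage: "idle", interrupted: false };

function coachingReducer(state: CoachingState, event: CoachingEvent): CoachingState {
  switch (event.type) {
    case "ADVANCE": {
      // Do not move forward while the user has interrupted the coach.
      if (state.interrupted) return state;
      const index = STAGE_ORDER.indexOf(state.stage);
      const nextIndex = Math.min(index + 1, STAGE_ORDER.length - 1);
      return { ...state, stage: STAGE_ORDER[nextIndex] ?? state.stage };
    }
    case "GO_TO":
      // Direct navigation clears any pending interruption.
      return { stage: event.stage, interrupted: false };
    case "INTERRUPT":
      return { ...state, interrupted: true };
    case "RESUME":
      return { ...state, interrupted: false };
  }
}
```

In a React component this would be wired up as `useReducer(coachingReducer, initialState)`. Keeping the reducer pure means the transitions can be tested without rendering anything.
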
| Category | Max Points |
|---|---|
| Pitch | 25 |
| Rhythm | 20 |
| Diction | 15 |
| Expression | 15 |
| Posture | 10 |
| Breathing | 10 |
| Presentation | 5 |
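
The category maxima above sum to 100, which gives the 0-100 stage readiness score. A minimal sketch of the aggregation follows; the names `MAX_POINTS`, `ScoreBreakdown`, and `totalScore` are illustrative, and clamping each category to its maximum is an assumption rather than documented behavior.

```typescript
// Maximum points per category, copied from the scoring table.
const MAX_POINTS = {
  pitch: 25,
  rhythm: 20,
  diction: 15,
  expression: 15,
  posture: 10,
  breathing: 10,
  presentation: 5,
} as const;

type Category = keyof typeof MAX_POINTS;
type ScoreBreakdown = Record<Category, number>;

// Sums the per-category scores, clamping each one to [0, max] first
// (assumed behavior) so the total always stays within 0-100.
function totalScore(breakdown: ScoreBreakdown): number {
  let total = 0;
  for (const category of Object.keys(MAX_POINTS) as Category[]) {
    const max = MAX_POINTS[category];
    total += Math.min(Math.max(breakdown[category], 0), max);
  }
  return total;
}

const example: ScoreBreakdown = {
  pitch: 20,
  rhythm: 16,
  diction: 12,
  expression: 11,
  posture: 8,
  breathing: 7,
  presentation: 4,
};

console.log(totalScore(example)); // 78
```
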
This MVP uses mock services for demo purposes. Here's what to replace for production:
| Component | MVP (Current) | Production |
|---|---|---|
| AI Interaction | Canned responses with streaming simulation | Gemini Live API WebSocket |
| Melody Recognition | Mock song candidates | Gemini audio analysis / dedicated ML model |
| Sheet Music Search | Mock results | Google Search API + IMSLP scraping via Cloud Function |
| Vocal Analysis | Random scores with feedback templates | Gemini audio analysis |
| Video Analysis | Mock scores | Gemini video analysis |
| PDF Viewer | Placeholder with external link | PDF.js embedded renderer |
| Auth | localStorage | Firebase Authentication |
| Database | localStorage | Cloud Firestore |
| Media Storage | Not persisted | Cloud Storage for Firebase |
| Analytics | None | Google Analytics for Firebase |
| Security | None | Firebase App Check |
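
One way to keep the mock-to-production swap in the table above cheap is to hide each service behind an interface. The sketch below illustrates the "canned responses with streaming simulation" row. The `CoachClient` interface, `createMockCoachClient`, and `collect` are hypothetical names, not the project's actual API, and a production implementation backed by the Gemini Live API is not shown.

```typescript
// Hypothetical abstraction: both the mock and a future real client
// would expose responses as an async stream of text chunks.
interface CoachClient {
  respond(prompt: string): AsyncIterable<string>;
}

// Mock client: ignores the prompt and emits a canned reply word by word,
// pausing between chunks to simulate streaming.
function createMockCoachClient(cannedReply: string, delayMs = 30): CoachClient {
  return {
    async *respond(_prompt: string) {
      for (const word of cannedReply.split(" ")) {
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
        yield word + " ";
      }
    },
  };
}

// Helper that drains the stream into one string, e.g. for a transcript entry.
async function collect(client: CoachClient, prompt: string): Promise<string> {
  let text = "";
  for await (const chunk of client.respond(prompt)) {
    text += chunk;
  }
  return text.trim();
}
```

UI code that consumes only `CoachClient` would not need to change when the mock is replaced, since a WebSocket-backed client could yield chunks through the same `AsyncIterable<string>` shape.
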
Deploy to Vercel, Firebase Hosting, or any static host:

    npm run build

- Chao Zhang — hk.chaozhang@gmail.com
- John Chong — xchong92@gmail.com
- Timothy Asiimwe — atimothee@gmail.com
- Louis Cheng — louis.cheng7@gmail.com
MIT