This repository provides a collection of Python examples demonstrating the capabilities of the Google Gemini Live API. The project showcases a progressive evolution from a simple text-based chat to a full-featured, real-time, two-way voice conversation, illustrating how to implement and scale features with the API.
- Real-time Two-Way Audio: Engage in a live, low-latency voice conversation with the Gemini model.
- Microphone Input: Captures audio directly from your microphone for seamless interaction.
- Live Audio Playback: Plays the model's voice response directly to your speakers without saving to a file.
- Text-Based Chat: Includes a separate mode for traditional text-based interaction.
- System Configuration: Allows for microphone testing and command-line theme customization for a better user experience.
- Robust Asynchronous Handling: Built with `asyncio` to manage concurrent tasks like audio recording, streaming, and receiving data.
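
The concurrent recording and streaming described above can be sketched with an `asyncio.Queue` connecting a producer and a consumer. This is a minimal illustration, not the code from the scripts: the names `record`, `stream`, and `run_pipeline` are hypothetical, and the "microphone" is just a list of byte chunks so the example runs without audio hardware.

```python
import asyncio


async def record(queue, chunks):
    """Producer: stand-in for microphone capture (hypothetical helper)."""
    for chunk in chunks:
        await queue.put(chunk)
        await asyncio.sleep(0)  # yield control so the consumer can run
    await queue.put(None)  # sentinel: no more audio


async def stream(queue, sent):
    """Consumer: stand-in for sending chunks to the API (hypothetical helper)."""
    while True:
        chunk = await queue.get()
        if chunk is None:
            break
        sent.append(chunk)


async def run_pipeline(chunks):
    """Run the producer and consumer concurrently and return what was 'sent'."""
    queue = asyncio.Queue(maxsize=8)  # bounded, so a slow consumer applies backpressure
    sent = []
    await asyncio.gather(record(queue, chunks), stream(queue, sent))
    return sent
```

Calling `asyncio.run(run_pipeline([b"a", b"b"]))` returns the chunks in the order they were produced. In a real session, a third task receiving model responses would be added to the same `asyncio.gather` call.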
The repository contains several script versions, each building upon the last. This structure is designed to provide a clear learning path.
| Version | Goal | Interaction Type | Audio Input | Audio Output | Key Libraries |
|---|---|---|---|---|---|
| `LiveAPIv0.py` | Basic API Demo | Text-only | ❌ None | ❌ None | google-genai |
| `LiveAPIv1.py` | File-based Audio | Text or Audio | From audio file | Writes to .wav file | `librosa`, `soundfile` |
| `LiveAPIv2.py` | Real-time Audio (Basic) | Text or Audio | 🎙️ Microphone | Writes to .wav file | `sounddevice` |
| `LiveAPIv3.py` | Real-time Voice Chat | Text or Audio | 🎙️ Microphone | 🔊 Live Speaker Playback | sounddevice, numpy |
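
The "Writes to .wav file" output mode in the table can be sketched as follows. The scripts themselves use `soundfile`; this sketch swaps in the standard-library `wave` module so it runs without extra dependencies. The function name `write_pcm_chunks` is hypothetical, and the format (16-bit mono PCM at 24 kHz) is an assumption about the model's audio output that should be checked against the API documentation.

```python
import wave

# Assumption: the model returns 16-bit mono PCM at 24 kHz.
OUTPUT_SAMPLE_RATE = 24000


def write_pcm_chunks(target, chunks, sample_rate=OUTPUT_SAMPLE_RATE):
    """Write raw 16-bit mono PCM chunks to a WAV file.

    `target` may be a file path or a seekable binary file object.
    """
    with wave.open(target, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        for chunk in chunks:
            wav_file.writeframes(chunk)
```

For example, `write_pcm_chunks("response.wav", received_chunks)` would save the chunks collected during a session, where `received_chunks` is whatever list of audio byte strings your receive loop has gathered.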
- Python 3.9+
- A Google Gemini API Key.
- Clone the repository with `git clone <your-repository-url>`, then change into it with `cd <your-repository-directory>`.
- Install the required Python libraries: `pip install google-genai python-dotenv sounddevice soundfile librosa numpy`

  Note: On some systems, you may need to install system-level audio libraries such as `portaudio` for `sounddevice` to work correctly.
- Create a file named `.env` in the root of the project directory.
- Add your Gemini API key to the `.env` file as follows: `GEMINI_KEY="YOUR_API_KEY_HERE"`
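
A script can read this key at startup roughly as sketched below. The helper names `get_gemini_key` and `load_key_from_dotenv` are hypothetical and not taken from the repository; only the variable name `GEMINI_KEY` and the use of `python-dotenv` come from this README.

```python
import os


def get_gemini_key(environ=None):
    """Return the GEMINI_KEY value from a mapping (defaults to os.environ)."""
    environ = os.environ if environ is None else environ
    key = environ.get("GEMINI_KEY", "").strip()
    if not key:
        raise RuntimeError("GEMINI_KEY is not set; add it to your .env file.")
    return key


def load_key_from_dotenv():
    """Load .env entries into the environment, then return the key."""
    from dotenv import load_dotenv  # provided by the python-dotenv package

    load_dotenv()  # copies entries from a discovered .env file into os.environ
    return get_gemini_key()
```

Failing early with a clear message avoids a less obvious authentication error later, when the first API request is made.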
The main, most feature-complete script is LiveAPIv3.py.
- Run the application: `python LiveAPIv3.py`
- Select an option from the menu:
  1. Real-time Audio Interaction: Start a voice conversation. Press `Enter` to stop recording your voice and wait for the model's response.
  2. Text Interaction: Start a text-based chat session.
  3. Test Microphone: Record and play back a short audio clip to verify your microphone is working.
  4. Config Theme: Change the color scheme of the command-line interface.
  5. Exit: Close the program.
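
The "press `Enter` to stop recording" behavior in option 1 could be implemented along these lines. This is a sketch under assumptions, not the code in `LiveAPIv3.py`: `record_until_stopped` and `wait_for_enter` are hypothetical names, and `read_chunk` stands for any coroutine that returns the next block of microphone audio.

```python
import asyncio


async def record_until_stopped(read_chunk, wait_for_stop):
    """Collect chunks from read_chunk() until wait_for_stop() completes."""
    stop_task = asyncio.create_task(wait_for_stop())
    chunks = []
    while not stop_task.done():
        chunks.append(await read_chunk())
    await stop_task  # re-raise any error from the stop waiter
    return chunks


async def wait_for_enter():
    """Block on input() in a worker thread so the event loop keeps running."""
    # asyncio.to_thread requires Python 3.9+, matching the stated prerequisite.
    await asyncio.to_thread(input, "Recording... press Enter to stop. ")
```

In an interactive session this would be called as `await record_until_stopped(read_chunk, wait_for_enter)`. Passing the stop condition as a parameter keeps the loop testable without a keyboard or microphone.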
The other scripts (LiveAPIv0.py, LiveAPIv1.py, LiveAPIv2.py) can be run similarly and are provided for educational purposes to understand the development progression.
This project is licensed under the MIT License. See the LICENSE file for details.