diff --git a/examples/030-livekit-agents-python/BLOG.md b/examples/030-livekit-agents-python/BLOG.md
new file mode 100644
index 0000000..ab99cc0
--- /dev/null
+++ b/examples/030-livekit-agents-python/BLOG.md
@@ -0,0 +1,117 @@
+# Building a Voice Assistant using LiveKit Agents and Deepgram
+
+Integrating powerful voice technologies can transform how users interact with your applications. This guide walks you through setting up a minimal yet effective voice assistant using LiveKit Agents, Deepgram for speech-to-text (STT), and OpenAI's GPT for generating responses.
+
+## Why LiveKit Agents?
+
+LiveKit Agents provides a comprehensive framework for managing real-time audio and video communication. Combined with Deepgram, it lets you add sophisticated STT capabilities and build a seamless voice interaction experience.
+
+## Setting Up the Environment
+
+### Prerequisites
+
+- **Python 3.10+**: Make sure your system is running Python 3.10 or later.
+- **LiveKit Server**: Deploy a LiveKit server or use a hosted version (LiveKit Cloud).
+- **API Keys**: Obtain API keys from Deepgram and OpenAI.
+
+### Environment Variables
+
+To keep configuration secure and flexible, store your credentials in environment variables. Create a `.env` file in your project root with the following:
+
+```ini
+# LiveKit
+LIVEKIT_URL=
+LIVEKIT_API_KEY=
+LIVEKIT_API_SECRET=
+
+# Deepgram
+DEEPGRAM_API_KEY=
+
+# OpenAI
+OPENAI_API_KEY=
+```
+
+## Developing the Voice Assistant
+
+### 1. Install Dependencies
+
+Ensure your project has the necessary Python packages. Create a `requirements.txt` file and include:
+
+```plaintext
+livekit-agents
+livekit-plugins-deepgram
+livekit-plugins-turn-detector
+openai
+python-dotenv
+```
+
+Install the dependencies:
+
+```bash
+pip install -r requirements.txt
+```
+
+### 2. Writing the Agent Code
+
+We start by constructing a minimal agent.
+Open `agent.py` and import the necessary packages (note that `AgentSession`, used below, must be imported too):
+
+```python
+from livekit.agents import Agent, AgentServer, AgentSession, cli, inference
+from livekit.plugins.turn_detector.multilingual import MultilingualModel
+```
+
+Define a `VoiceAssistant` class extending the `Agent` base class and override lifecycle methods such as `on_enter`:
+
+```python
+class VoiceAssistant(Agent):
+    def __init__(self) -> None:
+        super().__init__(
+            instructions="You are a voice assistant..."
+        )
+
+    async def on_enter(self) -> None:
+        self.session.generate_reply(instructions="Greet the user...")
+```
+
+### 3. Configure the Server and Session
+
+Initialize an `AgentServer` and define the session using Deepgram for STT and OpenAI for the LLM:
+
+```python
+server = AgentServer()
+
+@server.rtc_session()
+async def entrypoint(ctx):
+    session = AgentSession(
+        stt=inference.STT("deepgram/nova-3", language="multi"),
+        llm=inference.LLM("openai/gpt-4.1-mini"),
+        tts=inference.TTS("cartesia/sonic-3"),
+        turn_detection=MultilingualModel(),
+        preemptive_generation=True,
+    )
+    await session.start(agent=VoiceAssistant(), room=ctx.room)
+```
+
+### 4. Running Your Agent
+
+Run the agent in console mode to talk to it directly from your terminal, or in dev mode to connect it to your LiveKit server:
+
+```bash
+# Terminal-based voice interaction
+python src/agent.py console
+
+# Or run as a dev worker connected to your LiveKit server
+python src/agent.py dev
+```
+
+Join the LiveKit room specified in your setup to interact with the assistant.
+
+## Conclusion
+
+This example demonstrates how LiveKit Agents integrates with Deepgram and OpenAI to power a real-time voice assistant. Experiment by modifying the assistant's behavior or trying different models and configurations.
+
+## What's Next?
+
+- **Explore More Models**: Try different STT and LLM models to see how they change user interactions.
+- **Integrate More Features**: Add more sophisticated logic or memory to your assistant for enhanced user experiences.
+- **Deploy**: Consider deploying your solution in a production environment for real-world interactions.
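To join the room from a test client or script, you need a LiveKit access token. In practice you would mint one with the official `livekit-api` package; purely as an illustration of the format, and assuming the standard HS256 JWT grant layout LiveKit tokens use (`iss` = API key, `sub` = identity, a `video` grant with `roomJoin` and `room`), here is a dependency-free sketch. The key, secret, identity, and room values are placeholders:

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def make_livekit_token(api_key: str, api_secret: str, identity: str, room: str) -> str:
    """Build an HS256 JWT granting room-join access (illustration only)."""
    header = {"alg": "HS256", "typ": "JWT"}
    now = int(time.time())
    payload = {
        "iss": api_key,      # LiveKit API key identifies the issuer
        "sub": identity,     # participant identity inside the room
        "nbf": now,
        "exp": now + 3600,   # token valid for one hour
        "video": {"roomJoin": True, "room": room},
    }
    signing_input = f"{b64url(json.dumps(header).encode())}.{b64url(json.dumps(payload).encode())}"
    sig = hmac.new(api_secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(sig)}"


token = make_livekit_token("devkey", "secret", "demo-user", "my-room")
print(len(token.split(".")))  # 3 segments: header.payload.signature
```

In real code, prefer `livekit.api.AccessToken` from the `livekit-api` package, which handles grants and expiry for you.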
+
+---
+
+Leverage the power of voice in your applications: LiveKit and Deepgram together can transform how users interact with your product.
\ No newline at end of file
diff --git a/examples/030-livekit-agents-python/README.md b/examples/030-livekit-agents-python/README.md
index fa37662..3f0ad22 100644
--- a/examples/030-livekit-agents-python/README.md
+++ b/examples/030-livekit-agents-python/README.md
@@ -1,60 +1,67 @@
-# LiveKit Agents — Voice Assistant with Deepgram STT
+# LiveKit Voice Assistant with Deepgram
 
-Build a real-time voice AI assistant using LiveKit's agent framework with Deepgram nova-3 for speech-to-text. The agent joins a LiveKit room, listens to participants via WebRTC, transcribes speech with Deepgram, generates responses with an LLM, and speaks back with TTS.
+![Screenshot](./screenshot.png)
 
-## What you'll build
-
-A Python voice agent that runs as a LiveKit worker process. When a user joins a LiveKit room, the agent automatically connects, greets the user, and holds a natural voice conversation — transcribing speech with Deepgram nova-3, thinking with OpenAI GPT-4.1-mini, and responding with Cartesia TTS. You can test it locally with `python src/agent.py console` for a terminal-based voice interaction.
+This example demonstrates how to build a minimal voice assistant using LiveKit Agents and Deepgram. It wires together speech-to-text (STT) from Deepgram, language processing from OpenAI's GPT, and text-to-speech (TTS) via Cartesia Sonic in a single agent pipeline.
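Conceptually, each conversational turn flows through three pluggable stages: STT, then LLM, then TTS. The toy sketch below (plain Python with stub stages, not the LiveKit API) shows only that data flow; the stubs stand in for Deepgram, GPT, and Cartesia:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Conceptual sketch of the agent's turn loop -- NOT the LiveKit API."""
    stt: Callable[[bytes], str]   # audio in  -> transcript out
    llm: Callable[[str], str]     # transcript -> reply text
    tts: Callable[[str], bytes]   # reply text -> audio out

    def handle_turn(self, audio: bytes) -> bytes:
        transcript = self.stt(audio)
        reply = self.llm(transcript)
        return self.tts(reply)


# Stub stages so the flow is runnable without any services.
pipeline = VoicePipeline(
    stt=lambda audio: audio.decode(),        # pretend transcription
    llm=lambda text: f"You said: {text}",    # pretend LLM reply
    tts=lambda text: text.encode(),          # pretend synthesis
)

print(pipeline.handle_turn(b"hello"))  # b'You said: hello'
```

In the real example, `AgentSession` plays the role of `VoicePipeline` and additionally handles streaming, interruptions, and turn detection.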
 ## Prerequisites
 
-- Python 3.10+
-- Deepgram account — [get a free API key](https://console.deepgram.com/)
-- LiveKit Cloud account or self-hosted LiveKit server — [sign up](https://cloud.livekit.io/)
-- OpenAI API key — [get one](https://platform.openai.com/api-keys)
+- Python 3.10+
+- A LiveKit server with API credentials
+- Deepgram API key
+- OpenAI API key
 
-## Environment variables
+## Environment Variables
 
-| Variable | Where to find it |
-|----------|-----------------|
-| `DEEPGRAM_API_KEY` | [Deepgram console](https://console.deepgram.com/) |
-| `LIVEKIT_URL` | [LiveKit Cloud dashboard](https://cloud.livekit.io/) → Project Settings |
-| `LIVEKIT_API_KEY` | [LiveKit Cloud dashboard](https://cloud.livekit.io/) → API Keys |
-| `LIVEKIT_API_SECRET` | [LiveKit Cloud dashboard](https://cloud.livekit.io/) → API Keys |
-| `OPENAI_API_KEY` | [OpenAI dashboard](https://platform.openai.com/api-keys) |
-
-Copy `.env.example` to `.env` and fill in your values.
+Create a `.env` file in the project root or set these environment variables directly:
+
+```ini
+# LiveKit
+LIVEKIT_URL=           # Your LiveKit server URL
+LIVEKIT_API_KEY=       # Your LiveKit API key
+LIVEKIT_API_SECRET=    # Your LiveKit API secret
+
+# Deepgram
+DEEPGRAM_API_KEY=      # Your Deepgram API key
+
+# OpenAI
+OPENAI_API_KEY=        # Your OpenAI API key
+```
 
-## Install and run
-
-```bash
-pip install -r requirements.txt
-
-# Download VAD and turn detector model files (first time only)
-python src/agent.py download-files
-
-# Run in console mode (talk from your terminal)
-python src/agent.py console
-
-# Or run as a dev worker (connects to LiveKit server)
-python src/agent.py dev
-```
+## Running the Example
+
+1. **Install Dependencies**
+
+   Ensure you have the required Python packages:
+
+   ```bash
+   pip install -r requirements.txt
+   ```
+
+2. **Start the Agent**
+
+   Download the model files once, then run the agent as a dev worker (or use `console` mode for a terminal-based test):
+
+   ```bash
+   python src/agent.py download-files
+   python src/agent.py dev
+   ```
+
+3. **Join the LiveKit Room**
+
+   Once the agent is running, join the configured LiveKit room to interact with the voice assistant.
+
+## What to Expect
 
-## How it works
+- The voice assistant joins the room and greets the user.
+- It uses Deepgram speech-to-text to understand user queries.
+- It uses OpenAI GPT to generate responses, spoken back via Cartesia TTS.
 
-1. The agent registers as a LiveKit worker and waits for room sessions
-2. When a participant joins, the `entrypoint` function creates an `AgentSession` wired to Deepgram STT, OpenAI LLM, and Cartesia TTS
-3. LiveKit captures the participant's microphone audio over WebRTC
-4. Audio passes through Silero VAD (voice activity detection) → Deepgram nova-3 STT → OpenAI GPT-4.1-mini → Cartesia TTS
-5. The synthesized response audio streams back to the participant in real-time
-6. The multilingual turn detector decides when the user has finished speaking, enabling natural back-and-forth conversation
+> **Note**: The LiveKit agent framework handles most of the pipeline complexity, so the Deepgram integration works through its plugin system with minimal code.
 
-## Related
+## Notes
 
-- [LiveKit Agents docs](https://docs.livekit.io/agents/)
-- [LiveKit Deepgram STT plugin](https://docs.livekit.io/agents/integrations/stt/deepgram/)
-- [Deepgram nova-3 model docs](https://developers.deepgram.com/docs/models)
+This guide assumes a working LiveKit environment. The Deepgram integration calls the live API with your real API key at runtime, so a valid `DEEPGRAM_API_KEY` is required to verify the STT step.
 
-## Starter templates
+---
 
-If you want a ready-to-run base for your own project, check the [deepgram-starters](https://github.com/orgs/deepgram-starters/repositories) org — there are starter repos for every language and every Deepgram product.
+For a more detailed walkthrough, refer to `BLOG.md`.
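A closing note on configuration: `python-dotenv` (listed in `requirements.txt`) is what loads the `.env` file shown above into the process environment at startup. Purely to illustrate what it does, here is a minimal stdlib-only equivalent; in real code, call `dotenv.load_dotenv()` instead:

```python
import os
import tempfile


def load_dotenv_minimal(path: str) -> dict:
    """Parse KEY=VALUE lines from a .env file, skipping comments and blanks."""
    values = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Strip surrounding quotes, if any.
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Example: write a small .env like the one above and load it.
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as f:
    f.write("# LiveKit\nLIVEKIT_URL=wss://example.livekit.cloud\nDEEPGRAM_API_KEY=dg_secret\n")
    path = f.name

env = load_dotenv_minimal(path)
os.environ.update(env)
os.remove(path)
print(env["LIVEKIT_URL"])  # wss://example.livekit.cloud
```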