examples/030-livekit-agents-python/BLOG.md (new file, 117 additions)
# Building a Voice Assistant using LiveKit Agents and Deepgram

Integrating powerful voice technologies can completely transform how users interact with your applications. This guide will walk you through setting up a minimal yet effective voice assistant using LiveKit Agents, Deepgram for speech-to-text (STT), and OpenAI's GPT for generating responses.

## Why LiveKit Agents?

LiveKit Agents provides a comprehensive framework for managing real-time audio and video communication. Combined with Deepgram, it lets you add sophisticated STT capabilities and create a seamless voice interaction experience.

## Setting Up the Environment

### Prerequisites

- **Python 3.8+**: Make sure your system is running Python 3.8 or later.
- **LiveKit Server**: Deploy a LiveKit server or use a hosted version.
- **API Keys**: Obtain API keys from Deepgram and OpenAI.

### Environment Variables

To facilitate secure and flexible configuration, store your credentials in environment variables. Create a `.env` file in your project root with the following:

```ini
# LiveKit
LIVEKIT_URL=<your_livekit_url>
LIVEKIT_API_KEY=<your_livekit_api_key>
LIVEKIT_API_SECRET=<your_livekit_api_secret>

# Deepgram
DEEPGRAM_API_KEY=<your_deepgram_api_key>

# OpenAI
OPENAI_API_KEY=<your_openai_api_key>
```
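Because a missing credential otherwise surfaces later as an opaque authentication failure, it can help to validate these variables at startup. Below is a minimal, stdlib-only sketch; the `require_env` helper and `REQUIRED` list are illustrative, not part of LiveKit's or Deepgram's APIs:

```python
import os

# python-dotenv's load_dotenv() would normally populate os.environ
# from the .env file first, e.g.:
#   from dotenv import load_dotenv
#   load_dotenv()

REQUIRED = [
    "LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET",
    "DEEPGRAM_API_KEY", "OPENAI_API_KEY",
]

def require_env(name: str) -> str:
    """Return the variable's value, failing fast with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```

Calling `require_env` for each name in `REQUIRED` at startup turns a forgotten credential into an immediate, readable error instead of a failure mid-session.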

## Developing the Voice Assistant

### 1. Install Dependencies

Ensure your project has the necessary Python packages. Note that the agent framework lives in the `livekit-agents` package (the bare `livekit` package does not include it), and the turn detector used below ships as its own plugin. Create a `requirements.txt` file and include:

```plaintext
livekit-agents
livekit-plugins-deepgram
livekit-plugins-turn-detector
python-dotenv
```

Install the dependencies:

```bash
pip install -r requirements.txt
```

### 2. Writing the Agent Code

We start by constructing a minimal agent. Open `src/agent.py` and import the necessary packages, including `AgentSession`, which we use later when configuring the server:

```python
import logging

from dotenv import load_dotenv

from livekit.agents import Agent, AgentServer, AgentSession, cli, inference
from livekit.plugins.turn_detector.multilingual import MultilingualModel

load_dotenv()  # read credentials from the .env file created earlier
```

Define a `VoiceAssistant` class extending the `Agent` base class and override critical lifecycle methods like `on_enter`.

```python
class VoiceAssistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="You are a voice assistant..."
        )

    async def on_enter(self) -> None:
        # Speak first when the agent joins the session
        self.session.generate_reply(instructions="Greet the user...")
```

### 3. Configure the Server and Session

Initialize an `AgentServer` and define the session using Deepgram for STT and OpenAI for LLM:

```python
server = AgentServer()

@server.rtc_session()
async def entrypoint(ctx):
    session = AgentSession(
        stt=inference.STT("deepgram/nova-3", language="multi"),
        llm=inference.LLM("openai/gpt-4.1-mini"),
        tts=inference.TTS("cartesia/sonic-3"),
        turn_detection=MultilingualModel(),
        preemptive_generation=True,
    )
    await session.start(agent=VoiceAssistant(), room=ctx.room)

if __name__ == "__main__":
    cli.run_app(server)  # hand control to the LiveKit agents CLI
```

### 4. Running Your Agent

Start the agent in console mode to talk directly from your terminal, or as a dev worker that connects to your LiveKit server:

```bash
python src/agent.py console
# or connect to your LiveKit server:
python src/agent.py dev
```

Join the LiveKit room specified in your setup to interact with the assistant.
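To join the room from a client, each participant needs an access token signed with your LiveKit API key and secret. In practice you would generate one with the `livekit-api` package's `AccessToken` helper or the `lk` CLI rather than by hand; purely to illustrate what such a token contains, here is a stdlib-only sketch. The exact claim layout (the `video` grant with `roomJoin`) is an assumption about LiveKit's token format, so treat this as illustrative rather than production code:

```python
import base64
import hashlib
import hmac
import json
import time

def _b64url(data: bytes) -> str:
    # JWTs use unpadded URL-safe base64
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def make_join_token(api_key: str, api_secret: str, room: str,
                    identity: str, ttl_seconds: int = 3600) -> str:
    """Hand-roll an HS256 JWT with a LiveKit-style room-join grant."""
    header = {"alg": "HS256", "typ": "JWT"}
    now = int(time.time())
    claims = {
        "iss": api_key,           # the API key identifies the issuer
        "sub": identity,          # participant identity
        "nbf": now,
        "exp": now + ttl_seconds,
        "video": {"room": room, "roomJoin": True},  # assumed grant layout
    }
    signing_input = (_b64url(json.dumps(header).encode()) + "." +
                     _b64url(json.dumps(claims).encode()))
    signature = hmac.new(api_secret.encode(), signing_input.encode(),
                         hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)
```

A client would pass the resulting token, along with `LIVEKIT_URL`, to the LiveKit client SDK when connecting to the room.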

## Conclusion

This example demonstrates how LiveKit Agents integrate seamlessly with Deepgram and OpenAI to power a real-time voice assistant. Experiment by modifying the assistant's behavior or trying different models and configurations.

## What's Next?

- **Explore More Models**: Try different STT and LLM models to see how they change user interactions.
- **Integrate More Features**: Add more sophisticated logic or memory to your assistant for enhanced user experiences.
- **Deploy**: Consider deploying your solution in a production environment for real-world interactions.

---

Leverage the power of voice with LiveKit and Deepgram to transform how users interact with your applications.
examples/030-livekit-agents-python/README.md (48 additions, 41 deletions)
# LiveKit Voice Assistant with Deepgram

![Screenshot](./screenshot.png)

This example demonstrates how to build a minimal voice assistant using LiveKit Agents and Deepgram. It uses the LiveKit agent pipeline to integrate speech-to-text (STT) from Deepgram, language processing from OpenAI's GPT, and text-to-speech (TTS) via Cartesia Sonic.

## Prerequisites

- Python 3.8+
- A LiveKit server with API credentials (LiveKit Cloud or self-hosted) — [sign up](https://cloud.livekit.io/)
- Deepgram API key — [get a free API key](https://console.deepgram.com/)
- OpenAI API key — [get one](https://platform.openai.com/api-keys)

## Environment Variables

| Variable | Where to find it |
|----------|-----------------|
| `LIVEKIT_URL` | [LiveKit Cloud dashboard](https://cloud.livekit.io/) → Project Settings |
| `LIVEKIT_API_KEY` | [LiveKit Cloud dashboard](https://cloud.livekit.io/) → API Keys |
| `LIVEKIT_API_SECRET` | [LiveKit Cloud dashboard](https://cloud.livekit.io/) → API Keys |
| `DEEPGRAM_API_KEY` | [Deepgram console](https://console.deepgram.com/) |
| `OPENAI_API_KEY` | [OpenAI dashboard](https://platform.openai.com/api-keys) |

Create a `.env` file in the project root (or copy `.env.example` to `.env`) and fill in your values:
```ini
# LiveKit
LIVEKIT_URL= # Your LiveKit server URL
LIVEKIT_API_KEY= # Your LiveKit API key
LIVEKIT_API_SECRET= # Your LiveKit API secret

# Deepgram
DEEPGRAM_API_KEY= # Your Deepgram API key

# OpenAI
OPENAI_API_KEY= # Your OpenAI API key
```

## Running the Example

1. **Install Dependencies**

   Ensure you have the required Python packages:

   ```bash
   pip install -r requirements.txt
   ```

2. **Download Model Files** (first time only)

   Fetch the VAD and turn detector model files:

   ```bash
   python src/agent.py download-files
   ```

3. **Start the Agent**

   Run the agent in console mode to talk from your terminal, or as a dev worker that connects to your LiveKit server:

   ```bash
   python src/agent.py console
   # or
   python src/agent.py dev
   ```

4. **Join the LiveKit Room**

   Once the agent is running in dev mode, join the configured LiveKit room to interact with the voice assistant.

## What to Expect

- The voice assistant joins the room and greets the user.
- It uses Deepgram speech-to-text to understand user queries.
- It uses OpenAI GPT to generate responses.

## How It Works

1. The agent registers as a LiveKit worker and waits for room sessions.
2. When a participant joins, the `entrypoint` function creates an `AgentSession` wired to Deepgram STT, OpenAI LLM, and Cartesia TTS.
3. LiveKit captures the participant's microphone audio over WebRTC.
4. Audio passes through Silero VAD (voice activity detection) → Deepgram nova-3 STT → OpenAI GPT-4.1-mini → Cartesia TTS.
5. The synthesized response audio streams back to the participant in real time.
6. The multilingual turn detector decides when the user has finished speaking, enabling natural back-and-forth conversation.

> **Note**: The LiveKit agent framework handles most of the pipeline complexity, so the Deepgram integration happens seamlessly through its plugin system.

## Related

- [LiveKit Agents docs](https://docs.livekit.io/agents/)
- [LiveKit Deepgram STT plugin](https://docs.livekit.io/agents/integrations/stt/deepgram/)
- [Deepgram nova-3 model docs](https://developers.deepgram.com/docs/models)

## Notes

This guide assumes a working LiveKit environment for full functionality. The Deepgram integration is exercised live with real API keys at runtime, so the STT path is verified end to end.

## Starter Templates

If you want a ready-to-run base for your own project, check the [deepgram-starters](https://github.com/orgs/deepgram-starters/repositories) org — there are starter repos for every language and every Deepgram product.

---

For a more detailed walkthrough, refer to `BLOG.md`.