A Phoenix LiveView chat application with:
- multi-provider LLM routing
- pluggable STT backends
- RAG support
- an embeddable iframe widget
The application source lives in `./llm_chat`.
```text
Browser (LiveView)
   |
   +- WebSocket (Phoenix Channel) ---> STTChannel ---> LlmChat.STT
   |      ^ binary audio                                   |
   |      | transcript events                              +--> HTTP backends / native Nx backend
   |
   \- LiveSocket (LiveView WS) -----> ChatLive.Room / ChatLive.Widget
          | phx events                        |
          |                          LlmChat.LLM.Pipeline
          |                                   |
          |                        +----------+----------+
          |                        |                     |
          |                        v                     v
          |                    RAG.Store           System prompt
          |                   (ETS-based)            injection
          |                                   |
          |                               LLM.Router
          |                +------+------+------+------+
          |                |      |      |      |      |
          |                v      v      v      v      v
          |             OpenAI Anthropic Groq Ollama OpenRouter
          |                                        / ClawRouter ...
          |                        SSE token streaming
          |
          \---- PubSub events ---- Task.Supervisor
```
Run these commands from the repository root:
```shell
# 1. Start the default development stack
nix run

# 2. Optional local LLM sidecar
nix run .#clawrouter

# 3. Optional local STT servers
nix run .#whisper-server
nix run .#faster-whisper-server

# 4. Full development environment
nix run .#full-dev

# 5. Alternative: dev shell
nix develop
cd llm_chat
mix phx.server
```

Then open http://localhost:4000.
This repository is safe to publish as source code as long as you do not commit local secret files such as `.env`.
Before pushing to a public GitHub repository:
- make sure `llm_chat/.env` is not tracked by git
- rotate any API keys that may already have been exposed locally or in shell history
- set a real `SECRET_KEY_BASE` outside development
- set `API_AUTH_TOKEN` if you expose `/api/*` on the public internet
- set `EMBED_ALLOWED_ORIGINS` to explicit trusted origins if you use the embed route
- keep development-only settings such as `code_reloader` and live reload disabled in production
Current backend defaults:
- `/api/*` can be protected with `Authorization: Bearer <token>` or `X-API-Key: <token>` when `API_AUTH_TOKEN` is set
- iframe embedding defaults to `frame-ancestors 'self'`
- browser routes use CSRF protection and Phoenix secure browser headers
- STT socket connections prefer the signed Phoenix session user id over client-supplied socket params
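When `API_AUTH_TOKEN` is set, either header form authenticates a request. For example, against the STT providers endpoint documented later in this README:

```shell
# Either header works when API_AUTH_TOKEN is set on the server.
curl -H "Authorization: Bearer $API_AUTH_TOKEN" http://localhost:4000/api/stt/providers
curl -H "X-API-Key: $API_AUTH_TOKEN" http://localhost:4000/api/stt/providers
```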
Current limitations:
- there is no user authentication / authorization layer yet, so this is not a multi-tenant internet-facing SaaS security model
- rate limiting is not implemented yet; use a reverse proxy or API gateway if you expose public endpoints
- provider keys are process-wide application secrets, not per-user secrets
| Provider | Type | API key | Notes |
|---|---|---|---|
| ClawRouter | Sidecar | No | x402 router, local gateway |
| OpenAI | Cloud API | Yes | GPT-family models |
| Anthropic | Cloud API | Yes | Claude models |
| Groq | Cloud API | Yes | Fast hosted inference |
| OpenRouter | Aggregator | Yes | Broad model catalog |
| Ollama | Local | No | Local models through Ollama |
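Cloud providers that require a key typically read it from the environment. The variable names below are conventional assumptions, not confirmed by this README — check each adapter module under `llm_chat/lib/llm_chat/llm/adapters/` for the names it actually reads:

```shell
# Assumed conventional key names; verify against the adapter modules.
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
export GROQ_API_KEY=gsk_...
export OPENROUTER_API_KEY=sk-or-...
```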
Create a module that implements the `LlmChat.LLM.Provider` behaviour:

```elixir
defmodule LlmChat.LLM.Adapters.MyProvider do
  @behaviour LlmChat.LLM.Provider

  def name, do: "My Provider"
  def available?, do: true
  def models, do: ["model-1"]

  def stream(_messages, on_token) do
    # Emit tokens through the callback as they arrive.
    on_token.("hello")
    :ok
  end
end
```

Then register it in `LlmChat.LLM.Adapters.all/0`.
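The registration step might look like the following sketch; the adapter module names come from this repository's file tree, but the exact contents of `all/0` may differ:

```elixir
defmodule LlmChat.LLM.Adapters do
  alias LlmChat.LLM.Adapters

  # Sketch: append the new adapter to the ordered list of known providers.
  def all do
    [
      Adapters.ClawRouter,
      Adapters.OpenAI,
      Adapters.Anthropic,
      Adapters.Groq,
      Adapters.OpenRouter,
      Adapters.Ollama,
      Adapters.MyProvider
    ]
  end
end
```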
Routing strategies:
- `:configured` uses the configured adapter first
- `:fallback` tries providers in order
- `:fastest` chooses the first provider that passes availability checks
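For illustration, a strategy switch along these lines could sit in front of the adapter list. This is a hedged sketch, not the real `LlmChat.LLM.Router`; the `:llm_adapter` config key is an assumption:

```elixir
defmodule RouterSketch do
  # Hypothetical illustration of the routing strategies.
  # `adapters` is the ordered list from Adapters.all/0.
  def pick(:fallback, adapters), do: Enum.find(adapters, & &1.available?())
  def pick(:fastest, adapters), do: Enum.find(adapters, & &1.available?())

  def pick(:configured, adapters) do
    configured = Application.get_env(:llm_chat, :llm_adapter)

    if configured && configured.available?(),
      do: configured,
      else: pick(:fallback, adapters)
  end
end
```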
LLM jobs are routed through a bounded in-process queue so the app can accept many incoming chat requests without spawning unbounded concurrent generations.
Recommended environment variables:
```shell
export LLM_MAX_CONCURRENCY=32
export LLM_MAX_QUEUE=10000
export LLM_QUEUE_TIMEOUT_MS=30000
export LLM_JOB_TIMEOUT_MS=120000
```

Failure modes:
- `llm_queue_full` when the pending queue is already full
- `llm_queue_timeout` when a request waited too long before a worker slot opened
- `llm_job_timeout` when a running provider call exceeded the configured runtime
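Callers can match on these errors to degrade gracefully. The sketch below assumes a `Pipeline.run/2`-style entry point and tagged error tuples, which may not match the actual return shapes:

```elixir
# Hypothetical caller-side handling; the entry-point name and error tuple
# shapes are assumptions, not the actual LlmChat API.
case LlmChat.LLM.Pipeline.run(messages, on_token) do
  :ok ->
    :ok

  {:error, :llm_queue_full} ->
    # Queue already at LLM_MAX_QUEUE: shed load immediately.
    {:retry_later, "server at capacity"}

  {:error, :llm_queue_timeout} ->
    # Waited longer than LLM_QUEUE_TIMEOUT_MS for a worker slot.
    {:retry_later, "queued too long"}

  {:error, :llm_job_timeout} ->
    # Provider call ran past LLM_JOB_TIMEOUT_MS.
    {:failed, "generation exceeded its time budget"}
end
```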
The STT layer is routed through `LlmChat.STT`, which keeps Phoenix Channel and LiveView code independent of the concrete engines.
Built-in backends:
- `:local_whisper` for a local `whisper.cpp` HTTP server
- `:faster_whisper` for a local `faster-whisper` HTTP server
- `:native_whisper` for native `Nx`/`Bumblebee` inference inside Elixir
Recommended environment variables:
```shell
export STT_BACKEND=faster_whisper
export STT_LANGUAGE=ru

# Native Nx/Bumblebee backend
export STT_NATIVE_MODEL=openai/whisper-medium

# whisper.cpp-style HTTP backend
export STT_LOCAL_URL=http://localhost:9000
export STT_LOCAL_TIMEOUT_MS=300000

# faster-whisper HTTP backend
export STT_FASTER_URL=http://localhost:9100
export STT_FASTER_TIMEOUT_MS=300000
```

Legacy `*_WHISPER_*` environment variables are still accepted as fallbacks, but the `STT_*` names are now preferred.
`GET /api/stt/providers` returns the available STT backends, transport type, and capability flags.
Example response:
```json
{
  "configured": "faster_whisper",
  "providers": [
    {
      "id": "faster_whisper",
      "name": "HTTP faster backend",
      "available": true,
      "transport": "http",
      "streaming_segments": true,
      "supports_language": true,
      "configured": true
    }
  ]
}
```

The current implementation uses an ETS-backed in-process store in `LlmChat.RAG.Store`.
Each LLM request can:
- search relevant documents
- inject matched context into the system prompt
- generate a response using that context
For production, swap the store for pgvector, Qdrant, or another external system.
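For illustration, a minimal ETS-backed store could look like the sketch below. The naive word-overlap scoring and the function names are assumptions; the real `LlmChat.RAG.Store` API may differ:

```elixir
defmodule EtsStoreSketch do
  # Minimal illustration of an ETS-backed document store with naive
  # word-overlap search. Not the actual LlmChat.RAG.Store implementation.
  def init, do: :ets.new(:rag_docs, [:set, :public, :named_table])

  def put(id, text), do: :ets.insert(:rag_docs, {id, text})

  # Return up to `limit` documents sharing the most words with the query.
  def search(query, limit \\ 3) do
    terms = query |> String.downcase() |> String.split()

    :ets.tab2list(:rag_docs)
    |> Enum.map(fn {id, text} ->
      words = text |> String.downcase() |> String.split()
      {id, text, Enum.count(terms, &(&1 in words))}
    end)
    |> Enum.filter(fn {_id, _text, score} -> score > 0 end)
    |> Enum.sort_by(fn {_id, _text, score} -> -score end)
    |> Enum.take(limit)
  end
end
```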
```text
llm_chat/lib/llm_chat/
├── application.ex
├── endpoint.ex
├── router.ex
├── stt.ex
├── controllers/
│   ├── providers_controller.ex
│   ├── stt_controller.ex
│   └── stt_providers_controller.ex
├── llm/
│   ├── adapters.ex
│   ├── adapters/
│   │   ├── anthropic.ex
│   │   ├── claw_router.ex
│   │   ├── groq.ex
│   │   ├── ollama.ex
│   │   ├── open_router.ex
│   │   └── openai.ex
│   ├── pipeline.ex
│   ├── provider.ex
│   └── router.ex
├── live/
│   ├── chat_live_components.ex
│   ├── chat_live_index.ex
│   ├── chat_live_room.ex
│   ├── chat_live_shared.ex
│   └── chat_live_widget.ex
└── rag/
    └── store.ex
```
See `./llm_chat/EMBED.md` for the embed script, iframe, and host-page integration details.
Recommended production embed configuration:
```shell
export EMBED_ALLOWED_ORIGINS="https://app.example.com"
```

The repository includes an application-level Elixir load generator:
```shell
nix develop -c mix loadtest.chat \
  --connections 10000 \
  --messages-per-connection 5 \
  --max-concurrency 1000 \
  --fake-tokens 32 \
  --fake-delay-ms 0 \
  --rag=false
```

What it does:
- switches chat persistence to the in-memory backend for the run
- uses `LlmChat.LLM.Adapters.FakeLoad` instead of a real provider
- creates virtual chat sessions and conversations
- sends messages through `LlmChat.Chat`, which persists user and assistant messages and runs the normal LLM pipeline
- prints throughput, VM memory, process count, and latency percentiles
Important limitations:
- this is an application-layer stress test, not a browser or LiveView WebSocket protocol benchmark
- use it to measure pipeline, persistence, task concurrency, and BEAM behavior under large message volume
- if you need true socket-level benchmarking, add a dedicated Phoenix/WebSocket load client on top
The repository also includes a Phoenix Channel socket-level load generator:
```shell
nix develop -c mix loadtest.socket \
  --connections 2000 \
  --messages-per-connection 3 \
  --max-concurrency 500 \
  --port 4100 \
  --fake-tokens 32 \
  --fake-delay-ms 0 \
  --rag=false
```

This task:
- starts the Phoenix endpoint on a dedicated port
- opens raw WebSocket connections to `/socket/websocket`
- joins `chat_load:*` topics through the Phoenix channel protocol
- sends messages over the socket and waits for async `message_result` events
- prints throughput, VM memory, process count, and latency percentiles
```shell
nix develop -c bash -lc 'cd llm_chat && MIX_ENV=test mix test'
```

MIT, see ./LICENSE.