A Phoenix LiveView chat application with:
- multi-provider LLM routing
- pluggable STT backends
- RAG support
- an embeddable iframe widget
The application source lives in `./llm_chat`.
```text
Browser (LiveView)
   |
   +- WebSocket (Phoenix Channel) ---> STTChannel ---> LlmChat.STT
   |      ^ binary audio                                   |
   |      | transcript events                              +--> HTTP backends / native Nx backend
   |
   \- LiveSocket (LiveView WS) -----> ChatLive.Room / ChatLive.Widget
          | phx events                        |
          |                          LlmChat.LLM.Pipeline
          |                                   |
          |                        +----------+----------+
          |                        |                     |
          |                        v                     v
          |                    RAG.Store           System prompt
          |                   (ETS-based)            injection
          |                                   |
          |                               LLM.Router
          |                +------+------+------+------+
          |                |      |      |      |      |
          |                v      v      v      v      v
          |             OpenAI Anthropic Groq Ollama OpenRouter
          |                                        / ClawRouter ...
          |                        SSE token streaming
          |
          \---- PubSub events ---- Task.Supervisor
```
Run these commands from the repository root:
```shell
# 1. Start the default development stack
nix run

# 2. Optional local LLM sidecar
nix run .#clawrouter

# 3. Optional local STT servers
nix run .#whisper-server
nix run .#faster-whisper-server

# 4. Full development environment
nix run .#full-dev

# 5. Alternative: dev shell
nix develop
cd llm_chat
mix phx.server
```

Then open http://localhost:4000.
This repository is safe to publish as source code as long as you do not commit local secret files such as `.env`.
Before pushing to a public GitHub repository:
- make sure `llm_chat/.env` is not tracked by git
- rotate any API keys that may already have been exposed locally or in shell history
- set a real `SECRET_KEY_BASE` outside development
- set `API_AUTH_TOKEN` if you expose `/api/*` on the public internet
- set `EMBED_ALLOWED_ORIGINS` to explicit trusted origins if you use the embed route
- keep development-only settings such as `code_reloader` and live reload disabled in production
Current backend defaults:
- `/api/*` can be protected with `Authorization: Bearer <token>` or `X-API-Key: <token>` when `API_AUTH_TOKEN` is set
- iframe embedding defaults to `frame-ancestors 'self'`
- browser routes use CSRF protection and Phoenix secure browser headers
- STT socket connections prefer the signed Phoenix session user id over client-supplied socket params
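When `API_AUTH_TOKEN` is set, either header form authenticates a request. For example, against the STT providers endpoint documented later in this README:

```shell
# Either header works when API_AUTH_TOKEN is set on the server.
curl -H "Authorization: Bearer $API_AUTH_TOKEN" http://localhost:4000/api/stt/providers
curl -H "X-API-Key: $API_AUTH_TOKEN" http://localhost:4000/api/stt/providers
```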
Current limitations:
- there is no user authentication / authorization layer yet, so this is not a multi-tenant internet-facing SaaS security model
- rate limiting is not implemented yet; use a reverse proxy or API gateway if you expose public endpoints
- provider keys are process-wide application secrets, not per-user secrets
| Provider | Type | API key | Notes |
|---|---|---|---|
| ClawRouter | Sidecar | No | x402 router, local gateway |
| OpenAI | Cloud API | Yes | GPT-family models |
| Anthropic | Cloud API | Yes | Claude models |
| Groq | Cloud API | Yes | Fast hosted inference |
| OpenRouter | Aggregator | Yes | Broad model catalog |
| Ollama | Local | No | Local models through Ollama |
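Cloud providers that require a key typically read it from the environment. The variable names below are conventional assumptions, not confirmed by this README — check each adapter module under `llm_chat/lib/llm_chat/llm/adapters/` for the names it actually reads:

```shell
# Assumed conventional key names; verify against the adapter modules.
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
export GROQ_API_KEY=gsk_...
export OPENROUTER_API_KEY=sk-or-...
```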
Create a module that implements the `LlmChat.LLM.Provider` behaviour:

```elixir
defmodule LlmChat.LLM.Adapters.MyProvider do
  @behaviour LlmChat.LLM.Provider

  def name, do: "My Provider"
  def available?, do: true
  def models, do: ["model-1"]

  def stream(_messages, on_token) do
    # Emit tokens through the callback as they arrive.
    on_token.("hello")
    :ok
  end
end
```

Then register it in `LlmChat.LLM.Adapters.all/0`.
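The registration step might look like the following sketch; the adapter module names come from this repository's file tree, but the exact contents of `all/0` may differ:

```elixir
defmodule LlmChat.LLM.Adapters do
  alias LlmChat.LLM.Adapters

  # Sketch: append the new adapter to the ordered list of known providers.
  def all do
    [
      Adapters.ClawRouter,
      Adapters.OpenAI,
      Adapters.Anthropic,
      Adapters.Groq,
      Adapters.OpenRouter,
      Adapters.Ollama,
      Adapters.MyProvider
    ]
  end
end
```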
Routing strategies:
- `:configured` uses the configured adapter first
- `:fallback` tries providers in order
- `:fastest` chooses the first provider that passes availability checks
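For illustration, a strategy switch along these lines could sit in front of the adapter list. This is a hedged sketch, not the real `LlmChat.LLM.Router`; the `:llm_adapter` config key is an assumption:

```elixir
defmodule RouterSketch do
  # Hypothetical illustration of the routing strategies.
  # `adapters` is the ordered list from Adapters.all/0.
  def pick(:fallback, adapters), do: Enum.find(adapters, & &1.available?())
  def pick(:fastest, adapters), do: Enum.find(adapters, & &1.available?())

  def pick(:configured, adapters) do
    configured = Application.get_env(:llm_chat, :llm_adapter)

    if configured && configured.available?(),
      do: configured,
      else: pick(:fallback, adapters)
  end
end
```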
LLM jobs are routed through a bounded in-process queue so the app can accept many incoming chat requests without spawning unbounded concurrent generations.
Recommended environment variables:
```shell
export LLM_MAX_CONCURRENCY=32
export LLM_MAX_QUEUE=10000
export LLM_QUEUE_TIMEOUT_MS=30000
export LLM_JOB_TIMEOUT_MS=120000
```

Failure modes:
- `llm_queue_full` when the pending queue is already full
- `llm_queue_timeout` when a request waited too long before a worker slot opened
- `llm_job_timeout` when a running provider call exceeded the configured runtime
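Callers can match on these errors to degrade gracefully. The sketch below assumes a `Pipeline.run/2`-style entry point and tagged error tuples, which may not match the actual return shapes:

```elixir
# Hypothetical caller-side handling; the entry-point name and error tuple
# shapes are assumptions, not the actual LlmChat API.
case LlmChat.LLM.Pipeline.run(messages, on_token) do
  :ok ->
    :ok

  {:error, :llm_queue_full} ->
    # Queue already at LLM_MAX_QUEUE: shed load immediately.
    {:retry_later, "server at capacity"}

  {:error, :llm_queue_timeout} ->
    # Waited longer than LLM_QUEUE_TIMEOUT_MS for a worker slot.
    {:retry_later, "queued too long"}

  {:error, :llm_job_timeout} ->
    # Provider call ran past LLM_JOB_TIMEOUT_MS.
    {:failed, "generation exceeded its time budget"}
end
```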
The STT layer is routed through `LlmChat.STT`, which keeps Phoenix Channel and LiveView code independent of the concrete engines.
Built-in backends:
- `:local_whisper` for a local `whisper.cpp` HTTP server
- `:faster_whisper` for a local `faster-whisper` HTTP server
- `:native_whisper` for native `Nx`/`Bumblebee` inference inside Elixir
Recommended environment variables:
```shell
export STT_BACKEND=faster_whisper
export STT_LANGUAGE=ru

# Native Nx/Bumblebee backend
export STT_NATIVE_MODEL=openai/whisper-medium

# whisper.cpp-style HTTP backend
export STT_LOCAL_URL=http://localhost:9000
export STT_LOCAL_TIMEOUT_MS=300000

# faster-whisper HTTP backend
export STT_FASTER_URL=http://localhost:9100
export STT_FASTER_TIMEOUT_MS=300000
```

Legacy `*_WHISPER_*` environment variables are still accepted as fallbacks, but the `STT_*` names are now preferred.
`GET /api/stt/providers` returns the available STT backends, transport type, and capability flags.
Example response:
```json
{
  "configured": "faster_whisper",
  "providers": [
    {
      "id": "faster_whisper",
      "name": "HTTP faster backend",
      "available": true,
      "transport": "http",
      "streaming_segments": true,
      "supports_language": true,
      "configured": true
    }
  ]
}
```

The current implementation uses an ETS-backed in-process store in `LlmChat.RAG.Store`.
Each LLM request can:
- search relevant documents
- inject matched context into the system prompt
- generate a response using that context
For production, swap the store for pgvector, Qdrant, or another external system.
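For illustration, a minimal ETS-backed store could look like the sketch below. The naive word-overlap scoring and the function names are assumptions; the real `LlmChat.RAG.Store` API may differ:

```elixir
defmodule EtsStoreSketch do
  # Minimal illustration of an ETS-backed document store with naive
  # word-overlap search. Not the actual LlmChat.RAG.Store implementation.
  def init, do: :ets.new(:rag_docs, [:set, :public, :named_table])

  def put(id, text), do: :ets.insert(:rag_docs, {id, text})

  # Return up to `limit` documents sharing the most words with the query.
  def search(query, limit \\ 3) do
    terms = query |> String.downcase() |> String.split()

    :ets.tab2list(:rag_docs)
    |> Enum.map(fn {id, text} ->
      words = text |> String.downcase() |> String.split()
      {id, text, Enum.count(terms, &(&1 in words))}
    end)
    |> Enum.filter(fn {_id, _text, score} -> score > 0 end)
    |> Enum.sort_by(fn {_id, _text, score} -> -score end)
    |> Enum.take(limit)
  end
end
```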
```text
llm_chat/lib/llm_chat/
├── application.ex
├── endpoint.ex
├── router.ex
├── stt.ex
├── controllers/
│   ├── providers_controller.ex
│   ├── stt_controller.ex
│   └── stt_providers_controller.ex
├── llm/
│   ├── adapters.ex
│   ├── adapters/
│   │   ├── anthropic.ex
│   │   ├── claw_router.ex
│   │   ├── groq.ex
│   │   ├── ollama.ex
│   │   ├── open_router.ex
│   │   └── openai.ex
│   ├── pipeline.ex
│   ├── provider.ex
│   └── router.ex
├── live/
│   ├── chat_live_components.ex
│   ├── chat_live_index.ex
│   ├── chat_live_room.ex
│   ├── chat_live_shared.ex
│   └── chat_live_widget.ex
└── rag/
    └── store.ex
```
See `./llm_chat/EMBED.md` for the embed script, iframe, and host-page integration details.
Recommended production embed configuration:
```shell
export EMBED_ALLOWED_ORIGINS="https://app.example.com"
```

The repository includes an application-level Elixir load generator:
```shell
nix develop -c mix loadtest.chat \
  --connections 10000 \
  --messages-per-connection 5 \
  --max-concurrency 1000 \
  --fake-tokens 32 \
  --fake-delay-ms 0 \
  --rag=false
```

What it does:
- switches chat persistence to the in-memory backend for the run
- uses `LlmChat.LLM.Adapters.FakeLoad` instead of a real provider
- creates virtual chat sessions and conversations
- sends messages through `LlmChat.Chat`, which persists user and assistant messages and runs the normal LLM pipeline
- prints throughput, VM memory, process count, and latency percentiles
Important limitations:
- this is an application-layer stress test, not a browser or LiveView WebSocket protocol benchmark
- use it to measure pipeline, persistence, task concurrency, and BEAM behavior under large message volume
- if you need true socket-level benchmarking, add a dedicated Phoenix/WebSocket load client on top
The repository also includes a Phoenix Channel socket-level load generator:
```shell
nix develop -c mix loadtest.socket \
  --connections 2000 \
  --messages-per-connection 3 \
  --max-concurrency 500 \
  --port 4100 \
  --fake-tokens 32 \
  --fake-delay-ms 0 \
  --rag=false
```

This task:
- starts the Phoenix endpoint on a dedicated port
- opens raw WebSocket connections to `/socket/websocket`
- joins `chat_load:*` topics through the Phoenix channel protocol
- sends messages over the socket and waits for async `message_result` events
- prints throughput, VM memory, process count, and latency percentiles
```shell
nix develop -c bash -lc 'cd llm_chat && MIX_ENV=test mix test'
```

MIT, see ./LICENSE.