yoptabyte/ElixirBotAssistant
LLM Chat

Phoenix LiveView chat with:

  • multi-provider LLM routing
  • pluggable STT backends
  • RAG support
  • an embeddable iframe widget

The application source lives in ./llm_chat.

Architecture

Browser (LiveView)
    |
    +- WebSocket (Phoenix Channel) ---> STTChannel ---> LlmChat.STT
    |       ^ binary audio                            |
    |       | transcript events                       +--> HTTP backends / native Nx backend
    |
    \- LiveSocket (LiveView WS) -----> ChatLive.Room / ChatLive.Widget
               | phx events                           |
               |                               LlmChat.LLM.Pipeline
               |                                        |
               |                              +---------+----------+
               |                              |                    |
               |                              v                    v
               |                         RAG.Store          System prompt
               |                         (ETS-based)        injection
               |                              |
               |                         LLM.Router
               |                    +------+------+------+------+
               |                    |      |      |      |      |
               |                    v      v      v      v      v
               |                 OpenAI Anthropic Groq Ollama OpenRouter
               |                          / ClawRouter ...
               |                    SSE token streaming
               |
               \---- PubSub events ---- Task.Supervisor

Quick Start

Run these commands from the repository root:

# 1. Start the default development stack
nix run

# 2. Optional local LLM sidecar
nix run .#clawrouter

# 3. Optional local STT servers
nix run .#whisper-server
nix run .#faster-whisper-server

# 4. Full development environment
nix run .#full-dev

# 5. Alternative: dev shell
nix develop
cd llm_chat
mix phx.server

Open http://localhost:4000.

Screenshots

  • Embedded session view
  • Saved chats sidebar view

Security

This repository is safe to publish as source code as long as you do not commit local secret files such as .env.

Before pushing to a public GitHub repository:

  • make sure llm_chat/.env is not tracked by git
  • rotate any API keys that may already have been exposed locally or in shell history
  • set a real SECRET_KEY_BASE outside development
  • set API_AUTH_TOKEN if you expose /api/* on the public internet
  • set EMBED_ALLOWED_ORIGINS to explicit trusted origins if you use the embed route
  • keep development-only settings such as code_reloader and live reload disabled in production

Current backend defaults:

  • /api/* can be protected with Authorization: Bearer <token> or X-API-Key: <token> when API_AUTH_TOKEN is set
  • iframe embedding defaults to frame-ancestors 'self'
  • browser routes use CSRF protection and Phoenix secure browser headers
  • STT socket connections prefer the signed Phoenix session user id over client-supplied socket params

Current limitations:

  • there is no user authentication / authorization layer yet, so the app does not yet provide a multi-tenant, internet-facing SaaS security model
  • rate limiting is not implemented yet; use a reverse proxy or API gateway if you expose public endpoints
  • provider keys are process-wide application secrets, not per-user secrets

LLM Providers

Provider     Type        API key  Notes
ClawRouter   Sidecar     No       x402 router, local gateway
OpenAI       Cloud API   Yes      GPT-family models
Anthropic    Cloud API   Yes      Claude models
Groq         Cloud API   Yes      Fast hosted inference
OpenRouter   Aggregator  Yes      Broad model catalog
Ollama       Local       No       Local models through Ollama

Adding a Provider

Create a module that implements LlmChat.LLM.Provider:

defmodule LlmChat.LLM.Adapters.MyProvider do
  @behaviour LlmChat.LLM.Provider

  @impl true
  def name, do: "My Provider"

  @impl true
  def available?, do: true

  @impl true
  def models, do: ["model-1"]

  @impl true
  def stream(_messages, on_token) do
    # Call on_token/1 for each generated token; this stub emits a single token.
    on_token.("hello")
    :ok
  end
end

Then register it in LlmChat.LLM.Adapters.all/0.
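Registration is typically a one-line change. Assuming LlmChat.LLM.Adapters.all/0 returns a plain list of adapter modules (the exact shape in the repo may differ), it might look like:

```elixir
defmodule LlmChat.LLM.Adapters do
  # Hypothetical sketch of the registry; check the actual adapters.ex.
  def all do
    [
      LlmChat.LLM.Adapters.OpenAI,
      LlmChat.LLM.Adapters.Anthropic,
      # newly registered adapter
      LlmChat.LLM.Adapters.MyProvider
    ]
  end
end
```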

Routing Strategies

  • :configured uses the configured adapter first
  • :fallback tries providers in order
  • :fastest chooses the first provider that passes availability checks
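The strategy is usually chosen in application config. A minimal sketch, assuming the config key names below (they are illustrative, not confirmed from the repo):

```elixir
# config/runtime.exs — hypothetical key names, verify against the actual config
config :llm_chat, LlmChat.LLM.Router,
  strategy: :fallback,
  providers: [
    LlmChat.LLM.Adapters.OpenAI,
    LlmChat.LLM.Adapters.Ollama
  ]
```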

LLM Queue

LLM jobs are routed through a bounded in-process queue so the app can accept many incoming chat requests without spawning unbounded concurrent generations.

Recommended environment variables:

export LLM_MAX_CONCURRENCY=32
export LLM_MAX_QUEUE=10000
export LLM_QUEUE_TIMEOUT_MS=30000
export LLM_JOB_TIMEOUT_MS=120000

Failure modes:

  • llm_queue_full when the pending queue is already full
  • llm_queue_timeout when a request waited too long before a worker slot opened
  • llm_job_timeout when a running provider call exceeded the configured runtime
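Callers can pattern-match on these failure atoms to decide how to respond. A hedged sketch, assuming the pipeline returns `{:ok, reply}` / `{:error, reason}` tuples (the exact return shape and function names are assumptions):

```elixir
# Hypothetical caller; run/2 and the error shapes are illustrative.
case LlmChat.LLM.Pipeline.run(messages, opts) do
  {:ok, reply} ->
    reply

  {:error, :llm_queue_full} ->
    # Pending queue is full: shed load and ask the client to retry later.
    {:retry_after, "server busy"}

  {:error, :llm_queue_timeout} ->
    # Waited too long for a worker slot.
    {:retry_after, "queued too long"}

  {:error, :llm_job_timeout} ->
    # Provider call exceeded the configured runtime.
    {:error, "generation timed out"}
end
```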

Speech-to-Text

The STT layer is routed through LlmChat.STT, which keeps Phoenix Channel and LiveView code independent from concrete engines.

Built-in backends:

  • :local_whisper for a local whisper.cpp HTTP server
  • :faster_whisper for a local faster-whisper HTTP server
  • :native_whisper for native Nx/Bumblebee inference inside Elixir

Recommended environment variables:

export STT_BACKEND=faster_whisper
export STT_LANGUAGE=ru

# Native Nx/Bumblebee backend
export STT_NATIVE_MODEL=openai/whisper-medium

# whisper.cpp-style HTTP backend
export STT_LOCAL_URL=http://localhost:9000
export STT_LOCAL_TIMEOUT_MS=300000

# faster-whisper HTTP backend
export STT_FASTER_URL=http://localhost:9100
export STT_FASTER_TIMEOUT_MS=300000

Legacy *_WHISPER_* environment variables are still accepted as fallbacks, but the STT_* names are now preferred.

Capability Endpoint

GET /api/stt/providers returns the available STT backends, transport type, and capability flags.

Example response:

{
  "configured": "faster_whisper",
  "providers": [
    {
      "id": "faster_whisper",
      "name": "HTTP faster backend",
      "available": true,
      "transport": "http",
      "streaming_segments": true,
      "supports_language": true,
      "configured": true
    }
  ]
}

RAG

The current implementation uses an ETS-backed in-process store in LlmChat.RAG.Store.

Each LLM request can:

  1. search relevant documents
  2. inject matched context into the system prompt
  3. generate a response using that context
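The three steps above can be sketched as follows; the function names and message shapes are illustrative assumptions, not the repo's actual API:

```elixir
# Hypothetical RAG flow: retrieve, inject, generate.
def answer_with_rag(question, on_token) do
  # 1. search relevant documents in the ETS-backed store
  docs = LlmChat.RAG.Store.search(question)

  # 2. inject matched context into the system prompt
  system = "Use the following context:\n" <> Enum.join(docs, "\n")

  # 3. generate a response using that context
  LlmChat.LLM.Router.stream(
    [
      %{role: :system, content: system},
      %{role: :user, content: question}
    ],
    on_token
  )
end
```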

For production, swap the store for pgvector, Qdrant, or another external system.

File Layout

llm_chat/lib/llm_chat/
├── application.ex
├── endpoint.ex
├── router.ex
├── stt.ex
├── controllers/
│   ├── providers_controller.ex
│   ├── stt_controller.ex
│   └── stt_providers_controller.ex
├── llm/
│   ├── adapters.ex
│   ├── adapters/
│   │   ├── anthropic.ex
│   │   ├── claw_router.ex
│   │   ├── groq.ex
│   │   ├── ollama.ex
│   │   ├── open_router.ex
│   │   └── openai.ex
│   ├── pipeline.ex
│   ├── provider.ex
│   └── router.ex
├── live/
│   ├── chat_live_components.ex
│   ├── chat_live_index.ex
│   ├── chat_live_room.ex
│   ├── chat_live_shared.ex
│   └── chat_live_widget.ex
└── rag/
    └── store.ex

Embedding

See ./llm_chat/EMBED.md for the embed script, iframe, and host-page integration details.

Recommended production embed configuration:

export EMBED_ALLOWED_ORIGINS="https://app.example.com"

Load Testing

The repository includes an application-level Elixir load generator:

nix develop -c mix loadtest.chat \
  --connections 10000 \
  --messages-per-connection 5 \
  --max-concurrency 1000 \
  --fake-tokens 32 \
  --fake-delay-ms 0 \
  --rag=false

What it does:

  • switches chat persistence to the in-memory backend for the run
  • uses LlmChat.LLM.Adapters.FakeLoad instead of a real provider
  • creates virtual chat sessions and conversations
  • sends messages through LlmChat.Chat, which persists user and assistant messages and runs the normal LLM pipeline
  • prints throughput, VM memory, process count, and latency percentiles

Important limitations:

  • this is an application-layer stress test, not a browser or LiveView WebSocket protocol benchmark
  • use it to measure pipeline, persistence, task concurrency, and BEAM behavior under large message volume
  • if you need true socket-level benchmarking, add a dedicated Phoenix/WebSocket load client on top

The repository also includes a Phoenix Channel socket-level load generator:

nix develop -c mix loadtest.socket \
  --connections 2000 \
  --messages-per-connection 3 \
  --max-concurrency 500 \
  --port 4100 \
  --fake-tokens 32 \
  --fake-delay-ms 0 \
  --rag=false

This task:

  • starts the Phoenix endpoint on a dedicated port
  • opens raw WebSocket connections to /socket/websocket
  • joins chat_load:* topics through the Phoenix channel protocol
  • sends messages over the socket and waits for async message_result events
  • prints throughput, VM memory, process count, and latency percentiles

Verification

nix develop -c bash -lc 'cd llm_chat && MIX_ENV=test mix test'

License

MIT, see ./LICENSE.
