llmproxy

A Go library for proxying requests to upstream LLM providers, with a pluggable, composable architecture.

Install

go get github.com/agentuity/llmproxy

Quick Start

Simple Proxy

package main

import (
    "context"
    "io"
    "log"
    "net/http"

    "github.com/agentuity/llmproxy"
    "github.com/agentuity/llmproxy/interceptors"
    "github.com/agentuity/llmproxy/providers/openai"
)

func main() {
    ctx := context.Background()

    provider, _ := openai.New("sk-your-key")

    proxy := llmproxy.NewProxy(provider,
        llmproxy.WithInterceptor(interceptors.NewLogging(nil)),
    )

    http.HandleFunc("/v1/chat/completions", func(w http.ResponseWriter, r *http.Request) {
        resp, meta, err := proxy.Forward(ctx, r)
        if err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
        defer resp.Body.Close()

        // Response includes token usage
        _ = meta.Usage.PromptTokens
        _ = meta.Usage.CompletionTokens

        io.Copy(w, resp.Body)
    })

    log.Fatal(http.ListenAndServe(":8080", nil))
}

AutoRouter (Recommended)

Single endpoint that auto-detects provider and API type:

package main

import (
    "log"
    "net/http"

    "github.com/agentuity/llmproxy"
    "github.com/agentuity/llmproxy/providers/anthropic"
    "github.com/agentuity/llmproxy/providers/openai"
)

func main() {
    openaiProvider, _ := openai.New("sk-openai-key")
    anthropicProvider, _ := anthropic.New("sk-ant-key")

    router := llmproxy.NewAutoRouter(
        llmproxy.WithAutoRouterFallbackProvider(openaiProvider),
    )
    router.RegisterProvider(openaiProvider)
    router.RegisterProvider(anthropicProvider)

    // Single endpoint handles all providers and APIs
    http.Handle("/", router)
    log.Fatal(http.ListenAndServe(":8080", nil))
}

POST to / with any model; the provider and API are auto-detected:

# Auto-detect OpenAI from gpt-4 model name
curl -X POST http://localhost:8080/ \
  -H 'Content-Type: application/json' \
  -d '{"model":"gpt-4","messages":[{"role":"user","content":"Hello"}]}'

# Auto-detect Anthropic from claude model name  
curl -X POST http://localhost:8080/ \
  -H 'Content-Type: application/json' \
  -d '{"model":"claude-3-opus","max_tokens":1024,"messages":[{"role":"user","content":"Hello"}]}'

# Auto-detect Responses API from input field
curl -X POST http://localhost:8080/ \
  -H 'Content-Type: application/json' \
  -d '{"model":"gpt-4o","input":"Hello"}'

Features

  • 9 Provider Implementations: OpenAI, Anthropic, Groq, Fireworks, x.AI, Google AI, AWS Bedrock, Azure OpenAI, OpenAI-compatible base
  • AutoRouter: Single endpoint with automatic provider/API detection
  • Responses API: Full support for OpenAI's Responses API (HTTP streaming and WebSocket mode)
  • WebSocket Mode: Persistent connections for multi-turn Responses API workflows with per-turn billing
  • SSE Streaming: Full streaming support with efficient token usage extraction
  • 8 Built-in Interceptors: Logging, Metrics, Retry, Billing, Tracing (OTel), HeaderBan, AddHeader, PromptCaching
  • Pricing Integration: models.dev adapter with markup support
  • Prompt Caching: support for Anthropic, OpenAI, xAI, Fireworks, and Bedrock
  • Raw Body Preservation: Custom JSON fields pass through unchanged

AutoRouter

The AutoRouter provides automatic routing from a single endpoint:

Detection Order

  1. Path-based - /v1/messages → Messages API, /v1/responses → Responses API
  2. Body + Provider - When path is / or unknown:
    • input field → Responses API
    • prompt field → Completions API
    • contents field → GenerateContent API
    • messages + Anthropic → Messages API
    • messages + other → Chat Completions

Provider Detection

  1. X-Provider header - Explicit override
  2. Model prefix - openai/gpt-4 → OpenAI (strips prefix before forwarding)
  3. Model pattern - gpt-* → OpenAI, claude-* → Anthropic, etc.

Examples

# Explicit provider via header
curl -X POST http://localhost:8080/ \
  -H 'Content-Type: application/json' \
  -H 'X-Provider: anthropic' \
  -d '{"model":"claude-3-opus","max_tokens":1024,"messages":[{"role":"user","content":"Hello"}]}'

# Provider prefix in model (gets stripped)
curl -X POST http://localhost:8080/ \
  -H 'Content-Type: application/json' \
  -d '{"model":"anthropic/claude-3-opus","max_tokens":1024,"messages":[{"role":"user","content":"Hello"}]}'

# Traditional path still works
curl -X POST http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"gpt-4","messages":[{"role":"user","content":"Hello"}]}'

Streaming

SSE streaming is fully supported with automatic token usage extraction for billing:

# Streaming with automatic usage extraction
curl -X POST http://localhost:8080/ \
  -H 'Content-Type: application/json' \
  -d '{"model":"gpt-4","stream":true,"messages":[{"role":"user","content":"Hello"}]}'

Key Features:

  • Efficient flushing: Uses http.ResponseController for immediate SSE delivery (see the sketch after this list)
  • Token extraction: Extracts usage from streaming responses for billing
  • Auto stream_options: Automatically injects stream_options.include_usage when billing is configured
  • Works with billing: Billing is calculated after stream completes
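
To make the flushing behavior concrete, here is a minimal sketch of relaying an upstream SSE body with http.ResponseController. The relaySSE helper is illustrative only and is not part of the llmproxy API:

import (
    "bufio"
    "fmt"
    "io"
    "net/http"
)

// relaySSE forwards an upstream SSE body to the client line by line,
// flushing after each line so events reach the client immediately instead
// of sitting in net/http buffers. Illustrative helper, not part of llmproxy.
func relaySSE(w http.ResponseWriter, body io.Reader) error {
    w.Header().Set("Content-Type", "text/event-stream")
    w.Header().Set("Cache-Control", "no-cache")

    rc := http.NewResponseController(w)
    scanner := bufio.NewScanner(body)
    // Allow long SSE lines (large deltas) beyond bufio's 64 KB default.
    scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024)
    for scanner.Scan() {
        if _, err := fmt.Fprintln(w, scanner.Text()); err != nil {
            return err
        }
        if err := rc.Flush(); err != nil {
            return err
        }
    }
    return scanner.Err()
}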

Example with billing:

adapter, _ := modelsdev.LoadFromURL()
billingCallback := func(r llmproxy.BillingResult) {
    log.Printf("Cost: $%.6f (tokens: %d/%d)", r.TotalCost, r.PromptTokens, r.CompletionTokens)
}

router := llmproxy.NewAutoRouter(
    llmproxy.WithAutoRouterBillingCalculator(llmproxy.NewBillingCalculator(adapter.GetCostLookup(), billingCallback)),
)

WebSocket Mode

The Responses API supports persistent WebSocket connections for multi-turn, tool-call-heavy workflows. WebSocket support is opt-in with a zero-dependency adapter pattern — bring your own WebSocket library.

gorilla/websocket Example

package main

import (
    "context"
    "log"
    "net/http"

    "github.com/agentuity/llmproxy"
    "github.com/agentuity/llmproxy/providers/openai"
    "github.com/gorilla/websocket"
)

// Configure allowed origins for WebSocket upgrades.
var trustedOrigins = []string{"https://myapp.example.com"}

// Thin adapters — gorilla's *Conn already satisfies llmproxy.WSConn

type gorillaUpgrader struct{ websocket.Upgrader }

func (u *gorillaUpgrader) Upgrade(w http.ResponseWriter, r *http.Request, h http.Header) (llmproxy.WSConn, error) {
    conn, err := u.Upgrader.Upgrade(w, r, h)
    if err != nil {
        // Return a nil interface rather than a typed-nil *websocket.Conn.
        return nil, err
    }
    return conn, nil
}

type gorillaDialer struct{ websocket.Dialer }

func (d *gorillaDialer) DialContext(ctx context.Context, urlStr string, h http.Header) (llmproxy.WSConn, *http.Response, error) {
    conn, resp, err := d.Dialer.DialContext(ctx, urlStr, h)
    if err != nil {
        return nil, resp, err
    }
    return conn, resp, nil
}

func main() {
    // Only allow WebSocket upgrades from trusted origins; adjust
    // trustedOrigins (or CheckOrigin) for your deployment.
    upgrader := websocket.Upgrader{
        CheckOrigin: func(r *http.Request) bool {
            origin := r.Header.Get("Origin")
            for _, allowed := range trustedOrigins {
                if origin == allowed {
                    return true
                }
            }
            return false
        },
    }

    provider, _ := openai.New("sk-your-key")

    router := llmproxy.NewAutoRouter(
        llmproxy.WithAutoRouterFallbackProvider(provider),
        llmproxy.WithAutoRouterWebSocket(
            &gorillaUpgrader{upgrader},
            &gorillaDialer{websocket.Dialer{}},
        ),
        llmproxy.WithAutoRouterWSBillingCallback(func(turn int, meta llmproxy.ResponseMetadata, billing *llmproxy.BillingResult) {
            log.Printf("Turn %d: %d prompt + %d completion tokens",
                turn, meta.Usage.PromptTokens, meta.Usage.CompletionTokens)
        }),
    )
    router.RegisterProvider(provider)

    http.Handle("/", router)
    log.Fatal(http.ListenAndServe(":8080", nil))
}

Clients connect with any WebSocket library; for example, in Python with websocket-client:

from websocket import create_connection
import json

ws = create_connection("ws://localhost:8080/v1/responses",
    header=["Authorization: Bearer sk-your-key"])

ws.send(json.dumps({
    "type": "response.create",
    "model": "gpt-4o",
    "input": [{"type": "message", "role": "user",
               "content": [{"type": "input_text", "text": "Hello!"}]}],
}))

while True:
    event = json.loads(ws.recv())
    print(event["type"], event.get("delta", ""))
    if event["type"] == "response.completed":
        break

The proxy handles model prefix stripping, auth header forwarding, usage extraction, and per-turn billing automatically. See DESIGN.md for full protocol details.

Providers

| Provider     | Auth                 | API Format                                | Notes                              |
|--------------|----------------------|-------------------------------------------|------------------------------------|
| OpenAI       | Bearer token         | Chat completions, Responses, WebSocket    | HTTP + WebSocket for /v1/responses |
| Anthropic    | x-api-key            | Messages API                              |                                    |
| Groq         | Bearer token         | OpenAI-compatible                         |                                    |
| Fireworks    | Bearer token         | OpenAI-compatible                         |                                    |
| x.AI         | Bearer token         | OpenAI-compatible                         |                                    |
| Google AI    | API key query param  | Gemini generateContent                    |                                    |
| AWS Bedrock  | AWS Signature V4     | Converse API                              |                                    |
| Azure OpenAI | api-key or Azure AD  | Chat completions (deployments)            |                                    |

Interceptors

// Logging
llmproxy.WithInterceptor(interceptors.NewLogging(logger))

// Metrics (thread-safe)
metrics := &interceptors.Metrics{}
llmproxy.WithInterceptor(interceptors.NewMetrics(metrics))

// Retry on 429/5xx
llmproxy.WithInterceptor(interceptors.NewRetry(3, time.Second))

// Billing with models.dev pricing
adapter, _ := modelsdev.LoadFromURL()
llmproxy.WithInterceptor(interceptors.NewBilling(adapter.GetCostLookup(), func(r llmproxy.BillingResult) {
    log.Printf("Cost: $%.6f", r.TotalCost)
}))

// OTel tracing
llmproxy.WithInterceptor(interceptors.NewTracing(otelExtractor))

// Strip sensitive headers
llmproxy.WithInterceptor(interceptors.NewResponseHeaderBan("Openai-Organization"))

// Add custom headers
llmproxy.WithInterceptor(interceptors.NewAddResponseHeader(
    interceptors.NewHeader("X-Gateway", "llmproxy"),
))

// Anthropic prompt caching (default 5 min, free)
llmproxy.WithInterceptor(interceptors.NewAnthropicPromptCaching(interceptors.CacheRetentionDefault))

// Anthropic prompt caching with 1h retention (costs more)
llmproxy.WithInterceptor(interceptors.NewAnthropicPromptCaching(interceptors.CacheRetention1h))

// OpenAI prompt caching with explicit cache key
llmproxy.WithInterceptor(interceptors.NewOpenAIPromptCaching(interceptors.CacheRetention24h, "my-cache-key"))

// OpenAI prompt caching with auto-derived key and tenant namespace
llmproxy.WithInterceptor(interceptors.NewOpenAIPromptCachingAuto("tenant-123", interceptors.CacheRetentionDefault))

// xAI/Grok prompt caching (uses x-grok-conv-id header)
llmproxy.WithInterceptor(interceptors.NewXAIPromptCaching("conv-abc123"))

// Fireworks prompt caching (uses x-session-affinity and x-prompt-cache-isolation-key headers)
llmproxy.WithInterceptor(interceptors.NewFireworksPromptCaching("session-123"))
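
Interceptors are plain proxy options, so several can be stacked on one proxy (or AutoRouter). A minimal composition sketch, assuming WithInterceptor may be passed multiple times and that provider, logger, metrics, and adapter are set up as in the snippets above:

// Stack several interceptors on a single proxy (constructors as shown above).
proxy := llmproxy.NewProxy(provider,
    llmproxy.WithInterceptor(interceptors.NewLogging(logger)),
    llmproxy.WithInterceptor(interceptors.NewRetry(3, time.Second)),
    llmproxy.WithInterceptor(interceptors.NewMetrics(metrics)),
    llmproxy.WithInterceptor(interceptors.NewBilling(adapter.GetCostLookup(), func(r llmproxy.BillingResult) {
        log.Printf("Cost: $%.6f", r.TotalCost)
    })),
)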

Architecture

The library uses small, focused interfaces that compose into providers:

Parse → Enrich → Resolve → Forward → Extract
  • BodyParser — Extract metadata from request body
  • RequestEnricher — Add auth headers
  • URLResolver — Determine upstream URL
  • ResponseExtractor — Parse response metadata
  • Provider — Composes the above
  • Interceptor — Wrap request/response for cross-cutting concerns
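
For orientation only, a rough sketch of how these pieces might be shaped; the method names, signatures, and metadata fields below are assumptions for illustration, not the library's actual definitions:

import "net/http"

// Hypothetical metadata carriers; field names are illustrative.
type RequestMetadata struct {
    Model  string
    Stream bool
}

type ResponseMetadata struct {
    PromptTokens     int
    CompletionTokens int
}

type BodyParser interface {
    // Parse extracts metadata (model, stream flag, ...) from the raw request body.
    Parse(body []byte) (RequestMetadata, error)
}

type RequestEnricher interface {
    // Enrich adds provider auth headers to the outgoing request.
    Enrich(req *http.Request) error
}

type URLResolver interface {
    // Resolve picks the upstream URL for the parsed request.
    Resolve(meta RequestMetadata) (string, error)
}

type ResponseExtractor interface {
    // Extract pulls token usage and other metadata from the upstream response.
    Extract(resp *http.Response) (ResponseMetadata, error)
}

// A Provider composes the four pieces above; Interceptors wrap the
// request/response flow for cross-cutting concerns such as logging or billing.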

See DESIGN.md for full architecture details.

Example

A complete multi-provider proxy server:

cd examples/basic
go run main.go

Environment variables:

| Variable                                               | Provider     |
|--------------------------------------------------------|--------------|
| OPENAI_API_KEY                                         | OpenAI       |
| ANTHROPIC_API_KEY                                      | Anthropic    |
| GROQ_API_KEY                                           | Groq         |
| FIREWORKS_API_KEY                                      | Fireworks    |
| XAI_API_KEY                                            | x.AI         |
| GOOGLE_AI_API_KEY                                      | Google AI    |
| AZURE_OPENAI_RESOURCE                                  | Azure OpenAI |
| AZURE_OPENAI_API_KEY                                   | Azure OpenAI |
| AWS_REGION + AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY | AWS Bedrock  |

License

MIT
