GoChat is a **modern, enterprise-ready Go client SDK for Large Language Models (LLMs)**. It provides an exceptionally elegant and type-safe unified interface that completely smooths out the chaotic API differences between OpenAI, Anthropic (Claude), DeepSeek, Qwen, Ollama, and other major cloud providers or local models.
- 🔌 The Ultimate "Write Once, Run Anywhere" (Client Module)
  - Completely smooths out the chaotic API structures and streaming parsing differences between OpenAI, Anthropic (Claude), DeepSeek, Qwen, and other models.
  - Unified Tool Calling: Define `core.Tool` once, and the framework automatically "translates" it into the corresponding vendor's tool calling format (such as Anthropic's unique format), enabling seamless switching between underlying models.
  - Built-in Anti-Fragility Mechanism: Automatically catches HTTP 429 rate limits and network fluctuations and retries with exponential backoff and jitter, keeping your service rock-solid.
- 🧠 "Zero-Configuration" Local Vectorization (Embedding) Without Ollama
  - Get Rid of Bulky Dependencies: Compute embeddings locally on a lightweight ONNX runtime. No need to deploy massive Ollama services or deal with complex Python environments.
  - Built-in Geek Downloader: A single line, `embedding.WithBEG("bge-small-zh-v1.5", "")`, pulls model shards from remote mirrors when they are missing locally and loads them ready for use.
  - Industrial-Grade Batching: The built-in `BatchProcessor` supports dynamic concurrent batching and automatic hash-based cache filtering for identical texts, maximizing CPU utilization.
- 🌊 Making Extremely Complex Logic Elegant (Generic Pipeline)
  - Chain independent `Step`s like building blocks to elegantly orchestrate complex RAG or Agent reasoning flows.
  - Strongly-Typed Context Transfer: Thanks to Go 1.24+ generics, you can pass custom strongly-typed `struct`s between Steps. Say goodbye to the type assertion crashes and typos caused by a traditional `map[string]any`.
  - Programmable Control Flow: Built-in `IfStep`, `LoopStep`, and AOP-style monitoring hooks (Hooks) give workflows strong observability and control over scheduling.
```bash
go get github.com/DotNetAge/gochat
```

```go
// Use BatchProcessor to optimize vector generation
processor := embedding.NewBatchProcessor(provider, embedding.BatchOptions{
	MaxBatchSize:  32,
	MaxConcurrent: 4,
})

// Generate vectors and track progress
embeddings, err := processor.ProcessWithProgress(ctx, texts, func(current, total int, err error) bool {
	fmt.Printf("Progress: %d/%d\n", current, total)
	return true // Return false to cancel task
})
if err != nil {
	log.Fatal(err)
}
```

```go
p := pipeline.New[*pipeline.State]().
	AddStep(steps.NewTemplateStep("User question: {{.query}}", "prompt", "query")).
	AddStep(steps.NewGenerateCompletionStep(client, "prompt", "answer", "gpt-4o")).
	AddHook(myLogger) // Observe each step execution

state := pipeline.NewState()
state.Set("query", "What is GoChat?")
if err := p.Execute(ctx, state); err != nil {
	log.Fatal(err)
}
fmt.Println(state.GetString("answer"))
```

```go
stream, err := client.ChatStream(ctx, messages, core.WithThinking(0))
if err != nil {
	log.Fatal(err)
}
defer stream.Close()

for stream.Next() {
	ev := stream.Event()
	if ev.Type == core.EventThinking {
		fmt.Print(ev.Content) // Print reasoning process
	} else if ev.Type == core.EventContent {
		fmt.Print(ev.Content) // Print final answer
	}
}
```

| Provider | Models | Auth Methods |
|---|---|---|
| OpenAI | GPT-4o, o1, o3-mini | API Key |
| Anthropic | Claude 3.5/3.7 | API Key |
| DeepSeek | V3, R1 | API Key |
| Alibaba Qwen | Tongyi Qianwen series | API Key, OAuth2, Device Code |
| Google Gemini | 1.5 Pro/Flash | API Key, OAuth2 |
| Local / ONNX | BGE, Sentence-BERT | Local Execution (No Key Required) |
| Azure OpenAI | Microsoft-deployed models | API Key (Azure format) |
GoChat's embedding package provides complete local embedding model support, achieving efficient text embedding generation through ONNX format models. The local solution requires no API Key, has no network dependency, and can run completely offline, making it ideal for scenarios with strict requirements for data privacy and response latency.
| Provider Type | Model Type | Dimension | Use Case | Features |
|---|---|---|---|---|
| BGEProvider | bge-small-zh-v1.5 | 512 | Chinese Semantic Sim | Efficient Chinese embedding, lightweight |
| BGEProvider | bge-base-zh-v1.5 | 768 | Chinese Semantic Sim | Higher accuracy, balanced performance |
| SentenceBERTProvider | all-MiniLM-L6-v2 | 384 | English Semantic Search | Ultra-fast inference, real-time apps |
| SentenceBERTProvider | all-mpnet-base-v2 | 768 | English Semantic Search | High precision, BERT architecture |
| CLIPProvider | clip-vit-base-patch32 | 512 | Multimodal Image-Text | Text & image bidirectional embedding |
| LocalProvider | bert-base-uncased | 768 | General English Tasks | Standard BERT model |
```go
// Import path follows the module path used with `go get` above
import "github.com/DotNetAge/gochat/pkg/embedding"

// Provider interface definition
type Provider interface {
	Embed(ctx context.Context, texts []string) ([][]float32, error)
	Dimension() int
}

// Multimodal Provider (with image support)
type MultimodalProvider interface {
	Provider
	EmbedImages(ctx context.Context, images [][]byte) ([][]float32, error)
}
```

`Downloader` downloads pre-compiled ONNX models from HuggingFace and supports progress tracking and cache management.
```go
// Create downloader (specify cache directory)
downloader := embedding.NewDownloader("~/.gochat/models")

// View available model list
models := downloader.GetModelInfo()
for _, m := range models {
	fmt.Printf("Model: %s | Type: %s | Size: %s\n", m.Name, m.Type, m.Size)
}

// Download progress callback function
callback := func(modelName, fileName string, downloaded, total int64) {
	if total > 0 {
		percent := float64(downloaded) / float64(total) * 100
		fmt.Printf("\r[%s] %s: %.1f%%", modelName, fileName, percent)
	}
}

// Download model
modelPath, err := downloader.DownloadModel("bge-small-zh-v1.5", callback)
if err != nil {
	log.Fatal(err)
}
fmt.Printf("\nModel downloaded to: %s\n", modelPath)
```

Downloader Core Methods:
| Method | Parameters | Return Value | Description |
|---|---|---|---|
| `NewDownloader(cacheDir)` | Cache directory path, empty for default | `*Downloader` | Create downloader instance |
| `GetModelInfo()` | None | `[]DownloadModelInfo` | Return all available model info |
| `DownloadModel(name, callback)` | Model name, progress callback | `(string, error)` | Download specified model |
Supported Model File Download List:
| Model Name | Files | Est. Size |
|---|---|---|
| bge-small-zh-v1.5 | model_fp16.onnx, model_fp16.onnx_data | ~48MB |
| all-MiniLM-L6-v2 | model_fp16.onnx | ~45.3MB |
| bert-base-uncased | model_fp16.onnx | ~200MB |
| bge-base-zh-v1.5 | model_fp16.onnx | ~100MB |
| clip-vit-base-patch32 | text_model_fp16.onnx, vision_model_fp16.onnx | ~300MB |
| all-mpnet-base-v2 | model_fp16.onnx | ~218MB |
Method 1: Auto-Download and Initialize (Recommended)
```go
// Use BGE model, auto-download (if not cached)
provider, err := embedding.WithBEG("bge-small-zh-v1.5", "")
if err != nil {
	log.Fatal(err)
}

// Use BERT model
bertProvider, err := embedding.WithBERT("all-mpnet-base-v2", "")
if err != nil {
	log.Fatal(err)
}

// Use CLIP multimodal model
clipProvider, err := embedding.WithCLIP("clip-vit-base-patch32", "")
if err != nil {
	log.Fatal(err)
}
```

Method 2: Specify Local Path Initialization
```go
// Already downloaded model, specify path directly
provider, err := embedding.WithBEG("bge-small-zh-v1.5", "/path/to/model")
if err != nil {
	log.Fatal(err)
}

// Use factory function to create
// (a distinct variable name avoids redeclaring provider in the same scope)
factoryProvider, err := embedding.NewProvider("/path/to/bge-small-zh-v1.5")
if err != nil {
	log.Fatal(err)
}

// Custom config to create LocalProvider
// (model must already be defined; its construction is not shown here)
localProvider, err := embedding.New(embedding.Config{
	Model:        model,
	Dimension:    512,
	MaxBatchSize: 32,
})
if err != nil {
	log.Fatal(err)
}
```

Method 3: Create Specific Provider Directly
```go
// Create BGE Provider
bgeProvider, err := embedding.NewBGEProvider("/path/to/bge-model")

// Create Sentence-BERT Provider
sbProvider, err := embedding.NewSentenceBERTProvider("/path/to/sbert-model")

// Create CLIP Provider (with image-text support)
clipProvider, err := embedding.NewCLIPProvider("/path/to/clip-model")
```

Basic Text Embedding Generation
```go
ctx := context.Background()

// Single call
texts := []string{"Hello world", "你好世界"}
embeddings, err := provider.Embed(ctx, texts)
if err != nil {
	log.Fatal(err)
}

// Batch call (automatic batching)
largeTextList := make([]string, 1000)
// ... fill texts
embeddings, err = provider.Embed(ctx, largeTextList) // reuse the variables declared above
if err != nil {
	log.Fatal(err)
}

// Get vector dimension
dim := provider.Dimension()
fmt.Printf("Vector dimension: %d\n", dim)
fmt.Printf("Number of vectors generated: %d\n", len(embeddings))
```

CLIP Multimodal Usage (Image-Text Embedding)
```go
clipProvider, err := embedding.WithCLIP("clip-vit-base-patch32", "")

// Text embedding
textEmbeddings, err := clipProvider.Embed(ctx, []string{"a cat", "a dog"})

// Image embedding
imageData, err := os.ReadFile("image.jpg")
imageEmbeddings, err := clipProvider.EmbedImages(ctx, [][]byte{imageData})

// Calculate image-text similarity
// (cosineSimilarity is a helper you supply; check err after each call in real code)
similarity := cosineSimilarity(textEmbeddings[0], imageEmbeddings[0])
```

Optimize Batch Processing with BatchProcessor
```go
// Create batch processor
processor := embedding.NewBatchProcessor(provider, embedding.BatchOptions{
	MaxBatchSize:  32,   // Max texts per batch
	MaxConcurrent: 4,    // Max concurrent batches
	CacheSize:     1000, // LRU cache entries
})

// Simple batch processing
embeddings, err := processor.Process(ctx, texts)

// Batch processing with progress (suitable for large text sets)
callback := func(completed, total int) {
	fmt.Printf("Progress: %d/%d (%.1f%%)\n", completed, total, float64(completed)/float64(total)*100)
}
embeddings, err = processor.ProcessWithProgress(ctx, largeTextList, callback) // reuse the variables declared above
```

1. Batch Processing Optimization
```go
// Adjust batch size based on model and hardware
// GPU: MaxBatchSize = 64-128
// CPU: MaxBatchSize = 16-32
processor := embedding.NewBatchProcessor(provider, embedding.BatchOptions{
	MaxBatchSize:  32,
	MaxConcurrent: runtime.NumCPU(), // Utilize multi-core
	CacheSize:     5000,             // Increase cache to reduce redundant computation
})
```

2. Cache Optimization
```go
// Duplicate texts are automatically cached
// First call computes and caches, subsequent calls return directly
texts := []string{"hot query", "hot query", "hot query"} // Only computed once
embeddings, _ := processor.Process(ctx, texts)
```

3. Concurrent Processing
```go
// For large volumes of text, process different batches in parallel
func parallelEmbed(ctx context.Context, provider embedding.Provider, texts []string, workers int) ([][]float32, error) {
	if workers < 1 {
		workers = 1
	}
	chunkSize := (len(texts) + workers - 1) / workers
	var wg sync.WaitGroup
	results := make([][][]float32, workers)
	errs := make([]error, workers) // named errs to avoid shadowing the errors package
	for i := 0; i < workers; i++ {
		start := i * chunkSize
		if start >= len(texts) {
			break // fewer chunks than workers; avoids slicing out of range
		}
		end := start + chunkSize
		if end > len(texts) {
			end = len(texts)
		}
		wg.Add(1)
		go func(idx, start, end int) {
			defer wg.Done()
			results[idx], errs[idx] = provider.Embed(ctx, texts[start:end])
		}(i, start, end)
	}
	wg.Wait()

	// Merge results in input order, returning the first error encountered
	allEmbeddings := make([][]float32, 0, len(texts))
	for i, emb := range results {
		if errs[i] != nil {
			return nil, errs[i]
		}
		allEmbeddings = append(allEmbeddings, emb...)
	}
	return allEmbeddings, nil
}
```

4. Resource Cleanup
```go
// Close Provider to release resources after use
defer func() {
	if provider != nil {
		provider.Close()
	}
}()
```

Performance Benchmark Reference:
| Model | Hardware | Batch Size | Avg Latency | Throughput |
|---|---|---|---|---|
| bge-small-zh-v1.5 | CPU (4 core) | 32 | ~50ms/batch | ~600 texts/sec |
| all-MiniLM-L6-v2 | CPU (4 core) | 32 | ~30ms/batch | ~1000 texts/sec |
| clip-vit-base-patch32 | CPU (4 core) | 16 | ~80ms/batch | ~200 images/sec |
GoChat follows Go's philosophy of minimalism: the core interface `core.Client` has only two methods, `Chat` and `ChatStream`. All other features are added through functional options, so the main interface stays small and stable over the long term.
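For readers unfamiliar with the functional options pattern, here is a minimal generic sketch. It shows the pattern only: `chatConfig`, `Option`, `WithModel`, `WithTemperature`, `WithMaxTokens`, and the default values are hypothetical and do not describe GoChat's actual option set.

```go
package main

import "fmt"

// chatConfig holds per-call settings (hypothetical fields for illustration).
type chatConfig struct {
	model       string
	temperature float64
	maxTokens   int
}

// Option mutates a chatConfig; each With* function returns one.
type Option func(*chatConfig)

func WithModel(name string) Option {
	return func(c *chatConfig) { c.model = name }
}

func WithTemperature(t float64) Option {
	return func(c *chatConfig) { c.temperature = t }
}

func WithMaxTokens(n int) Option {
	return func(c *chatConfig) { c.maxTokens = n }
}

// newChatConfig starts from defaults and applies options in order,
// so later options override earlier ones.
func newChatConfig(opts ...Option) chatConfig {
	cfg := chatConfig{model: "default-model", temperature: 1.0, maxTokens: 1024}
	for _, opt := range opts {
		opt(&cfg)
	}
	return cfg
}

func main() {
	cfg := newChatConfig(WithModel("gpt-4o"), WithTemperature(0.2))
	fmt.Printf("%+v\n", cfg)
}
```

Because new behavior arrives as new `With*` functions, a variadic `opts ...Option` parameter lets a method signature stay unchanged as features are added.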
This project is open-sourced under the MIT License. PRs are welcome!
Please refer to the `docs/` directory for detailed guides, architecture diagrams, and API references.