OpenTelemetry extension for LLM observability: track conversations, workflows, and costs
Track conversations and workflows in your LLM applications with automatic context propagation. Built on OpenTelemetry for seamless integration with your existing observability stack.
Not a replacement for OTel auto-instrumentation — works alongside it or standalone.
Key Features:

- 🎯 Conversation Tracking: Multi-turn conversations with `gen_ai.conversation.id` and turn numbers via `conversation_context`
- 🔄 Workflow Management: Track multi-step AI operations across LLM calls, tools, and retrievals with `workflow_context`
- 🎨 Decorator Pattern: `@observe()` for zero-touch instrumentation with full input/output/latency tracking
- 📊 Auto-Context Propagation: Thread-safe context managers that automatically tag all nested operations
- 🔧 SpanProcessor: Automatic context enrichment for all spans in your application
- 🏷️ Span Classification: `gen_ai.l9.span.kind` for filtering (llm/tool/chain/agent/prompt)
- 🛠️ Tool/Function Tracking: Enhanced attributes for function calls and tool usage
- ⚡ Performance Metrics: Response times, token counts, and quality scores
- 🌐 Provider Agnostic: Works with OpenAI, Anthropic, Google, Cohere, etc.
- 📏 Standard Attributes: Full OpenTelemetry `gen_ai.*` semantic conventions
- 💰 Cost Tracking: Optional; bring your own model pricing for cost monitoring
- 💸 Workflow Costing: Aggregate costs across multi-step operations
This is an EXTENSION, not a replacement:

| Package | Purpose | Approach |
|---|---|---|
| OTel GenAI (`opentelemetry-instrumentation-openai-v2`) | Auto-instrument LLM SDKs | Automatic (monkey-patching) |
| Last9 GenAI (`last9-genai`) | Add conversation/workflow tracking | Context-based enrichment |

You can use:

- Last9 GenAI alone - full conversation and workflow tracking
- Both together - OTel auto-traces + Last9 adds conversation/workflow context (recommended!)
See Working with OTel Auto-Instrumentation for combined usage.
Basic:

```bash
pip install last9-genai
```

With OTLP export (recommended):

```bash
pip install "last9-genai[otlp]"
```

Install the latest version directly from GitHub:

```bash
# Basic installation
pip install git+https://github.com/last9/python-ai-sdk.git

# With OTLP export
pip install "last9-genai[otlp] @ git+https://github.com/last9/python-ai-sdk.git"

# Install a specific version (using tags)
pip install git+https://github.com/last9/python-ai-sdk.git@v1.0.0
```

Add to requirements.txt:

```
last9-genai @ git+https://github.com/last9/python-ai-sdk.git@v1.0.0
```

Requirements:

- Python 3.10+
- `opentelemetry-api>=1.20.0`
- `opentelemetry-sdk>=1.20.0`
Note: The examples below use `client` to represent your LLM client. Initialize your preferred provider:

```python
# OpenAI
from openai import OpenAI
client = OpenAI()

# Or Anthropic
from anthropic import Anthropic
anthropic_client = Anthropic()

# Or any other provider (Google, Cohere, etc.)
```

The SDK works with any LLM provider - just use your client normally!
Automatically track multi-turn conversations with zero manual instrumentation:

```python
from last9_genai import conversation_context, Last9SpanProcessor
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider

# Setup tracing with Last9 processor
provider = TracerProvider()
trace.set_tracer_provider(provider)
provider.add_span_processor(Last9SpanProcessor())

# Track conversations automatically - works with any LLM provider
with conversation_context(conversation_id="session_123", user_id="user_456"):
    # OpenAI
    response1 = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}]
    )

    # Anthropic (same context!)
    response2 = anthropic_client.messages.create(
        model="claude-sonnet-4",
        max_tokens=1024,  # required by the Anthropic Messages API
        messages=[{"role": "user", "content": "How are you?"}]
    )

# Both calls automatically have conversation_id = "session_123"!
```

Track complex multi-step AI operations:
```python
from last9_genai import workflow_context

# Track entire workflow with automatic tagging
with workflow_context(workflow_id="rag_search", workflow_type="retrieval"):
    # All operations automatically tagged with workflow_id
    docs = retrieve_documents(query)     # Tagged
    context = rerank_documents(docs)     # Tagged
    response = generate_answer(context)  # Tagged

# Full workflow visibility with zero manual instrumentation!

# Nest workflows and conversations
with conversation_context(conversation_id="support_123"):
    with workflow_context(workflow_id="order_lookup"):
        # Both conversation AND workflow tracked automatically
        result = lookup_and_respond()
```

Use `@observe()` for automatic tracking of everything:
```python
from last9_genai import observe

@observe()  # That's it!
def call_llm(prompt: str):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    return response

# Automatically tracks:
# - Input (prompt)
# - Output (response)
# - Latency (span duration)
# - Context (conversation_id, workflow_id if set)

# Works seamlessly with context managers
with conversation_context(conversation_id="session_456"):
    response = call_llm("Explain quantum computing")
    # Span automatically has conversation_id!
```

Add cost monitoring by providing model pricing:
```python
from last9_genai import ModelPricing

# Add pricing when creating the processor
processor = Last9SpanProcessor(custom_pricing={
    "gpt-4o": ModelPricing(input=2.50, output=10.0),
    "claude-sonnet-4-5": ModelPricing(input=3.0, output=15.0),
})

# Or with the decorator
pricing = {"gpt-4o": ModelPricing(input=2.50, output=10.0)}

@observe(pricing=pricing)
def call_llm(prompt: str):
    # Now also tracks cost automatically
    return client.chat.completions.create(...)
```

Use the `@observe()` decorator for automatic tracking of input/output, latency, and cost:
```python
from last9_genai import observe, ModelPricing

pricing = {"gpt-4o": ModelPricing(input=2.50, output=10.0)}

@observe(pricing=pricing)
def call_openai(prompt: str):
    """Automatically tracks everything!"""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    return response

# That's it! Automatically tracks:
# - Input (prompt)
# - Output (response)
# - Latency (span duration)
# - Cost (calculated from usage)
# - Metadata (from context)

# Works with context too:
with conversation_context(conversation_id="session_123"):
    response = call_openai("Hello!")
    # Span automatically has conversation_id!
```

Add tags and categories for better filtering and organization in your observability platform:
```python
from last9_genai import observe

@observe(
    tags=["production", "customer_support"],
    metadata={
        "category": "customer_support",  # Appears in Last9 dashboard Category column
        "version": "1.0.0",
        "priority": "high"
    }
)
def handle_support_query(query: str):
    """Categorized LLM call with metadata"""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": query}]
    )
    return response

# Categories automatically appear in Last9 dashboard:
# - Category column in traces table
# - Category filter dropdown
# - Enhanced trace details

# Use underscores for multi-word categories:
@observe(metadata={"category": "data_analysis"})  # Shows as "data analysis"
def analyze_data(data: str):
    return client.chat.completions.create(...)
```

Common categories:
- `customer_support`, `conversational_ai`, `code_assistant`
- `data_analysis`, `content_generation`, `summarization`
- `translation`, `research`, `qa_automation`
Recommended: Combine OTel auto-instrumentation with Last9 extensions:

```python
# Step 1: Auto-instrument with OpenTelemetry (standard attributes)
from opentelemetry.instrumentation.openai_v2 import OpenAIInstrumentor
OpenAIInstrumentor().instrument()

# Step 2: Add Last9 extensions (cost, workflows)
from last9_genai import Last9GenAI, ModelPricing

l9 = Last9GenAI(custom_pricing={
    "gpt-4o": ModelPricing(input=2.50, output=10.0),
})

# Now make LLM calls
from openai import OpenAI
client = OpenAI()

# OTel automatically traces this call (standard attributes)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Last9 adds cost on top of the auto-traced span
from opentelemetry import trace

span = trace.get_current_span()
usage = {
    "input_tokens": response.usage.prompt_tokens,
    "output_tokens": response.usage.completion_tokens,
}
cost = l9.add_llm_cost_attributes(span, "gpt-4o", usage)
print(f"Cost: ${cost.total:.6f}")
```

Result: You get standard OTel attributes (automatic) + Last9 cost/workflow (manual).
Track conversations across multiple turns automatically:

```python
from last9_genai import conversation_context

# Track a complete conversation session
with conversation_context(conversation_id="support_session_456", user_id="user_456"):
    # Turn 1
    response1 = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "I need help with my order"}]
    )

    # Turn 2
    response2 = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": "I need help with my order"},
            {"role": "assistant", "content": response1.choices[0].message.content},
            {"role": "user", "content": "Order #12345"}
        ]
    )

# Both calls automatically tagged with:
# - conversation_id = "support_session_456"
# - user_id = "user_456"
# All turns linked together for analysis!
```
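The same pattern scales to any number of turns. A minimal session-loop sketch; `run_session` is illustrative, only `conversation_context` comes from the SDK, and `client` is the OpenAI client from the setup note above:

```python
from last9_genai import conversation_context

def run_session(session_id: str, user_id: str, turns: list[str]):
    """Illustrative loop: every call inside the context is tagged automatically."""
    history = []
    with conversation_context(conversation_id=session_id, user_id=user_id):
        for user_input in turns:
            history.append({"role": "user", "content": user_input})
            response = client.chat.completions.create(
                model="gpt-4o",
                messages=history,
            )
            reply = response.choices[0].message.content
            history.append({"role": "assistant", "content": reply})
    return history

run_session("support_session_789", "user_789", [
    "I need help with my order",
    "Order #12345",
])
```

Track multi-step AI workflows with automatic tagging: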
```python
from last9_genai import workflow_context

# RAG workflow example
with workflow_context(workflow_id="rag_pipeline", workflow_type="retrieval"):
    # Step 1: Query expansion (automatically tagged)
    expanded_query = expand_query(user_question)

    # Step 2: Retrieval (automatically tagged)
    documents = vector_search(expanded_query)

    # Step 3: Reranking (automatically tagged)
    relevant_docs = rerank(documents, user_question)

    # Step 4: Generation (automatically tagged)
    response = generate_answer(relevant_docs, user_question)

# All 4 steps automatically have:
# - workflow_id = "rag_pipeline"
# - workflow_type = "retrieval"
# Perfect for analyzing bottlenecks and performance!
```
### Nested Workflows and Conversations
Combine conversation and workflow tracking:
```python
# Track conversation
with conversation_context(conversation_id="user_session_789", user_id="user_789"):
    # Inside conversation, track a specific workflow
    with workflow_context(workflow_id="product_search", workflow_type="search"):
        # Search workflow steps
        results = search_products(query)
        recommendations = rank_results(results)

    # Outside workflow, still in conversation
    followup = handle_followup_question()

# Result:
# - search_products and rank_results: both conversation_id AND workflow_id
# - handle_followup_question: only conversation_id
# Perfect granularity for analysis!
```

Track tool calls:
```python
# Assumes tracer = trace.get_tracer(__name__) and l9 = Last9GenAI(...) from earlier setup
with tracer.start_span("gen_ai.tool.search") as span:
    l9.add_tool_attributes(
        span,
        tool_name="web_search",
        tool_type="search",
        arguments={"query": "weather"},
        result={"temp": 72},
        duration_ms=150
    )
```

Export spans to Last9 (or any OTLP endpoint). Configure via environment variables:

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT="https://otlp.last9.io:443"
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic YOUR_KEY"
```

Then set up the OTLP exporter:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
# Setup
trace.set_tracer_provider(TracerProvider())
otlp_exporter = OTLPSpanExporter()
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(otlp_exporter)
)
```

For local debugging, print spans to the console:

```python
from opentelemetry.sdk.trace.export import ConsoleSpanExporter
console_exporter = ConsoleSpanExporter()
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(console_exporter)
)
```

```python
# Track tokens only, skip cost calculation
l9 = Last9GenAI(enable_cost_tracking=False)
```

To aggregate costs across multi-step operations, attach a workflow tracker:

```python
from last9_genai import WorkflowCostTracker
tracker = WorkflowCostTracker()
l9 = Last9GenAI(workflow_tracker=tracker)
```
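Putting these pieces together: when LLM calls inside a workflow carry pricing, per-workflow totals surface as the `workflow.total_cost_usd` and `workflow.llm_calls` attributes listed in the reference below. A sketch of how the pieces might combine, using only APIs documented in this README:

```python
from last9_genai import ModelPricing, observe, workflow_context

pricing = {"gpt-4o": ModelPricing(input=2.50, output=10.0)}

@observe(pricing=pricing)
def summarize(text: str):
    # Each call is costed individually from its token usage
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
    )

# All three calls are tagged with the workflow and aggregated into its totals
with workflow_context(workflow_id="batch_summaries", workflow_type="summarization"):
    for doc in ["doc one...", "doc two...", "doc three..."]:
        summarize(doc)
```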
Spans carry standard `gen_ai.*` attributes plus Last9 extensions:

```
gen_ai.system = "openai"
gen_ai.request.model = "gpt-4o"
gen_ai.usage.input_tokens = 150
gen_ai.usage.output_tokens = 250

# Cost (when pricing provided)
gen_ai.usage.cost_usd = 0.002875
gen_ai.usage.cost_input_usd = 0.000375
gen_ai.usage.cost_output_usd = 0.0025
# Classification
gen_ai.l9.span.kind = "llm" # or "tool", "prompt"
# Workflow
workflow.id = "customer_support"
workflow.total_cost_usd = 0.015
workflow.llm_calls = 3
# Conversation
gen_ai.conversation.id = "session_123"
gen_ai.conversation.turn_number = 2
```

No default pricing is included; you provide pricing for the models you use:
- Anthropic: https://www.anthropic.com/pricing
- OpenAI: https://openai.com/api/pricing/
- Google: https://ai.google.dev/pricing
- Community: https://www.llm-prices.com/
All prices are in USD per million tokens:

```python
ModelPricing(
    input=3.0,   # $3 per 1M input tokens
    output=15.0  # $15 per 1M output tokens
)
```

Conversion:
- Per-token: `$0.000003` → `3.0`
- Per-1K: `$0.003` → `3.0`
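To sanity-check a rate, the math is just tokens × price / 1,000,000. A quick sketch reproducing the gpt-4o cost from the attribute reference above; the helper is illustrative, the SDK computes this for you when pricing is provided:

```python
def llm_cost_usd(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Illustrative: prices are USD per 1M tokens, as ModelPricing expects."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# gpt-4o at $2.50 in / $10 out, with 150 input and 250 output tokens:
print(llm_cost_usd(150, 250, 2.50, 10.0))  # 0.002875

# Converting a per-1K rate to the per-1M number: $0.003/1K * 1000 = 3.0
```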
```python
custom_pricing = {
    # Anthropic
    "claude-opus-4-6": ModelPricing(input=15.0, output=75.0),
    "claude-sonnet-4-5": ModelPricing(input=3.0, output=15.0),
    "claude-haiku-4-5": ModelPricing(input=0.8, output=4.0),

    # OpenAI
    "gpt-4o": ModelPricing(input=2.50, output=10.0),
    "gpt-4o-mini": ModelPricing(input=0.15, output=0.60),
    "o1": ModelPricing(input=15.0, output=60.0),

    # Google
    "gemini-1.5-pro": ModelPricing(input=1.25, output=10.0),
    "gemini-2.0-flash": ModelPricing(input=0.075, output=0.30),
}
```

Azure OpenAI:

```python
custom_pricing = {
    "azure/gpt-4o": ModelPricing(input=2.50, output=10.0),
}
```

Self-hosted (free):

```python
custom_pricing = {
    "ollama/llama3.1": ModelPricing(input=0.0, output=0.0),
}
```

Fine-tuned:

```python
custom_pricing = {
    "ft:gpt-3.5-turbo:org:model:id": ModelPricing(input=12.0, output=16.0),
}
```

See the examples/ directory:
Basic Usage:

- `basic_usage.py` - Simple LLM tracking
- `openai_integration.py` - OpenAI SDK
- `anthropic_integration.py` - Anthropic SDK
- `langchain_integration.py` - LangChain
- `fastapi_app.py` - FastAPI web app
- `tool_integration.py` - Function calls

Auto-Tracking (Recommended):

- `context_tracking.py` - Context managers for automatic tracking
- `decorator_tracking.py` - `@observe()` decorator pattern

Advanced:

- `conversation_tracking.py` - Multi-turn conversations
Contributions welcome! Please:
- Fork the repo
- Create a feature branch
- Add tests
- Submit a PR
MIT License - see LICENSE
- Issues: https://github.com/last9/python-ai-sdk/issues
- Documentation: https://github.com/last9/python-ai-sdk
- Last9: https://last9.io
Built with ❤️ by Last9