Summary
Semantic Kernel serializes agent messages, function call results, and chat history as JSON by default.
At scale this creates a measurable and expensive problem that compounds with pipeline complexity.
~44% of tokens in typical SK agent payloads are pure JSON syntax overhead before any reasoning begins.
The Problem
Semantic Kernel's function calling and agent patterns are particularly exposed because:
- Chat history grows as JSON with every turn
- Function call results serialize back with repeated key names across every invocation
- Multi-agent patterns pass full JSON context between every kernel invocation
- No validation catches broken function states before they reach the LLM
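To make the overhead claim concrete, here is a rough, hand-rolled way to estimate how much of a serialized payload is JSON punctuation and key names rather than actual values. This is a crude character-level proxy, not a tokenizer-accurate measurement, and the sample payload is illustrative:

```python
import json

def json_syntax_ratio(obj) -> float:
    """Rough fraction of serialized characters that are JSON structure
    (braces, quotes, colons, commas) or key names, rather than values."""
    text = json.dumps(obj)

    def values_len(o):
        # Sum up only the raw value characters, ignoring keys and syntax.
        if isinstance(o, dict):
            return sum(values_len(v) for v in o.values())
        if isinstance(o, list):
            return sum(values_len(v) for v in o)
        return len(str(o))

    return 1 - values_len(obj) / len(text)

# A toy chat-history turn with repeated key names, as in agent loops.
turn = {"role": "assistant",
        "tool_call": {"name": "get_weather",
                      "arguments": {"city": "Seattle", "unit": "celsius"}}}
print(f"{json_syntax_ratio(turn):.0%} of characters are syntax/keys")
```

Because key names repeat on every turn, the syntax share stays roughly constant as history grows, so the cost scales linearly with conversation length.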
At 10M agent loops on GPT-4o:
~$59K spent on syntax noise. Not intelligence.
Proposed Feature
A pluggable SerializerInterface in SK core that allows token-efficient serialization as an opt-in replacement for JSON.
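One possible shape for such an interface, sketched for discussion only. SK core has no serializer hook today, so the name SerializerInterface and both method signatures are assumptions, not existing SK API:

```python
import json
from typing import Any, Protocol, runtime_checkable

@runtime_checkable
class SerializerInterface(Protocol):
    """Hypothetical pluggable serializer contract (not current SK API)."""

    def serialize(self, payload: Any) -> str:
        """Encode a chat-history entry or function result for the LLM."""
        ...

    def deserialize(self, text: str) -> Any:
        """Decode model output back into structured data."""
        ...

class JsonSerializer:
    """Default behavior could remain plain JSON, preserving today's output."""

    def serialize(self, payload: Any) -> str:
        return json.dumps(payload, separators=(",", ":"))

    def deserialize(self, text: str) -> Any:
        return json.loads(text)
```

Keeping JSON as the default implementation is what makes the change non-breaking: existing pipelines see byte-identical payloads unless they opt in to another serializer.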
I built ULMEN specifically for this problem.
Benchmarks were run on NVIDIA Tesla T4 production hardware (see the reproducible notebook linked below).
Beyond compression, ULMEN adds a Semantic Firewall that validates agent state transitions before they reach the LLM:
- Rejects orphaned function calls
- Catches invalid step transitions
- Validates enum states
- Raises structured errors vs silent failures
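The four checks above can be approximated with a small state-machine validator. Everything below is an illustrative sketch, not ULMEN's actual API; the state names, the ALLOWED table, and StateTransitionError are all invented for this example:

```python
# Illustrative "semantic firewall" sketch: a state machine over the
# lifecycle of a function call, with structured errors on violations.
ALLOWED = {
    "idle": {"calling"},
    "calling": {"returned", "failed"},
    "returned": {"idle", "calling"},
    "failed": {"idle"},
}

class StateTransitionError(ValueError):
    """Structured error raised instead of a silent pass-through."""

def validate_transition(current: str, nxt: str) -> None:
    if current not in ALLOWED:
        raise StateTransitionError(f"unknown state: {current!r}")
    if nxt not in ALLOWED[current]:
        raise StateTransitionError(f"illegal transition {current!r} -> {nxt!r}")

def validate_history(states: list[str]) -> None:
    """Check a whole trajectory. Catches orphaned results, e.g. a
    'returned' state with no preceding 'calling' state."""
    for cur, nxt in zip(states, states[1:]):
        validate_transition(cur, nxt)

validate_history(["idle", "calling", "returned", "idle"])  # passes
```

The point of the structured error is that a broken trajectory fails loudly at validation time instead of being serialized into the prompt and confusing the model.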
Function call failures that pass through JSON undetected are a common and hard-to-diagnose source of SK agent hallucinations.
Proposed API
Non-breaking. Fully opt-in.
Option 1: Kernel level
kernel = Kernel()
kernel.add_serializer(UlmenSerializer())
Option 2: Per invocation
result = await kernel.invoke(
    function,
    arguments,
    serializer="ulmen",
)
Option 3: Chat completion level
chat_completion = AzureChatCompletion(
    service_id="default",
    serializer="ulmen",
)
Implementation Notes
ULMEN:
- Drop-in Python/Rust library
- No schema compilation required
- Pure-Python fallback if the Rust extension is unavailable
- Byte-identical output between the Python and Rust backends
- BSL license, free for organizations under $10M revenue
- 1,364 tests, 100% statement coverage
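The pure-Python fallback bullet typically corresponds to an import-time switch like the following. This is a generic native-extension pattern; the module name _fastcodec and the function names are placeholders, not ULMEN's real package layout:

```python
# Generic native-extension fallback pattern (names are placeholders,
# not ULMEN's actual modules).
try:
    from _fastcodec import encode, decode  # hypothetical Rust extension
    BACKEND = "rust"
except ImportError:
    # Pure-Python implementations; a real library would guarantee
    # byte-identical output to the native backend.
    import json

    def encode(obj) -> bytes:
        return json.dumps(obj, separators=(",", ":")).encode()

    def decode(data: bytes):
        return json.loads(data)

    BACKEND = "python"
```

Callers import encode/decode from one place and never need to know which backend is active, which is what makes the library drop-in.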
Reproducible Benchmarks
Live notebook - verify on your own data:
github.com/makroumi/ulmen
Questions For Maintainers
- Is there an existing serialization interface in SK core to hook into?
- Would this fit better as a middleware layer?
- Which SK patterns should be prioritized for highest impact: chat history, function calls, or agent-to-agent communication?
Happy to submit a PR once maintainers confirm the preferred integration approach.
References
Similar feature requests filed on:
- microsoft/autogen (filed today)
- langchain-ai/langchain (filed today)
- crewAIInc/crewAI (filed today)
- run-llama/llama_index (filed today)
The AI engineering community is converging on this problem. ULMEN is the systematic solution.