
Python: New Feature: Pluggable token-efficient serialization for agent communication to reduce context overhead #13876

@makroumi

Description


Summary

Semantic Kernel serializes agent messages, function call results, and chat history as JSON by default.
At scale this creates a measurable and expensive problem that compounds with pipeline complexity.

~44% of tokens in typical SK agent payloads are pure JSON syntax overhead before any reasoning begins.
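As a rough illustration of where that overhead comes from, here is a small character-level sketch that counts JSON structural characters and repeated key names in a payload. This is a proxy, not a tokenizer measurement, and it is not the source of the ~44% figure above (that comes from my benchmarks); it only shows the shape of the problem:

```python
import json

def syntax_overhead(payload: dict) -> float:
    """Rough fraction of serialized characters that are JSON structure
    (braces, brackets, quotes, colons, commas) or repeated key names,
    rather than actual values. Character-level proxy only."""
    text = json.dumps(payload)
    structural = sum(text.count(c) for c in '{}[]":,')

    # Every occurrence of a key after its first carries no new information.
    keys = []
    def collect(obj):
        if isinstance(obj, dict):
            for k, v in obj.items():
                keys.append(k)
                collect(v)
        elif isinstance(obj, list):
            for v in obj:
                collect(v)
    collect(payload)
    repeated = sum(len(k) for k in keys) - sum(len(k) for k in set(keys))
    return (structural + repeated) / len(text)

# Example: one tool result repeated across five turns of chat history
history = {"turns": [{"role": "tool", "name": "get_weather",
                      "content": {"temp_c": 21, "wind_kph": 9}}] * 5}
print(f"{syntax_overhead(history):.0%} of characters are syntax or repeated keys")
```

The repeated-key term is the one that compounds with chat history length: the value payload grows linearly, and so does the syntax overhead wrapped around it.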

The Problem

Semantic Kernel's function calling and agent patterns are particularly exposed because:

  1. Chat history grows as JSON with every turn
  2. Function call results serialize back with repeated key names across every invocation
  3. Multi-agent patterns pass full JSON context between every kernel invocation
  4. No validation catches broken function states before they reach the LLM

At 10M agent loops on GPT-4o:
~$59K spent on syntax noise. Not intelligence.

Proposed Feature

A pluggable SerializerInterface in SK core that allows token-efficient serialization as an opt-in replacement for JSON.
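A minimal sketch of what that interface could look like, assuming a simple serialize/deserialize protocol with JSON as the default implementation. The names here are illustrative, not an existing SK or ULMEN API:

```python
import json
from typing import Any, Protocol

class SerializerInterface(Protocol):
    """Illustrative shape for the proposed pluggable serializer."""
    def serialize(self, obj: Any) -> str: ...
    def deserialize(self, data: str) -> Any: ...

class JsonSerializer:
    """Default behavior: plain JSON, matching SK's current output."""
    def serialize(self, obj: Any) -> str:
        return json.dumps(obj)

    def deserialize(self, data: str) -> Any:
        return json.loads(data)
```

Because the protocol is structural, any serializer (ULMEN or otherwise) that provides `serialize`/`deserialize` could be registered without SK importing it directly.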

I built ULMEN specifically for this problem.

Benchmarks on NVIDIA Tesla T4 production hardware:

[benchmark results table attached as an image]

Beyond compression, ULMEN adds a Semantic Firewall that validates agent state transitions before they reach the LLM:

  • Rejects orphaned function calls
  • Catches invalid step transitions
  • Validates enum states
  • Raises structured errors vs silent failures
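To make those checks concrete, here is a minimal, hypothetical sketch of the kind of validation such a firewall performs. The state names, transition table, and function names are illustrative, not ULMEN's actual implementation:

```python
from enum import Enum

class StepState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"

# Hypothetical transition table: which new states are legal from each state.
ALLOWED = {
    StepState.PENDING: {StepState.RUNNING},
    StepState.RUNNING: {StepState.DONE, StepState.FAILED},
    StepState.DONE: set(),
    StepState.FAILED: set(),
}

class InvalidTransition(ValueError):
    """Structured error raised instead of letting bad state pass silently."""

def validate_transition(current: StepState, new: StepState) -> None:
    # Catches invalid step transitions (e.g. DONE -> RUNNING).
    if new not in ALLOWED[current]:
        raise InvalidTransition(f"illegal transition {current.value} -> {new.value}")

def validate_tool_result(issued_call_ids: set, result_call_id: str) -> None:
    # Rejects orphaned function results: a result whose call id was never issued.
    if result_call_id not in issued_call_ids:
        raise InvalidTransition(f"orphaned function result {result_call_id!r}")
```

Run before serialization, checks like these turn a state corruption that would otherwise reach the LLM as plausible-looking JSON into an immediate, debuggable exception.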

Silent function-call failures that pass through JSON undetected are a leading cause of hard-to-diagnose SK agent hallucinations.

Proposed API

Non-breaking. Fully opt-in.

Option 1: Kernel level

kernel = Kernel()
kernel.add_serializer(UlmenSerializer())

Option 2: Per invocation

result = await kernel.invoke(
    function,
    arguments,
    serializer="ulmen",
)

Option 3: Chat completion level

chat_completion = AzureChatCompletion(
    service_id="default",
    serializer="ulmen",
)

Implementation Notes

ULMEN is:

  • Drop-in Python/Rust library
  • No schema compilation required
  • Pure Python fallback if Rust unavailable
  • Byte-identical output Python vs Rust
  • BSL license, free under $10M revenue
  • 1,364 tests, 100% statement coverage
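The Rust-with-pure-Python-fallback bullet follows the usual optional-extension pattern: try the compiled backend first and fall back transparently. A generic sketch, where the module names are hypothetical and `json` stands in for the pure-Python backend:

```python
def load_backend():
    """Prefer a compiled extension, fall back to pure Python.
    Illustrative of the pattern; 'ulmen._rust' is a hypothetical name."""
    try:
        import ulmen._rust as backend  # hypothetical compiled extension
        name = "rust"
    except ImportError:
        import json as backend  # stand-in for a pure-Python fallback
        name = "python"
    return name, backend

name, backend = load_backend()
```

The byte-identical-output guarantee matters here: callers never need to know which backend was selected.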

Reproducible Benchmarks

Live notebook - verify on your own data:
github.com/makroumi/ulmen

Questions For Maintainers

  1. Is there an existing serialization interface in SK core to hook into?
  2. Would this fit better as a middleware layer?
  3. Which SK pattern would be highest impact to prioritize: chat history, function calls, or agent communication?

Happy to submit a PR once maintainers confirm the preferred integration approach.

References

Similar feature requests filed on:

  • microsoft/autogen (filed today)
  • langchain-ai/langchain (filed today)
  • crewAIInc/crewAI (filed today)
  • run-llama/llama_index (filed today)

The AI engineering community is converging on this problem. ULMEN is the systematic solution.
