
Python: New Feature: Pluggable token-efficient serialization for agent communication to reduce context overhead #13876

@makroumi

Description


Summary

Semantic Kernel serializes agent messages, function call results, and chat history as JSON by default.
At scale this creates a measurable and expensive problem that compounds with pipeline complexity.

~44% of tokens in typical SK agent payloads are pure JSON syntax overhead before any reasoning begins.
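As a rough illustration of where that overhead comes from, here is a small character-level sketch that counts JSON structural characters and repeated key names in a payload. This is a proxy, not a tokenizer measurement, and it is not the source of the ~44% figure above (that comes from my benchmarks); it only shows the shape of the problem:

```python
import json

def syntax_overhead(payload: dict) -> float:
    """Rough fraction of serialized characters that are JSON structure
    (braces, brackets, quotes, colons, commas) or repeated key names,
    rather than actual values. Character-level proxy only."""
    text = json.dumps(payload)
    structural = sum(text.count(c) for c in '{}[]":,')

    # Every occurrence of a key after its first carries no new information.
    keys = []
    def collect(obj):
        if isinstance(obj, dict):
            for k, v in obj.items():
                keys.append(k)
                collect(v)
        elif isinstance(obj, list):
            for v in obj:
                collect(v)
    collect(payload)
    repeated = sum(len(k) for k in keys) - sum(len(k) for k in set(keys))
    return (structural + repeated) / len(text)

# Example: one tool result repeated across five turns of chat history
history = {"turns": [{"role": "tool", "name": "get_weather",
                      "content": {"temp_c": 21, "wind_kph": 9}}] * 5}
print(f"{syntax_overhead(history):.0%} of characters are syntax or repeated keys")
```

The repeated-key term is the one that compounds with chat history length: the value payload grows linearly, and so does the syntax overhead wrapped around it.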

The Problem

Semantic Kernel's function calling and agent patterns are particularly exposed because:

  1. Chat history grows as JSON with every turn
  2. Function call results serialize back with repeated key names across every invocation
  3. Multi-agent patterns pass full JSON context between every kernel invocation
  4. No validation catches broken function states before they reach the LLM

At 10M agent loops on GPT-4o:
~$59K spent on syntax noise. Not intelligence.

Proposed Feature

A pluggable SerializerInterface in SK core that allows token-efficient serialization as an opt-in replacement for JSON.
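A minimal sketch of what that interface could look like, assuming a simple serialize/deserialize protocol with JSON as the default implementation. The names here are illustrative, not an existing SK or ULMEN API:

```python
import json
from typing import Any, Protocol

class SerializerInterface(Protocol):
    """Illustrative shape for the proposed pluggable serializer."""
    def serialize(self, obj: Any) -> str: ...
    def deserialize(self, data: str) -> Any: ...

class JsonSerializer:
    """Default behavior: plain JSON, matching SK's current output."""
    def serialize(self, obj: Any) -> str:
        return json.dumps(obj)

    def deserialize(self, data: str) -> Any:
        return json.loads(data)
```

Because the protocol is structural, any serializer (ULMEN or otherwise) that provides `serialize`/`deserialize` could be registered without SK importing it directly.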

I built ULMEN specifically for this problem.

Benchmarks on NVIDIA Tesla T4 production hardware:

[benchmark results table attached as an image]

Beyond compression, ULMEN adds a Semantic Firewall that validates agent state transitions before they reach the LLM:

  • Rejects orphaned function calls
  • Catches invalid step transitions
  • Validates enum states
  • Raises structured errors vs silent failures
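To make those checks concrete, here is a minimal, hypothetical sketch of the kind of validation such a firewall performs. The state names, transition table, and function names are illustrative, not ULMEN's actual implementation:

```python
from enum import Enum

class StepState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"

# Hypothetical transition table: which new states are legal from each state.
ALLOWED = {
    StepState.PENDING: {StepState.RUNNING},
    StepState.RUNNING: {StepState.DONE, StepState.FAILED},
    StepState.DONE: set(),
    StepState.FAILED: set(),
}

class InvalidTransition(ValueError):
    """Structured error raised instead of letting bad state pass silently."""

def validate_transition(current: StepState, new: StepState) -> None:
    # Catches invalid step transitions (e.g. DONE -> RUNNING).
    if new not in ALLOWED[current]:
        raise InvalidTransition(f"illegal transition {current.value} -> {new.value}")

def validate_tool_result(issued_call_ids: set, result_call_id: str) -> None:
    # Rejects orphaned function results: a result whose call id was never issued.
    if result_call_id not in issued_call_ids:
        raise InvalidTransition(f"orphaned function result {result_call_id!r}")
```

Run before serialization, checks like these turn a state corruption that would otherwise reach the LLM as plausible-looking JSON into an immediate, debuggable exception.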

Silent function-call failures that pass through JSON undetected are a leading cause of hard-to-diagnose SK agent hallucinations.

Proposed API

Non-breaking. Fully opt-in.

Option 1: Kernel level

kernel = Kernel()
kernel.add_serializer(UlmenSerializer())

Option 2: Per invocation

result = await kernel.invoke(
    function,
    arguments,
    serializer="ulmen",
)

Option 3: Chat completion level

chat_completion = AzureChatCompletion(
    service_id="default",
    serializer="ulmen",
)

Implementation Notes

ULMEN is:

  • Drop-in Python/Rust library
  • No schema compilation required
  • Pure Python fallback if Rust unavailable
  • Byte-identical output Python vs Rust
  • BSL license, free under $10M revenue
  • 1,364 tests, 100% statement coverage
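The Rust-with-pure-Python-fallback bullet follows the usual optional-extension pattern: try the compiled backend first and fall back transparently. A generic sketch, where the module names are hypothetical and `json` stands in for the pure-Python backend:

```python
def load_backend():
    """Prefer a compiled extension, fall back to pure Python.
    Illustrative of the pattern; 'ulmen._rust' is a hypothetical name."""
    try:
        import ulmen._rust as backend  # hypothetical compiled extension
        name = "rust"
    except ImportError:
        import json as backend  # stand-in for a pure-Python fallback
        name = "python"
    return name, backend

name, backend = load_backend()
```

The byte-identical-output guarantee matters here: callers never need to know which backend was selected.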

Reproducible Benchmarks

Live notebook - verify on your own data:
github.com/makroumi/ulmen

Questions For Maintainers

  1. Is there an existing serialization interface in SK core to hook into?
  2. Would this fit better as a middleware layer?
  3. Which SK pattern would be highest impact to prioritize: chat history, function calls, or agent communication?

Happy to submit a PR once maintainers confirm the preferred integration approach.

References

Similar feature requests filed on:

  • microsoft/autogen (filed today)
  • langchain-ai/langchain (filed today)
  • crewAIInc/crewAI (filed today)
  • run-llama/llama_index (filed today)

The AI engineering community is converging on this problem. ULMEN is the systematic solution.
