TokenGuard

Token budget management for LLM agent loops.



TokenGuard keeps your agent loop's conversation inside ConversationContext. That object is the source of truth for the session. Before each model call, TokenGuard reads that history, builds a provider-ready snapshot, and compacts only the snapshot when needed.

// conversationContext is source of truth for this loop.
// System prompt lives there with every other message.
conversationContext.SetSystemPrompt("You are a careful coding assistant.");

// Add user turn to same stored conversation history.
conversationContext.AddUserMessage("Fix this, make no mistake.");

// Build next provider request from that history.
// TokenGuard may compact this snapshot to fit budget.
// Stored history inside conversationContext does not change.
var prepared = await conversationContext.PrepareAsync(cancellationToken);

// Send only prepared snapshot to provider.
var input = prepared.Messages.ForOpenAI();
var response = await chatClient.CompleteChatAsync(input, cancellationToken: cancellationToken);

You keep appending system, user, assistant, and tool messages to conversationContext. Everything happens inside that object. PrepareAsync() returns a PrepareResult describing what should go to the model right now.


What it does

  • Tracks token growth across the full turn sequence — user, assistant, tool, system, and pinned messages
  • Masks stale tool results using a sliding-window strategy when the conversation crosses a configurable soft threshold
  • Summarizes old history with your LLM when masking alone is not enough — collapses older turns into a compact summary message while keeping a recent tail verbatim
  • Falls back to emergency truncation as a last resort — drops oldest unpinned turn groups from the prepared payload while preserving pinned messages and the newest active tail
  • Pins durable context that survives all compaction stages: system prompts, task constraints, repository rules, and any message you need to keep unchanged
  • Stays provider-agnostic in core, with adapter helpers for OpenAI and Anthropic
  • Integrates in minutes via AddConversationContext(...) and a standard DI factory

Benchmark

TokenGuard was benchmarked with Codexplorer across 20 real repository-analysis tasks from samples/Codexplorer.Automation/src/tasks/initial-corpus.json. The corpus spans small, medium, and large tasks, with observed runs ranging from roughly 30 to 100+ turns.

All 20 tasks completed successfully. TokenGuard cut cumulative prompt volume by 87.4% and prevented every context-window failure.

| Benchmark setup | Value |
|---|---|
| Workload | 20 Codexplorer tasks across mixed difficulty levels |
| Session length | Roughly 30-100+ turns observed |
| Model | openai/gpt-5.4-nano |
| Context budget | 20,000 tokens |
| Soft threshold | 16,000 tokens (80%) |
| Hard cap | 20,000 tokens |
| Total turns | 1,324 |

|  | Without TokenGuard | With TokenGuard |
|---|---|---|
| Cumulative prompt tokens | 128,058,079 | 16,158,357 |
| Tokens saved |  | 111,899,722 |
| Reduction |  | 87.4% |
| Successful turns |  | 1,269 / 1,324 (95.8%) |
| CompactionInsufficient turns |  | 55 / 1,324 (4.2%) |
| CannotCompact turns | High risk on long runs | 0 |

Install

dotnet add package TokenGuard.Core
dotnet add package TokenGuard.Extensions.OpenAI      # or Anthropic

Quick start

1. Register at startup

services.AddConversationContext(builder => builder
    .WithMaxTokens(25_000)
    .WithCompactionThreshold(0.80));

Emergency truncation is on by default at 1.0. It fires only at the absolute token limit and acts as a last-resort safety net after the normal compaction pipeline has already run.

Override with WithEmergencyThreshold(0.95) to trigger earlier, or call WithoutEmergencyThreshold() to disable it entirely.
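
Both overrides as builder calls (a minimal sketch; the "no-safety-net" profile name is illustrative):

// Fire the safety net at 95% of the budget instead of the hard cap.
services.AddConversationContext(builder => builder
    .WithMaxTokens(25_000)
    .WithCompactionThreshold(0.80)
    .WithEmergencyThreshold(0.95));

// Or disable emergency truncation entirely for a separate profile.
services.AddConversationContext("no-safety-net", builder => builder
    .WithMaxTokens(25_000)
    .WithoutEmergencyThreshold());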

Multiple named profiles work too:

services.AddConversationContext("analysis", builder => builder
    .WithMaxTokens(200_000)
    .WithCompactionThreshold(0.75));

Sliding-window masking is always active. Add provider-backed summarization through the provider extension packages:

services.AddConversationContext(builder => builder
    .WithMaxTokens(25_000)
    .WithSlidingWindowOptions(new SlidingWindowOptions(windowSize: 12))
    .UseLlmSummarization(chatClient));

services.AddConversationContext(builder => builder
    .WithMaxTokens(25_000)
    .UseLlmSummarization(anthropicClient, "claude-3-7-sonnet-latest"));

2. Create a context per conversation

using var conversationContext = serviceProvider
    .GetRequiredService<IConversationContextFactory>()
    .Create();

Configuration is singleton-scoped. Each Create() call returns an independent stateful context, safe to use across concurrent requests. Use Create("analysis") when you want a named profile.
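
For example, resolving the "analysis" profile registered earlier:

// Uses the 200,000-token "analysis" configuration from startup.
using var analysisContext = serviceProvider
    .GetRequiredService<IConversationContextFactory>()
    .Create("analysis");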

3. Run the loop

using TokenGuard.Core.Enums;
using TokenGuard.Extensions.OpenAI;

var factory = serviceProvider.GetRequiredService<IConversationContextFactory>();

using var conversationContext = factory.Create();

conversationContext.SetSystemPrompt("You are a precise coding assistant.");
conversationContext.AddPinnedMessage(MessageRole.User, "Repository root is /workspace/project.");
conversationContext.AddUserMessage("Summarize the failing tests.");

while (true)
{
    var prepared = await conversationContext.PrepareAsync(cancellationToken);

    if (prepared.Outcome == PrepareOutcome.CannotCompact)
        throw new InvalidOperationException(prepared.BudgetFailureReason);

    var response = await chatClient.CompleteChatAsync(
        prepared.Messages.ForOpenAI(),
        chatOptions,
        cancellationToken);

    conversationContext.RecordModelResponse(
        response.ResponseSegments(),
        response.InputTokens());

    if (response.ToolCalls.Count == 0)
        break;

    foreach (var toolCall in response.ToolCalls)
    {
        var result = toolExecutor.Execute(toolCall);
        conversationContext.RecordToolResult(toolCall.Id, toolCall.FunctionName, result);
    }
}

PrepareAsync() returns a PrepareResult, not just a message list. PrepareResult.Messages is the prepared snapshot to send to the provider. ConversationContext.History remains unchanged.


PrepareResult

PrepareAsync() gives you the prepared message list plus metadata about what happened during preparation.

| Property | Meaning |
|---|---|
| Messages | Prepared message list to send to the provider |
| Outcome | Ready, Compacted, CompactionInsufficient, or CannotCompact |
| TokensBeforeCompaction | Estimated total before any compaction or truncation ran |
| TokensAfterCompaction | Estimated total of Messages after preparation completed |
| MessagesCompacted | Count of messages replaced or dropped during this call |
| MessagesDropped | Count of messages removed specifically by emergency truncation |
| BudgetFailureReason | Diagnostic text for over-budget outcomes |

Ready and Compacted are healthy outcomes. CompactionInsufficient means TokenGuard reduced the payload but it still exceeds the configured limit plus any allowed overrun tolerance. CannotCompact means the remaining preserved content is already too large and the call should not be attempted.
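
One way to branch on the outcome inside a loop (a sketch; the retry or abort policy is yours to choose):

var prepared = await conversationContext.PrepareAsync(cancellationToken);

switch (prepared.Outcome)
{
    case PrepareOutcome.Ready:
    case PrepareOutcome.Compacted:
        break; // healthy: send prepared.Messages to the provider

    case PrepareOutcome.CompactionInsufficient:
        // Over budget despite compaction; consider shrinking the active tail
        // or splitting the task before retrying.
        break;

    case PrepareOutcome.CannotCompact:
        // Preserved content alone exceeds the budget; do not attempt the call.
        throw new InvalidOperationException(prepared.BudgetFailureReason);
}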


Pinned messages

Some context needs to survive the whole session — task constraints, repository layout, coding standards.

conversationContext.SetSystemPrompt("You are a senior Go engineer.");
conversationContext.AddPinnedMessage(MessageRole.User, "All file paths must be relative to /workspace.");

Pinned messages are never masked, never summarized, and never dropped by emergency truncation. They are removed from the compactable slice before compaction, then reinserted at their original positions in the prepared output. They still count against the budget.
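
Because pins still consume budget on every turn, keep them terse; for example:

// Short, durable rules pin well; bulky artifacts are better summarized or linked.
conversationContext.AddPinnedMessage(MessageRole.User,
    "Tests must pass before any commit is suggested.");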


How compaction works

Want architecture detail and trade-offs? Read How TokenGuard Thinks About Context.

Three ordered tiers:

1. Observation masking. The sliding-window strategy walks backward through compactable history and masks older ToolResultContent payloads outside the protected tail. Recent messages stay intact and structure is preserved.

2. LLM summarization (opt-in — register with UseLlmSummarization(...)). If masking still leaves the compactable history over budget, TokenGuard asks your LLM to collapse the older prefix into one summary message. The recent tail stays verbatim. Internally, the summarization stage caches checkpoints so it can reuse or promote prior summaries instead of regenerating them from scratch every turn.

3. Emergency truncation (on by default, opt-out with WithoutEmergencyThreshold()). If the prepared request is still above the emergency trigger after the normal compaction stages, TokenGuard drops the oldest eligible unpinned turn groups from the prepared payload. It preserves pinned messages, summary messages, and the newest irreducible tail.
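
All three tiers on one builder, combining options shown elsewhere in this README (summarization is the only stage that must be opted into):

services.AddConversationContext(builder => builder
    .WithMaxTokens(25_000)
    .WithCompactionThreshold(0.80)                                      // soft threshold
    .WithSlidingWindowOptions(new SlidingWindowOptions(windowSize: 12)) // tier 1 protected tail
    .UseLlmSummarization(chatClient)                                    // tier 2 (opt-in)
    .WithEmergencyThreshold(0.95));                                     // tier 3 trigger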


Compaction statuses

PrepareAsync() can return two over-budget statuses after compaction work has already been attempted:

CompactionInsufficient

Meaning: TokenGuard compacted and, if configured, also tried emergency truncation, but the prepared request still exceeds MaxTokens + OverrunToleranceTokens.

Recommended approach: Reduce large tool-call arguments, tool outputs, or assistant payloads in the active tail; enable LLM summarization if it is not already enabled; split the task into smaller exchanges; or increase the configured budget only when the target provider actually supports a larger context window.
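
One mitigation for oversized tool outputs, sketched under the assumption that the executor returns a string (the 4,000-character cap and "[truncated]" marker are illustrative, not TokenGuard features):

// Illustrative guard: bound each tool result so the active tail stays compactable.
const int MaxToolResultChars = 4_000;

var result = toolExecutor.Execute(toolCall);
if (result.Length > MaxToolResultChars)
    result = result[..MaxToolResultChars] + " [truncated]";

conversationContext.RecordToolResult(toolCall.Id, toolCall.FunctionName, result);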

CannotCompact

Meaning: The prepared request cannot fit because the remaining preserved content already exceeds the allowed budget and no further messages could be compacted or dropped safely.

Recommended approach: Stop the exchange and reshape the input. Shorten or unpin oversized preserved content, split a large user request or tool payload into smaller pieces, move bulky artifacts out of the live prompt, or switch to a model with a larger real context window.


LLM summarization

When masking alone is not enough, TokenGuard can replace older history with a single compact summary. The newest tail stays verbatim. The summary is inserted as a normal MessageRole.Model message with CompactionState.Summarized.
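
That makes summarization observable in the prepared snapshot. A sketch, assuming each prepared message exposes Role and CompactionState properties:

// True when the older prefix has been collapsed into a summary message.
var hasSummary = prepared.Messages.Any(m =>
    m.Role == MessageRole.Model && m.CompactionState == CompactionState.Summarized);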

Register it with one extra call on your builder:

// OpenAI — model is inferred from the ChatClient
builder.UseLlmSummarization(chatClient);

// Anthropic — model must be specified explicitly
builder.UseLlmSummarization(anthropicClient, "claude-3-7-sonnet-latest");

Defaults keep the last 5 messages verbatim and bound the summary budget to 2,048-4,096 tokens. Override with LlmSummarizationOptions:

builder.UseLlmSummarization(chatClient, new LlmSummarizationOptions(
    windowSize: 5,
    minSummaryTokens: 1024,
    maxSummaryTokens: 2048));

| Option | What it controls | Default |
|---|---|---|
| WindowSize | How many newest compactable messages stay verbatim | 5 |
| MinSummaryTokens | Minimum remaining summary budget before the first summarization call is made | 2,048 |
| MaxSummaryTokens | Maximum target budget forwarded to the summarizer | 4,096 |

Only one provider per builder. Registering both OpenAI and Anthropic on the same builder throws at startup.


Provider adapters

The core has no provider dependency. Adapters handle conversion in both directions.

OpenAI

var prepared = await conversationContext.PrepareAsync(cancellationToken);
var messages = prepared.Messages.ForOpenAI();

var response = await chatClient.CompleteChatAsync(messages, chatOptions, cancellationToken);
conversationContext.RecordModelResponse(response.ResponseSegments(), response.InputTokens());

Anthropic

var prepared = await conversationContext.PrepareAsync(cancellationToken);
var (messages, systemPrompt) = prepared.Messages.ForAnthropic();

// Attach both to your Anthropic request.

ForOpenAI() validates tool-call/tool-result structure and throws if the prepared history would produce orphaned tool calls. ForAnthropic() returns a tuple because Anthropic carries system content separately from the normal message list. After the Anthropic call completes, record the response with RecordModelResponse(response.ResponseSegments(), response.InputTokens()).
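
Put together, one Anthropic turn might look like this; SendMessageAsync is a hypothetical stand-in for your Anthropic SDK's call, not a TokenGuard API:

var prepared = await conversationContext.PrepareAsync(cancellationToken);
var (messages, systemPrompt) = prepared.Messages.ForAnthropic();

// Hypothetical client call; substitute your Anthropic SDK's request shape.
var response = await anthropicClient.SendMessageAsync(messages, systemPrompt, cancellationToken);

conversationContext.RecordModelResponse(response.ResponseSegments(), response.InputTokens());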


Without DI

If you're not using a container, construct a factory directly:

var factory = new ConversationContextFactory(
    new ConversationConfigBuilder()
        .WithMaxTokens(25_000)
        .WithCompactionThreshold(0.80)
        .Build());

using var context = factory.Create();

DI is the recommended path; the public factory is the manual fallback when you do not want a container.


Repository layout

src/
  TokenGuard.Core                     core abstractions, message model, compaction pipeline
  TokenGuard.Extensions.OpenAI        OpenAI message conversion and response mapping
  TokenGuard.Extensions.Anthropic     Anthropic message conversion and response mapping

samples/
  Codexplorer                         repository-analysis sample
  Codexplorer.Automation              benchmark automation and corpus runner

tests/
  TokenGuard.Tests                    unit tests
  TokenGuard.IntegrationTests         cross-component coverage

docs/                                supporting notes and documentation
ai/skills/                           shared agent workflow guidance

Build and test

dotnet build TokenGuard.sln --nologo
dotnet test TokenGuard.sln --no-restore --nologo

TokenGuard.sln includes the core packages, tests, and Codexplorer samples. If you only want the interactive sample:

dotnet build ./samples/Codexplorer/src/Codexplorer.csproj --nologo

Requirements

  • .NET SDK 10.0+
  • LLM provider API key for live samples
  • macOS, Linux, or Windows

Current status

What is current:

  • sliding-window observation masking is implemented and always part of the built-in pipeline
  • emergency truncation is implemented and defaults to 1.0 as a last-resort safety net
  • LLM summarization is implemented for OpenAI and Anthropic via UseLlmSummarization(...)
  • summary checkpoint reuse and promotion are implemented inside the summarization strategy
  • pinned messages survive all compaction stages
  • DI registration via AddConversationContext(...) and factory-based creation is implemented
  • OpenAI and Anthropic adapter helpers are available
  • runtime recording flow is available through SetSystemPrompt(...), AddPinnedMessage(...), AddUserMessage(...), PrepareAsync(...), RecordModelResponse(...), and RecordToolResult(...)

What remains planned:

  • broader multi-strategy pipeline expansion beyond current masking + summarization + emergency fallback

About

TokenGuard is a .NET library that wraps your agent's message list, monitors token usage before each LLM call and after each tool result, and applies pluggable compaction strategies.
