
Add cross-invocation budget manager (issue #44) #70

Open: dgenio wants to merge 2 commits into main from claude/triage-issues-WowaN



@dgenio (Owner) commented May 15, 2026

Summary

Implements cross-invocation context budget tracking via a new BudgetManager class that monitors cumulative token usage across multiple Kernel.invoke() calls within a session. When attached to a kernel, the manager automatically escalates the requested response_mode to a more aggressive tier as remaining budget shrinks, preventing unbounded context consumption.
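The escalation rule can be read as a pure function of the remaining fraction. The sketch below is a hypothetical stand-in for BudgetManager.suggested_mode(), not the PR's code: the name escalate_mode, the helper _at_least, and the ordering raw < table < summary < handle_only (least to most aggressive) are assumptions reconstructed from the escalation table and boundary notes in this description. How exactly 5% is treated is not stated, so sending it to handle_only is a guess.

```python
from typing import Literal

# Mode names come from the escalation table; their ordering is an assumption.
ResponseMode = Literal["raw", "table", "summary", "handle_only"]

# Ordered from least to most aggressive compression.
_ORDER: tuple[str, ...] = ("raw", "table", "summary", "handle_only")


def _at_least(mode: ResponseMode, floor: ResponseMode) -> ResponseMode:
    """Return whichever of `mode` and `floor` is more aggressive."""
    return mode if _ORDER.index(mode) >= _ORDER.index(floor) else floor


def escalate_mode(requested: ResponseMode, remaining: int, total: int) -> ResponseMode:
    """Hypothetical stand-in for BudgetManager.suggested_mode().

    A ratio exactly on a threshold falls into the more aggressive tier.
    """
    ratio = remaining / total
    if ratio > 0.50:
        return requested  # > 50%: honour the caller's mode
    if ratio > 0.20:
        # 20-50%: only raw is downgraded, to table
        return "table" if requested == "raw" else requested
    if ratio > 0.05:
        return _at_least(requested, "summary")  # 5-20%: floor at summary
    return "handle_only"  # < 5% (and, by assumption, exactly 5%)
```

Note that the 20-50% tier leaves table and summary requests untouched; only the 5-20% tier raises every request to at least summary.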

Key Changes

  • New BudgetManager class (src/agent_kernel/firewall/budgets.py):

    • Tracks total, used, and reserved token budgets across invocations
    • Provides allocate() to reserve budget before an invocation and record_usage() to reconcile actual consumption
    • Implements suggested_mode() escalation table: > 50% remaining keeps requested mode; 20–50% downgrades raw to table; 5–20% floors at summary; < 5% forces handle_only
    • Safe for concurrent invocations on the same event loop via an internal asyncio.Lock (asyncio.Lock serialises coroutines; it is not a thread lock)
    • Raises BudgetExhausted when no budget remains
  • Token counting protocol (TokenCounter):

    • Pluggable protocol for approximating token cost of arbitrary values
    • Default implementation (default_token_counter) uses a character-based approximation (len(json.dumps(...)) // 4) with no external dependencies
    • Allows custom counters (e.g., tiktoken-based) via constructor
  • Kernel integration (src/agent_kernel/kernel.py):

    • New optional budget_manager parameter in Kernel.__init__()
    • Kernel.invoke() now allocates budget before driver execution, escalates response mode based on remaining budget, and records actual frame-payload usage after firewall processing
    • Releases reserved budget if driver fails before firewall runs
    • Kernel.invoke(..., dry_run=True) mirrors escalation and reports budget_remaining in DryRunResult
    • New Kernel.budget property exposes the manager (or None if not configured)
  • New error type (src/agent_kernel/errors.py):

    • BudgetExhausted raised when session budget is exhausted before driver execution
  • Public API exports:

    • BudgetManager, TokenCounter, default_token_counter exported from agent_kernel.firewall and top-level agent_kernel package
    • BudgetExhausted exported from top-level agent_kernel package
  • Documentation & tests:

    • Updated docs/context_firewall.md with escalation table and usage examples
    • Tests covering allocation, recording, escalation boundaries, custom counters, and kernel integration
    • Updated CHANGELOG.md with feature summary
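A counter is pluggable because the protocol only asks for something callable on a value that returns an int. The sketch below assumes that shape (`counter(value) -> int`); the exact signature in the PR is not shown here. Passing `default=str` to json.dumps is also an assumption, so that non-JSON values are counted instead of raising TypeError, and whitespace_counter is a hypothetical alternative, not a shipped one.

```python
import json
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class TokenCounter(Protocol):
    """Anything callable as counter(value) -> int can act as a counter."""

    def __call__(self, value: Any) -> int: ...


def default_token_counter(value: Any) -> int:
    """Approximate tokens as len(json.dumps(value)) // 4 (no extra deps)."""
    # default=str is an assumption: it stringifies values json can't encode.
    return len(json.dumps(value, default=str)) // 4


def whitespace_counter(value: Any) -> int:
    """Hypothetical alternative: one token per whitespace-separated chunk."""
    return len(json.dumps(value, default=str).split())
```

Plain functions satisfy the protocol structurally, so a tiktoken-based counter could be dropped in the same way without subclassing anything.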

Notable Implementation Details

  • Budget state (_BudgetState) is kept separate from BudgetManager to maintain __slots__ compatibility while holding an asyncio.Lock
    • Escalation thresholds are strict: a remaining fraction exactly on a threshold falls into the more aggressive tier, so exactly 50% remaining sits in the 20–50% bucket and downgrades raw, and exactly 20% floors at summary
  • Token counting only includes LLM-facing payload (facts, table rows, or raw data), not kernel bookkeeping (provenance, action IDs, handle IDs)
  • Backward compatible: kernels without a BudgetManager behave identically to earlier versions
  • Optional tiktoken extra added to pyproject.toml for users wanting more accurate token counting
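Counting only the LLM-facing payload means bookkeeping fields never consume budget. In this minimal sketch, Frame and its field names (facts, table_rows, raw_data, handle_id, provenance) are hypothetical stand-ins for the kernel's real frame type, and frame_payload only imitates the role of the PR's _frame_payload helper.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Frame:
    """Hypothetical stand-in for the kernel's response frame."""

    action_id: str
    facts: list[str] = field(default_factory=list)
    table_rows: list[dict[str, Any]] = field(default_factory=list)
    raw_data: Any = None
    handle_id: str | None = None
    provenance: dict[str, Any] = field(default_factory=dict)


def frame_payload(frame: Frame) -> dict[str, Any]:
    """Keep only what the LLM will read; drop IDs and provenance."""
    payload: dict[str, Any] = {}
    if frame.facts:
        payload["facts"] = frame.facts
    if frame.table_rows:
        payload["rows"] = frame.table_rows
    if frame.raw_data is not None:
        payload["raw"] = frame.raw_data
    return payload


def count_tokens(value: Any) -> int:
    """Same chars/4 approximation as the default counter."""
    return len(json.dumps(value, default=str)) // 4
```

A frame carrying a large provenance record but no facts therefore costs almost nothing against the session budget.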

https://claude.ai/code/session_011ebp9VP1kEYRtrXeQd5y7w

Adds an optional BudgetManager that tracks cumulative token usage across
multiple Kernel.invoke() calls within a session. When attached via the
new Kernel(budget_manager=...) keyword argument, the kernel reserves a
budget slice before driver execution and reconciles the actual frame
payload size afterwards. As the remaining budget shrinks the requested
response_mode auto-escalates to a more aggressive tier (>50% remaining
keeps the caller's mode; 20-50% downgrades raw to table; 5-20% floors at
summary; <5% forces handle_only). BudgetExhausted is raised before the
driver runs once the budget is spent.
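The reserve-then-reconcile cycle can be sketched with a single asyncio.Lock guarding three counters. SessionBudget is a hypothetical class, not the PR's BudgetManager; allocate() and record_usage(actual, reserved=...) follow the names used in this PR, while release(), the default_request default, and the "grant at most what is left" policy are assumptions.

```python
from __future__ import annotations

import asyncio


class BudgetExhausted(Exception):
    """Stand-in for agent_kernel's BudgetExhausted."""


class SessionBudget:
    """Minimal sketch of reserve-then-reconcile accounting (not the PR's class)."""

    def __init__(self, total: int, default_request: int = 1_000) -> None:
        self.total = total
        self.default_request = default_request
        self.used = 0
        self.reserved = 0
        # Serialises coroutines on one event loop; not a thread lock.
        self._lock = asyncio.Lock()

    @property
    def remaining(self) -> int:
        return self.total - self.used - self.reserved

    async def allocate(self, request: int | None = None) -> int:
        """Reserve a slice before the driver runs; fail fast if nothing is left."""
        async with self._lock:
            available = self.remaining
            if available <= 0:
                raise BudgetExhausted(f"session budget of {self.total} tokens is spent")
            wanted = request if request is not None else self.default_request
            grant = min(wanted, available)  # assumption: grant what is left
            self.reserved += grant
            return grant

    async def record_usage(self, actual: int, *, reserved: int) -> None:
        """Swap the reservation for what the frame actually cost."""
        async with self._lock:
            self.reserved -= reserved
            self.used += actual

    async def release(self, reserved: int) -> None:
        """Hypothetical: return an unused reservation (e.g. the driver failed)."""
        async with self._lock:
            self.reserved -= reserved
```

Reserving up front keeps concurrent invocations from all seeing the same "remaining" figure and overspending together.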

The manager is optional and off by default: kernels constructed without
one behave exactly as before. DryRunResult now reports the live
budget_remaining and the escalated response_mode so callers can preview
their next invocation. The new TokenCounter protocol lets callers plug
in tiktoken or any other counter; the default is a chars/4 JSON-based
approximation with no extra dependencies. A new optional [tiktoken]
extra is reserved for the tiktoken-based counter.

Honours the existing weaver-spec invariants: every invocation still
flows through the firewall (I-01) and produces an ActionTrace (I-02);
the admin-only raw gate is preserved and applied before escalation.
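The ordering matters because the handle decision must see the mode the Firewall will actually produce. A minimal sketch with hypothetical names (apply_raw_gate, resolve_mode); the rule that only raw skips handle creation is an assumption inferred from the third review fix in the follow-up commit.

```python
from typing import Callable, Literal

ResponseMode = Literal["raw", "table", "summary", "handle_only"]


def apply_raw_gate(requested: ResponseMode, is_admin: bool) -> ResponseMode:
    """Mirror the Firewall's admin-only gate: non-admin raw becomes summary."""
    if requested == "raw" and not is_admin:
        return "summary"
    return requested


def resolve_mode(
    requested: ResponseMode,
    is_admin: bool,
    escalate: Callable[[ResponseMode], ResponseMode],
) -> tuple[ResponseMode, bool]:
    """Return (effective_mode, create_handle) in the order the PR describes."""
    mode = apply_raw_gate(requested, is_admin)  # 1. admin-only raw gate
    mode = escalate(mode)  # 2. budget escalation on the gated mode
    # 3. Handle decision on the final mode (assumption: only raw skips it).
    return mode, mode != "raw"
```

Deciding on the handle before step 1 reproduces the bug the review caught: a non-admin raw request ends up as a summary frame with no handle.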
Copilot AI review requested due to automatic review settings May 15, 2026 12:55

Copilot AI left a comment


Pull request overview

Adds a cross-invocation context budget feature to agent-kernel by introducing a session-level BudgetManager that tracks cumulative token usage across multiple Kernel.invoke() calls and escalates response_mode as remaining budget shrinks.

Changes:

  • Introduces BudgetManager, TokenCounter, and default_token_counter for cumulative token budgeting and pluggable token counting.
  • Integrates budget reservation/escalation/reconciliation into Kernel.invoke() and exposes Kernel.budget.
  • Updates public exports, docs, changelog, and adds tests for budgeting behavior and kernel integration.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

  • tests/test_kernel.py: Adds kernel-level integration tests for cross-invocation budgeting, escalation, dry-run behavior, and reservation release on driver failure.
  • tests/test_firewall.py: Adds unit tests for token counting and BudgetManager allocation/recording/escalation behavior.
  • src/agent_kernel/kernel.py: Wires BudgetManager into invoke() (reserve before drivers, escalate mode, record usage after firewall) and adds Kernel.budget.
  • src/agent_kernel/firewall/budgets.py: Expands the budgets module with TokenCounter, default_token_counter, and the new BudgetManager.
  • src/agent_kernel/firewall/__init__.py: Re-exports the budget manager and token counting APIs.
  • src/agent_kernel/errors.py: Adds the BudgetExhausted error type.
  • src/agent_kernel/__init__.py: Exports BudgetManager, BudgetExhausted, and the token counting APIs at top level.
  • pyproject.toml: Adds an optional [tiktoken] extra.
  • docs/context_firewall.md: Documents cross-invocation budgeting and the escalation table.
  • CHANGELOG.md: Adds an entry describing the new budgeting feature and exports.

Comment thread src/agent_kernel/firewall/budgets.py Outdated
Comment on lines +147 to +150
if total_budget <= 0:
    raise ValueError("total_budget must be positive")
if default_request <= 0:
    raise ValueError("default_request must be positive")
Comment thread src/agent_kernel/firewall/budgets.py Outdated
Comment on lines 1 to 14
"""Budgets, token counting, and cross-invocation budget management.

Canonical definition of :class:`Budgets`. Re-exported via
``agent_kernel.firewall`` and the top-level ``agent_kernel`` package.
This module provides three things:

- :class:`Budgets` — per-invocation firewall budget caps (row/field/char/depth).
- :class:`TokenCounter` — pluggable protocol for approximating token cost of a
value (default: ``len(json.dumps(...)) // 4``).
- :class:`BudgetManager` — cumulative session-level budget tracker that
records token usage across multiple :meth:`~agent_kernel.Kernel.invoke`
calls and suggests response-mode escalation as the remaining budget shrinks.

The :class:`Budgets` dataclass is unchanged from earlier versions; the new
:class:`BudgetManager` is the implementation of issue #44.
"""
Comment thread src/agent_kernel/kernel.py Outdated
Comment on lines +333 to +337
effective_mode: ResponseMode = response_mode
reserved_tokens: int | None = None
if self._budget_manager is not None:
    reserved_tokens = await self._budget_manager.allocate()
    effective_mode = self._budget_manager.suggested_mode(response_mode)
Comment thread src/agent_kernel/kernel.py Outdated
Comment on lines +413 to +427
# ── Reconcile cumulative budget against the actual frame payload ──────
if self._budget_manager is not None and reserved_tokens is not None:
    actual_tokens = self._budget_manager.count_tokens(_frame_payload(frame))
    await self._budget_manager.record_usage(actual_tokens, reserved=reserved_tokens)
Comment thread tests/test_firewall.py
Comment on lines +329 to +339
def test_budget_manager_rejects_non_positive_total() -> None:
    with pytest.raises(ValueError, match="total_budget must be positive"):
        BudgetManager(total_budget=0)
    with pytest.raises(ValueError, match="total_budget must be positive"):
        BudgetManager(total_budget=-1)


def test_budget_manager_rejects_non_positive_default_request() -> None:
    with pytest.raises(ValueError, match="default_request must be positive"):
        BudgetManager(total_budget=100, default_request=0)
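The objection to these tests is that the constructor surfaces a bare ValueError. A sketch of the library-specific alternative: AgentKernelError and BudgetConfigError are local stand-ins for agent_kernel's types, and validate_budget_config is a hypothetical helper mirroring the two constructor checks in the budgets.py snippet of this review.

```python
class AgentKernelError(Exception):
    """Stand-in for agent_kernel's base error type."""


class BudgetConfigError(AgentKernelError):
    """Raised for an invalid BudgetManager configuration."""


def validate_budget_config(total_budget: int, default_request: int) -> None:
    """Hypothetical helper: same checks, library-specific error type."""
    if total_budget <= 0:
        raise BudgetConfigError("total_budget must be positive")
    if default_request <= 0:
        raise BudgetConfigError("default_request must be positive")
```

Callers can then catch AgentKernelError once for every kernel failure instead of guessing which built-in exceptions might escape.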

Five fixes from the Copilot review:

1. Bare ValueError on BudgetManager validation violated AGENTS.md
   ("never raise bare ValueError to callers"). Replaced with a new
   BudgetConfigError(AgentKernelError); updated tests.

2. firewall/budgets.py exceeded the ≤300 line guideline. Split into:
   - budgets.py (28 lines, original Budgets dataclass only)
   - token_counting.py (41 lines, TokenCounter + default_token_counter)
   - budget_manager.py (275 lines, BudgetManager + helpers)
   Public imports unchanged; everything re-exported via firewall/__init__.

3. invoke() did not mirror the Firewall's admin-only raw gate. A
   non-admin requesting raw kept effective_mode == "raw", which made
   the kernel skip handle creation even though the Firewall would then
   downgrade to summary — yielding a summary frame without a handle.
   The kernel now applies the same raw → summary downgrade before the
   budget escalation and handle-creation decision. Added a regression
   test covering the case.

4. A Firewall exception after a budget reservation permanently leaked
   the reserved tokens. Wrapped the firewall transform + reconciliation
   in try/finally that releases the reservation if record_usage never
   ran. Added a regression test using a stub failing Firewall.

5. Updated firewall tests to assert BudgetConfigError instead of
   ValueError, and verified BudgetConfigError is a subclass of
   AgentKernelError.
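Fix 4 comes down to a flag plus try/finally: the reservation is released unless reconciliation actually ran. StubBudget and firewall_and_reconcile are hypothetical; the firewall is modelled as a plain synchronous callable and the chars/4 usage figure is only illustrative.

```python
import asyncio
from typing import Any, Callable


class StubBudget:
    """Hypothetical stand-in that tracks only reservations and usage."""

    def __init__(self) -> None:
        self.reserved = 0
        self.used = 0

    async def allocate(self, amount: int = 100) -> int:
        self.reserved += amount
        return amount

    async def record_usage(self, actual: int, *, reserved: int) -> None:
        self.reserved -= reserved
        self.used += actual

    async def release(self, reserved: int) -> None:
        self.reserved -= reserved


async def firewall_and_reconcile(
    budget: StubBudget,
    reserved: int,
    firewall: Callable[[Any], Any],
    raw_result: Any,
) -> Any:
    """Run the firewall, then reconcile; release the slice if that never happens."""
    recorded = False
    try:
        frame = firewall(raw_result)  # may raise
        await budget.record_usage(len(str(frame)) // 4, reserved=reserved)
        recorded = True
        return frame
    finally:
        if not recorded:
            await budget.release(reserved)  # no leaked reservation
```

The flag, rather than an except clause, lets the original Firewall exception propagate unchanged while still returning the tokens.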

make ci: lint clean, mypy clean, 403 tests pass (was 400; +3 from
this change), all examples run.
