Add cross-invocation budget manager (issue #44) #70
Open
dgenio wants to merge 2 commits into
Adds an optional BudgetManager that tracks cumulative token usage across multiple Kernel.invoke() calls within a session. When attached via the new Kernel(budget_manager=...) keyword argument, the kernel reserves a budget slice before driver execution and reconciles the actual frame payload size afterwards.

As the remaining budget shrinks, the requested response_mode auto-escalates to a more aggressive tier (>50% remaining keeps the caller's mode; 20-50% downgrades raw to table; 5-20% floors at summary; <5% forces handle_only). BudgetExhausted is raised before the driver runs once the budget is spent. The manager is optional and off by default; kernels constructed without one behave identically to today.

DryRunResult now reports the live budget_remaining and the escalated response_mode so callers can preview their next invocation.

The new TokenCounter protocol lets callers plug in tiktoken or any other counter; the default is a chars/4 JSON-based approximation with no extra dependencies. A new optional [tiktoken] extra is reserved for the tiktoken-based counter.

Honours the existing weaver-spec invariants: every invocation still flows through the firewall (I-01) and produces an ActionTrace (I-02); the admin-only raw gate is preserved and applied before escalation.
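The escalation thresholds in the description can be sketched as a small pure function. This is an illustration under stated assumptions, not the PR's implementation: the function name `escalate_mode`, the helper `_floor`, and the tier ordering tuple are hypothetical. The bands come from the description, and exactly 20% is treated as part of the summary band, following the boundary note in the PR summary.

```python
from typing import Literal

ResponseMode = Literal["raw", "table", "summary", "handle_only"]

# Tiers ordered from least to most aggressive (ordering assumed from the description).
_TIERS = ("raw", "table", "summary", "handle_only")


def _floor(requested: str, floor: str) -> str:
    """Return whichever of the two modes is more aggressive."""
    return _TIERS[max(_TIERS.index(requested), _TIERS.index(floor))]


def escalate_mode(requested: str, remaining: int, total: int) -> str:
    """Map the remaining budget fraction to an effective response mode.

    Assumes total > 0; validation is left out of this sketch.
    """
    fraction = remaining / total
    if fraction > 0.50:
        return requested                      # plenty left: keep the caller's mode
    if fraction > 0.20:
        return _floor(requested, "table")     # only raw is below table, so raw -> table
    if fraction >= 0.05:
        return _floor(requested, "summary")   # raw and table both become summary
    return "handle_only"                      # under 5%: force the cheapest mode
```

With these assumptions, `escalate_mode("raw", 30, 100)` returns `"table"` and `escalate_mode("table", 3, 100)` returns `"handle_only"`.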
Pull request overview
Adds a cross-invocation context budget feature to agent-kernel by introducing a session-level BudgetManager that tracks cumulative token usage across multiple Kernel.invoke() calls and escalates response_mode as remaining budget shrinks.
Changes:
- Introduces `BudgetManager`, `TokenCounter`, and `default_token_counter` for cumulative token budgeting and pluggable token counting.
- Integrates budget reservation/escalation/reconciliation into `Kernel.invoke()` and exposes `Kernel.budget`.
- Updates public exports, docs, changelog, and adds tests for budgeting behavior and kernel integration.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tests/test_kernel.py | Adds kernel-level integration tests for cross-invocation budgeting, escalation, dry-run behavior, and reservation release on driver failure. |
| tests/test_firewall.py | Adds unit tests for token counting and BudgetManager allocation/recording/escalation behavior. |
| src/agent_kernel/kernel.py | Wires BudgetManager into invoke() (reserve before drivers, escalate mode, record usage after firewall) and adds Kernel.budget. |
| src/agent_kernel/firewall/budgets.py | Expands budgets module with TokenCounter, default_token_counter, and the new BudgetManager. |
| src/agent_kernel/firewall/`__init__.py` | Re-exports budget manager and token counting APIs. |
| src/agent_kernel/errors.py | Adds BudgetExhausted error type. |
| src/agent_kernel/`__init__.py` | Exports BudgetManager, BudgetExhausted, and token counting APIs at top level. |
| pyproject.toml | Adds optional [tiktoken] extra. |
| docs/context_firewall.md | Documents cross-invocation budgeting and the escalation table. |
| CHANGELOG.md | Adds an entry describing the new budgeting feature and exports. |
Comment on lines +147 to +150

    if total_budget <= 0:
        raise ValueError("total_budget must be positive")
    if default_request <= 0:
        raise ValueError("default_request must be positive")

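One way to avoid the bare `ValueError` in the lines above is a package-specific error type. The sketch below is hedged: `BudgetConfigError` and `AgentKernelError` are named in this PR's follow-up commit message, but their definitions here and the helper `validate_budget_config` are assumptions made for illustration.

```python
class AgentKernelError(Exception):
    """Stand-in for the package's base error (assumed shape)."""


class BudgetConfigError(AgentKernelError):
    """Raised when a budget is configured with invalid values."""


def validate_budget_config(total_budget: int, default_request: int) -> None:
    """Hypothetical helper mirroring the constructor checks shown above."""
    if total_budget <= 0:
        raise BudgetConfigError("total_budget must be positive")
    if default_request <= 0:
        raise BudgetConfigError("default_request must be positive")
```

Callers can then catch the package's base error once instead of special-casing `ValueError`.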
Comment on lines 1 to 14

    """Budgets, token counting, and cross-invocation budget management.

    Canonical definition of :class:`Budgets`. Re-exported via
    ``agent_kernel.firewall`` and the top-level ``agent_kernel`` package.
    This module provides three things:

    - :class:`Budgets` — per-invocation firewall budget caps (row/field/char/depth).
    - :class:`TokenCounter` — pluggable protocol for approximating token cost of a
      value (default: ``len(json.dumps(...)) // 4``).
    - :class:`BudgetManager` — cumulative session-level budget tracker that
      records token usage across multiple :meth:`~agent_kernel.Kernel.invoke`
      calls and suggests response-mode escalation as the remaining budget shrinks.

    The :class:`Budgets` dataclass is unchanged from earlier versions; the new
    :class:`BudgetManager` is the implementation of issue #44.
    """

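The default counter mentioned in the docstring can be approximated as follows. This is only a sketch: the protocol's call signature, the names `TokenCounterSketch`, `approx_token_count`, and `total_cost`, and the `default=str` fallback are assumptions. The source confirms only the `len(json.dumps(...)) // 4` heuristic.

```python
import json
from typing import Any, Protocol


class TokenCounterSketch(Protocol):
    """Hypothetical shape for a pluggable counter: a callable from value to int."""

    def __call__(self, value: Any) -> int: ...


def approx_token_count(value: Any) -> int:
    """Estimate tokens as one per four characters of the JSON encoding."""
    # default=str is an assumption so that non-JSON types do not raise.
    return len(json.dumps(value, default=str)) // 4


def total_cost(values: list, counter: TokenCounterSketch = approx_token_count) -> int:
    """Sum the estimated cost of several values with any compatible counter."""
    return sum(counter(v) for v in values)
```

Any callable with the same shape can be swapped in, which is the point of keeping the counter behind a protocol; a tokenizer-backed counter would be passed the same way.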
Comment on lines +333 to +337

    effective_mode: ResponseMode = response_mode
    reserved_tokens: int | None = None
    if self._budget_manager is not None:
        reserved_tokens = await self._budget_manager.allocate()
        effective_mode = self._budget_manager.suggested_mode(response_mode)

Comment on lines +413 to +427

    # ── Reconcile cumulative budget against the actual frame payload ──────
    if self._budget_manager is not None and reserved_tokens is not None:
        actual_tokens = self._budget_manager.count_tokens(_frame_payload(frame))
        await self._budget_manager.record_usage(actual_tokens, reserved=reserved_tokens)

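The reserve-then-reconcile cycle used in the two snippets above can be sketched with a toy manager. Everything here is an assumption made for illustration: the class name `SketchBudgetManager`, its attributes, the `remaining` property, and the grant policy are hypothetical. Only the method names `allocate()` and `record_usage(..., reserved=...)`, the use of `asyncio.Lock`, and the `BudgetExhausted` error name come from the PR.

```python
import asyncio


class BudgetExhausted(Exception):
    """Local stand-in for the error type the PR adds."""


class SketchBudgetManager:
    """Toy session budget: tracks consumed and in-flight (reserved) tokens."""

    def __init__(self, total_budget: int, default_request: int = 100) -> None:
        self.total = total_budget
        self.default_request = default_request
        self.used = 0       # tokens actually consumed so far
        self.reserved = 0   # tokens held by invocations still running
        self._lock = asyncio.Lock()

    @property
    def remaining(self) -> int:
        return self.total - self.used - self.reserved

    async def allocate(self) -> int:
        """Reserve a slice before the driver runs, or fail if nothing is left."""
        async with self._lock:
            if self.remaining <= 0:
                raise BudgetExhausted("session budget spent")
            grant = min(self.default_request, self.remaining)
            self.reserved += grant
            return grant

    async def record_usage(self, actual: int, *, reserved: int) -> None:
        """Swap the reservation for the actual measured cost."""
        async with self._lock:
            self.reserved -= reserved
            self.used += actual


async def demo() -> int:
    manager = SketchBudgetManager(total_budget=100, default_request=40)
    slice_ = await manager.allocate()                 # remaining drops to 60
    await manager.record_usage(10, reserved=slice_)   # remaining returns to 90
    return manager.remaining


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Reserving first means concurrent invocations cannot all spend the same remaining tokens; reconciling afterwards returns whatever the reservation overestimated.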
Comment on lines +329 to +339

    def test_budget_manager_rejects_non_positive_total() -> None:
        with pytest.raises(ValueError, match="total_budget must be positive"):
            BudgetManager(total_budget=0)
        with pytest.raises(ValueError, match="total_budget must be positive"):
            BudgetManager(total_budget=-1)


    def test_budget_manager_rejects_non_positive_default_request() -> None:
        with pytest.raises(ValueError, match="default_request must be positive"):
            BudgetManager(total_budget=100, default_request=0)

Five fixes from the Copilot review:
1. Bare ValueError on BudgetManager validation violated AGENTS.md
("never raise bare ValueError to callers"). Replaced with a new
BudgetConfigError(AgentKernelError); updated tests.
2. firewall/budgets.py exceeded the ≤300 line guideline. Split into:
- budgets.py (28 lines, original Budgets dataclass only)
- token_counting.py (41 lines, TokenCounter + default_token_counter)
- budget_manager.py (275 lines, BudgetManager + helpers)
Public imports unchanged; everything re-exported via firewall/__init__.
3. invoke() did not mirror the Firewall's admin-only raw gate. A
non-admin requesting raw kept effective_mode == "raw", which made
the kernel skip handle creation even though the Firewall would then
downgrade to summary — yielding a summary frame without a handle.
The kernel now applies the same raw → summary downgrade before the
budget escalation and handle-creation decision. Added a regression
test covering the case.
4. A Firewall exception after a budget reservation permanently leaked
the reserved tokens. Wrapped the firewall transform + reconciliation
in try/finally that releases the reservation if record_usage never
ran. Added a regression test using a stub failing Firewall.
5. Updated firewall tests to assert BudgetConfigError instead of
ValueError, and verified BudgetConfigError is a subclass of
AgentKernelError.
make ci: lint clean, mypy clean, 403 tests pass (was 400; +3 from
this change), all examples run.
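
Fix 4 above can be illustrated with a reservation guard. This is a sketch under assumptions: `StubBudget`, its `release()` method, and `guarded_transform` are hypothetical names, and the chars/4 cost is borrowed from the default counter heuristic. The commit message states only that the firewall transform and reconciliation are wrapped in try/finally and that the reservation is released if `record_usage` never ran.

```python
import asyncio
from typing import Callable


class StubBudget:
    """Hypothetical stand-in for the session budget manager."""

    def __init__(self) -> None:
        self.used = 0
        self.reserved = 0

    async def allocate(self, amount: int) -> int:
        self.reserved += amount
        return amount

    async def record_usage(self, actual: int, *, reserved: int) -> None:
        self.reserved -= reserved
        self.used += actual

    async def release(self, reserved: int) -> None:
        self.reserved -= reserved


async def guarded_transform(
    budget: StubBudget,
    transform: Callable[[str], str],
    payload: str,
    request: int = 10,
) -> str:
    """Run a transform under a reservation that cannot leak on failure."""
    reserved = await budget.allocate(request)
    recorded = False
    try:
        frame = transform(payload)  # may raise, like a failing firewall
        await budget.record_usage(len(frame) // 4, reserved=reserved)
        recorded = True
        return frame
    finally:
        if not recorded:
            # record_usage never ran, so hand the reservation back.
            await budget.release(reserved)
```

The `recorded` flag keeps the success path from releasing a reservation that `record_usage` has already settled.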
Summary
Implements cross-invocation context budget tracking via a new `BudgetManager` class that monitors cumulative token usage across multiple `Kernel.invoke()` calls within a session. When attached to a kernel, the manager automatically escalates the requested `response_mode` to a more aggressive tier as remaining budget shrinks, preventing unbounded context consumption.

Key Changes
New `BudgetManager` class (`src/agent_kernel/firewall/budgets.py`):
- `allocate()` to reserve budget before an invocation and `record_usage()` to reconcile actual consumption
- `suggested_mode()` escalation table: > 50% remaining keeps requested mode; 20–50% downgrades `raw` to `table`; 5–20% floors at `summary`; < 5% forces `handle_only`
- `asyncio.Lock` for concurrent invocations
- Raises `BudgetExhausted` when no budget remains

Token counting protocol (`TokenCounter`):
- Default counter (`default_token_counter`) uses character-based approximation (`len(json.dumps(...)) // 4`) with no external dependencies
- Pluggable counters (for example `tiktoken`-based) via constructor

Kernel integration (`src/agent_kernel/kernel.py`):
- `budget_manager` parameter in `Kernel.__init__()`
- `Kernel.invoke()` now allocates budget before driver execution, escalates response mode based on remaining budget, and records actual frame-payload usage after firewall processing
- `Kernel.invoke(..., dry_run=True)` mirrors escalation and reports `budget_remaining` in `DryRunResult`
- `Kernel.budget` property exposes the manager (or `None` if not configured)

New error type (`src/agent_kernel/errors.py`):
- `BudgetExhausted` raised when session budget is exhausted before driver execution

Public API exports:
- `BudgetManager`, `TokenCounter`, `default_token_counter` exported from `agent_kernel.firewall` and top-level `agent_kernel` package
- `BudgetExhausted` exported from top-level `agent_kernel` package

Documentation & tests:
- `docs/context_firewall.md` with escalation table and usage examples
- `CHANGELOG.md` with feature summary

Notable Implementation Details
- `_BudgetState` is kept separate from `BudgetManager` to maintain `__slots__` compatibility while holding an `asyncio.Lock`
- […] `raw`; exactly 20% floors at `summary`
- Kernels without a `BudgetManager` behave identically to earlier versions
- `tiktoken` extra added to `pyproject.toml` for users wanting more accurate token counting

https://claude.ai/code/session_011ebp9VP1kEYRtrXeQd5y7w