20 changes: 20 additions & 0 deletions CHANGELOG.md
Expand Up @@ -8,6 +8,26 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]

### Added
- Cross-invocation context budget manager (`BudgetManager`) tracks cumulative token usage across
multiple `Kernel.invoke()` calls within a session. When attached to a `Kernel` via the new
`budget_manager` keyword argument, the kernel reserves a budget slice before each invocation
and reconciles actual frame-payload usage afterwards. As the remaining budget shrinks, the
requested `response_mode` is auto-escalated to a more aggressive tier (> 50% remaining keeps
the caller's mode; 20–50% downgrades `raw` to `table`; 5–20% floors at `summary`; < 5% forces
`handle_only`). `Kernel.invoke(..., dry_run=True)` now also reports `budget_remaining` and the
escalated `response_mode` when a manager is configured. The `BudgetManager` is optional and
off by default — existing kernels are unchanged. (#44)
- `TokenCounter` protocol and `default_token_counter` (character-based `len(json.dumps(...))//4`
approximation) provide pluggable token counting without runtime dependencies. A new optional
`[tiktoken]` extra is reserved for callers that want to plug in `tiktoken`-based counting.
- `BudgetExhausted(AgentKernelError)` raised by `BudgetManager.allocate()` (and by
`Kernel.invoke()` before driver execution) when the cumulative session budget is fully spent.
- `BudgetConfigError(AgentKernelError)` raised by `BudgetManager` for invalid configuration or
validation failures (non-positive budgets, negative allocate/record/release amounts), replacing
bare `ValueError` so callers can catch budget mistakes via the `AgentKernelError` hierarchy
per `AGENTS.md` ("never raise bare ValueError to callers").
- New public exports: `BudgetManager`, `BudgetExhausted`, `BudgetConfigError`, `TokenCounter`,
`default_token_counter`, and `Kernel.budget` accessor property.
- LLM tool-format adapters and middleware (`agent_kernel.adapters`): `OpenAIMiddleware` (OpenAI
Responses API + Chat Completions, auto-detected on input) and `AnthropicMiddleware` (Anthropic
Messages with `cache_control` support). Both translate `Capability` objects to vendor tool
Expand Down
56 changes: 56 additions & 0 deletions docs/context_firewall.md
Expand Up @@ -62,3 +62,59 @@ Summaries are produced deterministically:
- **dict** → key list + per-value type/value
- **string** → truncated to 500 chars
- **other** → repr() truncated to 200 chars

## Cross-invocation budgets

The per-invocation `Budgets` above cap a single Frame. A separate
`BudgetManager` tracks cumulative token usage *across* invocations within a
session. It is optional — if you don't attach one, kernel behavior is
unchanged.

```python
from agent_kernel import BudgetManager, Kernel

manager = BudgetManager(total_budget=100_000)
kernel = Kernel(registry, budget_manager=manager)
```

Per `invoke()` the kernel:

1. Reserves a slice of the remaining budget (default 4,000 tokens). If the
budget is empty, `BudgetExhausted` is raised before the driver runs.
2. Consults `manager.suggested_mode(requested)` to escalate the requested
`response_mode` to a more aggressive tier as the remaining budget shrinks.
3. After the firewall produces a Frame, counts the actual tokens in the
LLM-facing payload and reconciles them against the reservation.
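
The reserve-then-reconcile cycle in steps 1 and 3 can be illustrated with a toy model. This is a minimal sketch, not the `BudgetManager` implementation: `ToyBudget`, `ToyBudgetExhausted`, `reserve`, and `reconcile` are hypothetical names, and the handling of a Frame that uses more than its reservation is an assumption.

```python
class ToyBudgetExhausted(Exception):
    """Hypothetical stand-in for the kernel's BudgetExhausted error."""


class ToyBudget:
    """Toy session budget; illustrates the flow only, not the real class."""

    def __init__(self, total: int) -> None:
        self.total = total
        self.used = 0      # tokens confirmed spent by finished invocations
        self.reserved = 0  # tokens held for invocations still in flight

    @property
    def remaining(self) -> int:
        return self.total - self.used - self.reserved

    def reserve(self, want: int = 4_000) -> int:
        """Hold up to `want` tokens; fail before any work if nothing is left."""
        if self.remaining <= 0:
            raise ToyBudgetExhausted("session budget fully spent")
        granted = min(want, self.remaining)
        self.reserved += granted
        return granted

    def reconcile(self, granted: int, actual: int) -> None:
        """Replace the reservation with what the Frame actually cost."""
        self.reserved -= granted
        # Assumption: an overrun is simply charged in full.
        self.used += actual


budget = ToyBudget(total=10_000)
granted = budget.reserve()               # step 1: hold the default slice
budget.reconcile(granted, actual=1_250)  # step 3: charge the real usage
print(budget.remaining)
```

Reserving first means an empty budget is detected before the driver runs, and reconciling afterwards returns the unused part of the slice to the session.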

Escalation table:

| Budget remaining | Suggested mode (effective `response_mode`)      |
|-----------------:|------------------------------------------------|
| > 50%            | Caller's requested mode (no change)            |
| > 20% – 50%      | `table` (when caller requested `raw`)          |
| ≥ 5% – 20%       | `summary` (floor; never *relaxes* to `table`)  |
| < 5%             | `handle_only`                                  |

Boundaries land in the more-conservative tier — exactly 50% remaining
downgrades `raw` to `table`, exactly 20% floors at `summary`, and only when
remaining drops *below* 5% does `handle_only` take over.
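
The tiers and their boundary behavior can be written as a small pure function. This is a sketch under assumptions: `escalate` and `MODES` are hypothetical names (the real entry point is `manager.suggested_mode(requested)`), and the ordering `raw` < `table` < `summary` < `handle_only` is inferred from the table rather than taken from the source code.

```python
# Modes ordered from least to most aggressive (assumed ordering).
MODES = ["raw", "table", "summary", "handle_only"]


def escalate(requested: str, remaining: int, total: int) -> str:
    """Return the effective mode for a request given the remaining budget."""
    # Integer comparisons avoid floating-point surprises at the boundaries.
    if remaining * 2 > total:        # more than 50% left
        floor = "raw"
    elif remaining * 5 > total:      # more than 20%, up to and including 50%
        floor = "table"
    elif remaining * 20 >= total:    # 5% up to and including 20%
        floor = "summary"
    else:                            # below 5%
        floor = "handle_only"
    # Never relax: keep whichever of the two modes is more aggressive.
    return MODES[max(MODES.index(requested), MODES.index(floor))]


print(escalate("raw", remaining=50, total=100))    # boundary case: exactly 50%
print(escalate("table", remaining=10, total=100))  # inside the 5%-20% tier
```

Treating each tier as a floor, not a fixed value, is what keeps a caller who already asked for `summary` from being relaxed to `table` in the 20–50% tier.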

`Kernel.invoke(..., dry_run=True)` mirrors the escalation and reports
`budget_remaining` in the returned `DryRunResult`, so callers can preview
what their next live invocation would actually return.

Plug a different token counter (for example, a `tiktoken`-based one) via the
`TokenCounter` protocol:

```python
import tiktoken  # pip install weaver-kernel[tiktoken]

from agent_kernel import BudgetManager

enc = tiktoken.encoding_for_model("gpt-4o")

def tiktoken_counter(value):
    # Count tokens in the string form of the value.
    return len(enc.encode(str(value)))

manager = BudgetManager(total_budget=128_000, token_counter=tiktoken_counter)
```

The default counter (`default_token_counter`) is a character-based
`len(json.dumps(value)) // 4` approximation with no extra dependencies.
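
For a feel of what that approximation yields, here is a standalone sketch of the same formula. `approx_token_count` and `TokenCounterLike` are hypothetical names, and the callable shape (one value in, one `int` out) is an assumption based on the `tiktoken_counter` example above, not the actual `TokenCounter` definition.

```python
import json
from typing import Any, Protocol


class TokenCounterLike(Protocol):
    """Assumed shape of a token counter: any callable mapping a value to an int."""

    def __call__(self, value: Any) -> int: ...


def approx_token_count(value: Any) -> int:
    """Estimate tokens as roughly one per four characters of serialized JSON."""
    return len(json.dumps(value)) // 4


counter: TokenCounterLike = approx_token_count
print(counter("hello world"))         # the JSON form includes the quotes
print(counter({"rows": [1, 2, 3]}))
```

The estimate only needs the standard library, which is why it can serve as a dependency-free default. Note that `json.dumps` raises `TypeError` for values that are not JSON-serializable.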
1 change: 1 addition & 0 deletions pyproject.toml
Expand Up @@ -52,6 +52,7 @@ policy = [
"pyyaml>=6.0",
"tomli>=2.0; python_version<'3.11'",
]
tiktoken = ["tiktoken>=0.6"]

[tool.hatch.build.targets.wheel]
packages = ["src/agent_kernel"]
Expand Down
14 changes: 12 additions & 2 deletions src/agent_kernel/__init__.py
Expand Up @@ -19,7 +19,7 @@

Firewall::

from agent_kernel import Firewall, Budgets
from agent_kernel import Firewall, Budgets, BudgetManager

Handles & traces::

Expand All @@ -35,6 +35,7 @@
AgentKernelError,
TokenExpired, TokenInvalid, TokenScopeError,
PolicyDenied, PolicyConfigError, DriverError, FirewallError,
BudgetExhausted, BudgetConfigError,
CapabilityNotFound, HandleNotFound, HandleExpired,
)
"""
Expand All @@ -48,6 +49,8 @@
from .errors import (
AdapterParseError,
AgentKernelError,
BudgetConfigError,
BudgetExhausted,
CapabilityAlreadyRegistered,
CapabilityNotFound,
DriverError,
Expand All @@ -61,7 +64,9 @@
TokenRevoked,
TokenScopeError,
)
from .firewall.budget_manager import BudgetManager
from .firewall.budgets import Budgets
from .firewall.token_counting import TokenCounter, default_token_counter
from .firewall.transform import Firewall
from .handles import HandleStore
from .kernel import Kernel
Expand Down Expand Up @@ -125,6 +130,8 @@
# errors
"AdapterParseError",
"AgentKernelError",
"BudgetConfigError",
"BudgetExhausted",
"CapabilityAlreadyRegistered",
"CapabilityNotFound",
"DriverError",
Expand Down Expand Up @@ -156,8 +163,11 @@
"MCPDriver",
"make_billing_driver",
# firewall
"Firewall",
"BudgetManager",
"Budgets",
"Firewall",
"TokenCounter",
"default_token_counter",
# stores
"HandleStore",
"TraceStore",
Expand Down
21 changes: 21 additions & 0 deletions src/agent_kernel/errors.py
Expand Up @@ -49,6 +49,27 @@ class FirewallError(AgentKernelError):
"""Raised when the context firewall cannot transform a raw result."""


class BudgetExhausted(AgentKernelError):
    """Raised when a
    :class:`~agent_kernel.firewall.budget_manager.BudgetManager` has no
    remaining cross-invocation context budget.

    Distinct from :class:`FirewallError`: this error fires *before* the
    firewall transforms data, signalling that the caller has consumed the
    entire session-level context budget. The current invocation never runs
    the driver.
    """


class BudgetConfigError(AgentKernelError):
    """Raised when a
    :class:`~agent_kernel.firewall.budget_manager.BudgetManager` is
    constructed with invalid parameters, or asked to allocate/record/release
    a negative amount.

    Used in place of bare :class:`ValueError` so callers can catch budget
    configuration mistakes without swallowing unrelated stdlib errors.
    """


# ── Adapter errors ────────────────────────────────────────────────────────────


Expand Down
12 changes: 11 additions & 1 deletion src/agent_kernel/firewall/__init__.py
@@ -1,8 +1,18 @@
"""Firewall sub-package exports."""

from .budget_manager import BudgetManager
from .budgets import Budgets
from .redaction import redact
from .summarize import summarize
from .token_counting import TokenCounter, default_token_counter
from .transform import Firewall

__all__ = ["Budgets", "Firewall", "redact", "summarize"]
__all__ = [
"BudgetManager",
"Budgets",
"Firewall",
"TokenCounter",
"default_token_counter",
"redact",
"summarize",
]