
feat: OpenAI and Anthropic tool-format adapters with middleware (#55, #50, #40)#69

Merged
dgenio merged 3 commits into main from feat/llm-adapter-middleware
May 14, 2026

Conversation


@dgenio dgenio commented May 13, 2026

Summary

Closes #55, #50, #40.

Adds a vendor-agnostic LLM tool-format adapter layer to the kernel so callers can hand Capability objects to OpenAI or Anthropic clients without writing schema-translation glue. The middleware classes also route the vendor's tool-call objects back through the full kernel pipeline (grant → invoke → firewall → trace), preserving every weaver-spec invariant (I-01 firewall mediation, I-02 audited authorization, I-06 per-principal tokens).

The kernel previously required users to hand-build JSON Schema from allowed_fields (which is an output-redaction control, not an input schema), hand-translate between Capability and each vendor's tool shape, and stitch the call/result loop through grant_capability / invoke themselves. The Weaver-spec ecosystem markets the kernel as a security layer between LLMs and tools — but without a drop-in pipeline integration, every consumer had to write the same ~200 lines of boilerplate, and the most common pattern (allowed_fields → tool schema) silently advertised wrong information to the LLM.

What changed

| File | Change |
| --- | --- |
| `src/agent_kernel/adapters/__init__.py` | New — public adapter exports: `OpenAIMiddleware`, `AnthropicMiddleware`, `BaseToolMiddleware`, event types. |
| `src/agent_kernel/adapters/_base.py` | New (459 lines) — `BaseToolMiddleware` (hook registration + dispatch, request/grant/invoke flow, error-as-result conversion), `ToolCallEvent` / `ToolResultEvent` / `PreparedCall` dataclasses, pydantic-driven schema generation (`build_input_schema`, `normalize_for_openai_strict`, `validate_input`), namespace helpers (`make_namespace_safe_name`, `restore_namespace`), canonical `frame_to_payload` / `error_to_payload`. |
| `src/agent_kernel/adapters/openai.py` | New (355 lines) — `OpenAIMiddleware` plus the public `capabilities_to_tools`, `tool_call_to_request`, `format_result` helpers. Supports both the Responses API (default) and Chat Completions; auto-detects input shape per call regardless of configured output format. Dotted capability IDs ↔ `namespace__function` form with explicit collision rejection. |
| `src/agent_kernel/adapters/anthropic.py` | New (270 lines) — `AnthropicMiddleware` plus matching helpers. Per-capability and middleware-default `cache_control` support. Preserves dotted capability IDs (Anthropic accepts `.` in tool names). |
| `src/agent_kernel/models.py` | Adds `ToolHints` dataclass (`cache_control`, `strict`) and three optional fields on `Capability`: `parameters_model: type[pydantic.BaseModel] \| None`, `parameters_schema: dict \| None`, `tool_hints: ToolHints \| None`. All default to `None`; no existing call site needs to change. |
| `src/agent_kernel/kernel.py` | Adds `Kernel.list_capabilities()` accessor — used by the adapters; generally useful for tooling that needs to enumerate the registry without keyword search. |
| `src/agent_kernel/errors.py` | New `AdapterParseError(AgentKernelError)` raised by adapter parse and validation helpers (replaces bare `ValueError` per AGENTS.md). Also covers capability-ID validation (e.g. IDs containing the reserved `__` namespace separator). |
| `src/agent_kernel/__init__.py` | Top-level re-exports for the two middleware classes, `ToolHints`, and `AdapterParseError`. |
| `tests/test_adapters.py` | New (1096 lines, 60 tests) — schema conversion (both OpenAI shapes + Anthropic), round-trip preservation, full kernel-pipeline integration, hook ordering (sync + async + mixed), abort semantics, justification injection (batch + per-call + hook), error-as-result coverage (`PolicyDenied`, `CapabilityNotFound`, `DriverError`, `AdapterParseError`, pydantic `ValidationError`), namespace collision rejection, OpenAI strict mode with both default-bearing and `Optional[T]` fields. |
| `pyproject.toml` | Adds `pydantic>=2` runtime dependency. Justification: schema generation, argument validation, and consistent JSON Schema emission across both vendors. Imported only by the adapters package — kernel code outside `adapters/` does not load pydantic. |
| `docs/integrations.md` | New "LLM tool-format adapters" section: usage examples for both providers, namespace mapping table (with the collision rejection rule), strict mode (with the pydantic-default-field caveat and `Optional[T]` escape hatch), `cache_control`, hooks, error-as-result. |
| `docs/architecture.md` | Adapters bullet under Components. |
| `AGENTS.md` | Dep list updated to httpx, pydantic. |
| `CHANGELOG.md` | [Unreleased] entries under Added + Changed. |
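The namespace helpers listed for `_base.py` above can be sketched roughly as follows — a toy version using a plain `Exception` stand-in for the PR's `AdapterParseError`; the shipped implementations differ:

```python
class AdapterParseError(Exception):
    """Stand-in for the PR's AdapterParseError(AgentKernelError)."""


def make_namespace_safe_name(capability_id: str) -> str:
    """Map a dotted capability ID to an OpenAI-safe tool name."""
    if "__" in capability_id:
        # "a__b" and "a.b" would both map to "a__b" — reject rather than
        # silently emit a colliding tool definition.
        raise AdapterParseError(
            f"capability ID {capability_id!r} contains the reserved '__' separator; "
            "rename the capability to avoid an ambiguous tool name."
        )
    # OpenAI tool names accept "__" but not "."; swap the separator.
    return capability_id.replace(".", "__")


def restore_namespace(tool_name: str) -> str:
    """Inverse mapping: the vendor echoes the safe name back in tool calls."""
    return tool_name.replace("__", ".")
```

The rejection makes the mapping a true bijection over the IDs that pass it, which is what lets `restore_namespace` be a blind string replace.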

Design decisions

  • Schema source is parameters_model (pydantic), with parameters_schema as the raw-dict escape hatch. allowed_fields stays as the output-redaction control the firewall already consumes — using it for input schema was the previous foot-gun this PR explicitly removes. Capabilities without either model or schema fall back to a permissive {"type": "object", "additionalProperties": true} so existing capabilities keep working.
  • Adapter parse failures raise AdapterParseError(AgentKernelError), not bare ValueError. All six raise sites in the adapter parse helpers (_extract_name_and_call_id, _parse_arguments, tool_use_to_request, make_namespace_safe_name) use the new class. The two handle_tool_calls / handle_tool_uses dispatch loops catch AdapterParseError and convert it into a tool-result error — is_error: true for Anthropic, error: true payload for OpenAI — so the surrounding agent loop never crashes on malformed input.
  • Namespace separator __ is rejected at adapter-emit time. The OpenAI tool name field accepts __ but doesn't accept ., so capability IDs are mapped with __ as the separator. A capability ID that contains __ would collide ambiguously (a__b and a.b would both map to a__b), so make_namespace_safe_name rejects them with a clear AdapterParseError rather than emitting a colliding tool definition.
  • Two OpenAI shapes are supported, auto-detected on input. Default output is Responses API (function_call_output envelopes, flat tool definitions); opt-in to Chat Completions via format="chat_completions". Input detection works regardless of output format, so handle_tool_calls accepts either shape.
  • Hooks are sync-or-async, dispatched in registration order. intercept_tool_call runs before kernel invocation and may mutate event.args, inject event.justification (the per-call justification path WRITE/DESTRUCTIVE capabilities need), or set event.aborted = True (the approval-gate path). intercept_tool_result runs after the kernel returns and may replace event.frame. Pre-hook exceptions become tool-result errors (the surrounding loop survives); post-hook exceptions are logged at WARNING and the batch continues.
  • Per-call tokens, not cached. Each tool call mints a fresh token via grant_capability and never reuses it. Reusing tokens across calls would invite I-06 violations when middleware instances are accidentally shared.
  • No openai / anthropic SDK runtime dependency. Both vendors accept plain dicts; the adapter is pure dict-in / dict-out. Adding the SDKs would buy IDE autocomplete that callers can get themselves by importing those SDKs at the call site.
  • OpenAI strict mode is per-capability via ToolHints(strict=True). The adapter normalises the pydantic-emitted schema (forces every property required, sets additionalProperties: false recursively); a documented caveat in docs/integrations.md explains the Optional[T] = None escape hatch for truly-optional fields (which pydantic emits with anyOf + null, accepted by OpenAI strict).
  • Justification flow has three layers. Batch-level handle_tool_calls(..., justification=""); per-call override via args["_justification"] (popped before the kernel sees the args); hook-injected via event.justification. READ-only batches get away with no justification; WRITE/DESTRUCTIVE batches can supply one through whichever layer fits the agent harness.
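The strict-mode normalisation described above — every property forced into `required`, `additionalProperties: false` applied recursively, pydantic's `anyOf` + `null` output left intact — can be sketched as a single recursive pass. This is a simplified stand-in for `normalize_for_openai_strict`, not the shipped code:

```python
from copy import deepcopy
from typing import Any


def normalize_for_openai_strict(schema: dict[str, Any]) -> dict[str, Any]:
    """Enforce OpenAI strict-mode requirements on a JSON Schema dict (sketch)."""

    def walk(node: Any) -> None:
        if isinstance(node, dict):
            if node.get("type") == "object":
                # Strict mode: no extra keys, and every declared property is required.
                node["additionalProperties"] = False
                node["required"] = list(node.get("properties", {}))
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            # Covers anyOf / allOf arrays; {"type": "null"} branches pass through.
            for item in node:
                walk(item)

    out = deepcopy(schema)  # never mutate the caller's schema
    walk(out)
    return out
```

The `Optional[T] = None` escape hatch works precisely because the null branch lives inside `anyOf`, which this pass recurses into but never rewrites.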

Scope

In scope (delivered):

  • OpenAI Responses + Chat Completions tool formats, with auto-detection
  • Anthropic Messages tool format with cache_control (per-capability + middleware-default)
  • Shared BaseToolMiddleware with hooks (sync or async, abort, mutable event)
  • Pydantic-driven schema generation and input validation
  • Namespace mapping with explicit collision rejection
  • AdapterParseError for all adapter parse / validation failures
  • Kernel.list_capabilities() accessor
  • Docs (usage, namespace, strict mode caveats, hooks, error handling)

Out of scope (deferred or rejected by design):

Testing

ruff format --check src/ tests/ examples/  →  45 files already formatted
ruff check src/ tests/ examples/           →  All checks passed
mypy src/                                  →  Success: no issues found in 27 source files
pytest -q --cov=agent_kernel               →  367 passed in 5.47s, 96% total coverage
pytest --cov=agent_kernel.adapters         →  60 passed, 98% adapter coverage
PYTHONIOENCODING=utf-8 python examples/{basic_cli,billing_demo,http_driver_demo}.py  → ✓
CI matrix (3.10/3.11/3.12 + weaver-spec conformance stub) → 4/4 pass

The new tests are organised into:

  • Capability model extensions — backward-compat (defaults are None); ToolHints defaults.
  • Schema helpers — build_input_schema resolution order, deep-copy isolation, normalize_for_openai_strict recursive enforcement, validate_input pass-through vs. coercion vs. rejection.
  • Namespace round-trip — including __ collision rejection (test_namespace_rejects_capability_id_with_reserved_separator, test_namespace_collision_surfaces_via_capabilities_to_tools).
  • OpenAI — both schema shapes, strict mode (including the Optional[T] escape hatch via test_openai_strict_with_optional_field_preserves_nullable), tool_call_to_request for both Chat Completions and Responses, full middleware flow, error paths (PolicyDenied, unknown capability, invalid JSON), hook ordering (sync + async + abort + justification injection), per-call override.
  • Anthropic — schema preservation of dotted IDs, cache_control precedence, tool_use_to_request, full middleware flow, error paths, hook ordering, parse-error → is_error block.
  • Shared base — pre-hook exception → tool error, post-hook exception logged, driver error surfacing, pydantic argument validation, frame-to-payload shape.
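The error-as-result contract these tests exercise reduces to one conversion step around the kernel call. A hedged sketch — payload keys follow the PR's prose description (`is_error` for Anthropic tool_result blocks, an `error` flag for OpenAI), not necessarily the exact shipped shapes:

```python
from typing import Any, Callable


class AgentKernelError(Exception):
    """Stand-in for the kernel's base error class."""


def call_as_tool_result(
    call_id: str,
    invoke: Callable[[], dict[str, Any]],
    *,
    vendor: str,
) -> dict[str, Any]:
    """Run one tool invocation and convert failures into a vendor-shaped error result."""
    try:
        payload = invoke()
        failed = False
    except AgentKernelError as exc:
        # The agent loop never crashes: kernel errors become data the LLM can read.
        payload = {"error": type(exc).__name__, "message": str(exc)}
        failed = True
    if vendor == "anthropic":
        return {
            "type": "tool_result",
            "tool_use_id": call_id,
            "content": payload,
            "is_error": failed,
        }
    return {
        "type": "function_call_output",
        "call_id": call_id,
        "output": payload,
        "error": failed,
    }
```

Only `AgentKernelError` subclasses are converted; anything else is a genuine bug and should propagate.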

Risks

  • Pydantic JSON Schema dialect drift. Pydantic v2 emits Draft 2020-12 schemas, which OpenAI strict mode mostly accepts. normalize_for_openai_strict handles the two reliable gotchas (additionalProperties, required); if pydantic adds a feature OpenAI strict eventually rejects, normalisation falls back to non-strict with a warnings.warn(...) (currently uncovered defensively — no realistic input triggers it).
  • At-most-once delivery. The middleware never retries kernel invocations. A DriverError (after the kernel's own driver fallback) becomes a tool-result error, not a retried call. WRITE/DESTRUCTIVE callers should already be idempotent at the driver layer.
  • Hook concurrency. The hook lists are not lock-protected. The documented usage pattern is one middleware instance per principal, hooks registered at setup time — matching the OpenAIMiddleware(kernel, principal) constructor shape. Sharing a middleware across concurrent batches while mutating its hooks would have undefined ordering.

Documentation

  • docs/integrations.md — new "LLM tool-format adapters" section with end-to-end OpenAI and Anthropic examples, namespace mapping table (and the collision rejection rule), strict mode (and the Optional[T] = None escape hatch caveat), cache_control, hook usage, error-as-tool-result contract.
  • docs/architecture.md — adapters listed as an architecture component, with a pointer to docs/integrations.md for usage.
  • AGENTS.md — dep list updated to httpx, pydantic.
  • CHANGELOG.md — [Unreleased] entries under Added (adapters, Capability fields, ToolHints, Kernel.list_capabilities(), AdapterParseError) and Changed (pydantic runtime dep).

AI agent instruction files reviewed

  • AGENTS.md — dep list updated; no convention changes.
  • docs/agent-context/invariants.md — adapters consume Frame post-firewall and route every call through kernel.invoke(), so I-01, I-02, I-06 remain enforced by existing code paths. No change needed.
  • docs/agent-context/review-checklist.md, lessons-learned.md, workflows.md — no change needed.
  • .github/copilot-instructions.md, .claude/CLAUDE.md — no change needed.

Checklist

  • make ci passes locally (fmt → lint → mypy strict → pytest → examples)
  • CI green on Python 3.10 / 3.11 / 3.12 + weaver-spec conformance stub
  • Docstrings match the final implementation
  • No dead code (all new parameters, helpers, and types exercised by tests)
  • Naming consistent: capability, principal, grant, Frame throughout
  • Backward-compat: new Capability fields default to None; no existing test required updates
  • No bare ValueError / KeyError to callers (per AGENTS.md) — adapter parse errors use AdapterParseError
  • CHANGELOG.md updated under [Unreleased]
  • Updated canonical docs (docs/architecture.md, docs/integrations.md, AGENTS.md) in the same PR

🤖 Generated with Claude Code

…50, #40)

Adds `agent_kernel.adapters` with two drop-in middleware classes that
translate Capability objects into vendor tool schemas, route tool calls
through the full kernel pipeline (grant → invoke → firewall → trace), and
return vendor-shaped tool-result objects. Both share a `BaseToolMiddleware`
that owns hook registration, error-as-result conversion, and the canonical
Frame → JSON payload shape.

OpenAIMiddleware emits Responses-API tools by default (also supports Chat
Completions via `format=chat_completions`), with dotted capability IDs
mapped to `namespace__function` form and OpenAI `strict` mode opt-in via
`Capability.tool_hints`. AnthropicMiddleware emits Anthropic Messages tools
with optional `cache_control` (per-capability or middleware default) and
preserves dotted capability IDs. Both auto-detect Chat/Responses shape on
input regardless of configured output format.

Capability gains three optional fields: `parameters_model` (pydantic model
used for JSON-Schema generation and input validation), `parameters_schema`
(raw JSON Schema escape hatch), and `tool_hints` (ToolHints — vendor flags).
All default to None, preserving backward compat. Kernel gains a small
`list_capabilities()` accessor.

Adds `pydantic>=2` as a runtime dep (justified by the new adapters; only
used inside the adapters package). No `openai` / `anthropic` SDK
dependency — every adapter function is a pure dict transform.

PolicyDenied, CapabilityNotFound, DriverError, argument-validation failures,
and hook abort signals all surface as tool-result errors rather than raised
exceptions so the LLM can react.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 13, 2026 10:49

Copilot AI left a comment


Pull request overview

This PR introduces a new agent_kernel.adapters package providing OpenAI and Anthropic “tool-format” adapters plus middleware that routes vendor tool calls through the kernel’s full pipeline (grant → invoke → firewall → trace), with schema generation/validation support via Pydantic.

Changes:

  • Added OpenAI + Anthropic adapter modules and a shared BaseToolMiddleware (hooks, dispatch, vendor-shape formatting, schema helpers).
  • Extended Capability with optional parameters_model, parameters_schema, and tool_hints (ToolHints) to drive tool schemas and optional strict/cache settings.
  • Added Kernel.list_capabilities() and updated docs/tests/changelog and runtime deps (pydantic>=2).

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.

Show a summary per file

| File | Description |
| --- | --- |
| tests/test_adapters.py | New test suite covering schema conversion, middleware flow, hooks, aborts, and error-as-result behavior. |
| src/agent_kernel/models.py | Adds ToolHints and new optional Capability fields for adapter schema/validation/hints. |
| src/agent_kernel/kernel.py | Adds Kernel.list_capabilities() to enumerate registered capabilities. |
| src/agent_kernel/adapters/_base.py | New shared middleware base, hook/event types, schema helpers, payload helpers, namespace helpers. |
| src/agent_kernel/adapters/openai.py | New OpenAI tool schema conversion + middleware supporting Responses + Chat Completions formats. |
| src/agent_kernel/adapters/anthropic.py | New Anthropic tool schema conversion + middleware with optional cache_control. |
| src/agent_kernel/adapters/\_\_init\_\_.py | Public exports for adapter layer. |
| src/agent_kernel/\_\_init\_\_.py | Re-exports middlewares and ToolHints at top level. |
| pyproject.toml | Adds runtime dependency on pydantic>=2. |
| docs/integrations.md | Adds "LLM tool-format adapters" documentation and usage examples. |
| docs/architecture.md | Documents adapters as an architecture component. |
| AGENTS.md | Updates minimal dependency list to include pydantic. |
| CHANGELOG.md | Adds [Unreleased] entries describing the new adapter feature set and dependency change. |
Comments suppressed due to low confidence (2)

src/agent_kernel/adapters/openai.py:197

  • Same as above: _parse_arguments raises ValueError for invalid argument types/JSON. For consistency with the repo’s error-contract rule in AGENTS.md, map these parse failures to a custom AgentKernelError subclass so callers can reliably catch agent-kernel errors (and so exception types are part of the contract).
    if not isinstance(raw, str):
        raise ValueError(
            f"OpenAI tool_call 'arguments' must be a JSON string or dict, got {type(raw).__name__}."
        )

src/agent_kernel/adapters/anthropic.py:128

  • Same issue here: raising ValueError for non-dict input violates the repo’s “no bare ValueError to callers” rule. If you add a custom adapter parse/validation exception, use it consistently for all adapter-facing shape errors.
    if raw_input is None:
        raw_input = {}
    if not isinstance(raw_input, dict):
        raise ValueError(
            f"Anthropic tool_use 'input' must be an object (got {type(raw_input).__name__})."
        )

Comment thread src/agent_kernel/adapters/_base.py
Comment thread src/agent_kernel/adapters/openai.py
Comment thread src/agent_kernel/adapters/openai.py
Comment thread src/agent_kernel/adapters/anthropic.py
dgenio and others added 2 commits May 14, 2026 07:37
…e namespace collisions

Addresses Copilot review feedback on PR #69:

1. Adds AdapterParseError(AgentKernelError) in errors.py. The OpenAI and
   Anthropic adapter parse helpers (tool_call_to_request, tool_use_to_request,
   _extract_name_and_call_id, _parse_arguments) previously raised bare
   ValueError on malformed input, violating AGENTS.md's "no bare
   ValueError/KeyError to callers" rule. All 6 raise sites now raise
   AdapterParseError; the two handle_tool_calls / handle_tool_uses dispatch
   loops catch the new exception type and convert it to a tool-result error
   as before.

2. make_namespace_safe_name now rejects capability IDs containing the
   reserved "__" separator at adapter-emit time. Previously "a__b" and
   "a.b" would both map to OpenAI tool name "a__b", a silent collision; the
   new AdapterParseError surfaces the issue with a clear remediation
   message. capabilities_to_tools and OpenAIMiddleware.get_tools()
   propagate the error.

3. Fixes a docstring contradiction in OpenAIMiddleware.handle_tool_calls:
   the Args section claimed non-function items were "passed through
   unchanged", but the Returns section and the code both said/did "skip".
   Docstring now consistently reflects the skip behavior, with an
   explanation of why (caller stitches results back into the conversation
   alongside the original items).

Test changes:
- Updated 6 pytest.raises(ValueError, ...) sites to AdapterParseError.
- Added 2 new tests covering the namespace collision rejection path.
- Total: 366 tests pass, 96% coverage (was 364, 96%).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…er adapter error placeholder

Addresses three findings from the audit pass on PR #69:

1. docs(integrations): adds a "Strict mode caveats" subsection to the OpenAI
   strict-mode docs. Explains that the normaliser forces every property
   into `required` (per OpenAI's contract) and that pydantic fields with
   non-`None` defaults are not exempt. Shows the `Optional[T] = None`
   pattern as the escape hatch for truly-optional fields under strict mode
   — pydantic emits `anyOf`+`null` which OpenAI strict accepts.

2. test(adapters): adds `test_openai_strict_with_optional_field_preserves_nullable`
   asserting (a) the Optional field lands in the strict-mode `required`
   list, and (b) the `anyOf`+`null` representation survives normalisation.
   Locks the documented strict-mode escape hatch into CI.

3. fix(adapters): replaces the `"<unknown>"` placeholder used in
   parse-error tool-result payloads with `"(unresolved)"` in both
   `openai.py` and `anthropic.py`. Angle-bracket sentinels read as HTML
   or magic placeholders to some LLMs; the new label is plain text.

The audit-flagged module-size delta (`_base.py` 459, `openai.py` 355) and
three minor nits were deferred per audit response choices (recommended
defaults). 367 tests pass, 96% total coverage. The two existing tests
that asserted error-result behaviour don't check the `capability_id`
field directly, so no test updates were needed for the placeholder swap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dgenio dgenio merged commit 68f5691 into main May 14, 2026
4 checks passed
@dgenio dgenio deleted the feat/llm-adapter-middleware branch May 14, 2026 07:30

Development

Successfully merging this pull request may close these issues.

OpenAI tool-format adapter & middleware
