
feat: OpenAI and Anthropic tool-format adapters with middleware (#55, #50, #40)#69

Merged
dgenio merged 3 commits into main from feat/llm-adapter-middleware
May 14, 2026

Conversation


@dgenio dgenio commented May 13, 2026

Summary

Closes #55, #50, #40.

Adds a vendor-agnostic LLM tool-format adapter layer to the kernel so callers can hand Capability objects to OpenAI or Anthropic clients without writing schema-translation glue. The middleware classes also route the vendor's tool-call objects back through the full kernel pipeline (grant → invoke → firewall → trace), preserving every weaver-spec invariant (I-01 firewall mediation, I-02 audited authorization, I-06 per-principal tokens).

The kernel previously required users to hand-build JSON Schema from allowed_fields (which is an output-redaction control, not an input schema), hand-translate between Capability and each vendor's tool shape, and stitch the call/result loop through grant_capability / invoke themselves. The Weaver-spec ecosystem markets the kernel as a security layer between LLMs and tools — but without a drop-in pipeline integration, every consumer had to write the same ~200 lines of boilerplate, and the most common pattern (allowed_fields → tool schema) silently advertised wrong information to the LLM.

What changed

| File | Change |
| --- | --- |
| `src/agent_kernel/adapters/__init__.py` | New — public adapter exports: `OpenAIMiddleware`, `AnthropicMiddleware`, `BaseToolMiddleware`, event types. |
| `src/agent_kernel/adapters/_base.py` | New (459 lines) — `BaseToolMiddleware` (hook registration + dispatch, request/grant/invoke flow, error-as-result conversion), `ToolCallEvent` / `ToolResultEvent` / `PreparedCall` dataclasses, pydantic-driven schema generation (`build_input_schema`, `normalize_for_openai_strict`, `validate_input`), namespace helpers (`make_namespace_safe_name`, `restore_namespace`), canonical `frame_to_payload` / `error_to_payload`. |
| `src/agent_kernel/adapters/openai.py` | New (355 lines) — `OpenAIMiddleware` plus the public `capabilities_to_tools`, `tool_call_to_request`, `format_result` helpers. Supports both the Responses API (default) and Chat Completions; auto-detects input shape per call regardless of configured output format. Dotted capability IDs ↔ `namespace__function` form with explicit collision rejection. |
| `src/agent_kernel/adapters/anthropic.py` | New (270 lines) — `AnthropicMiddleware` plus matching helpers. Per-capability and middleware-default `cache_control` support. Preserves dotted capability IDs (Anthropic accepts `.` in tool names). |
| `src/agent_kernel/models.py` | Adds `ToolHints` dataclass (`cache_control`, `strict`) and three optional fields on `Capability`: `parameters_model: type[pydantic.BaseModel] \| None`, `parameters_schema: dict \| None`, `tool_hints: ToolHints \| None`. All default to `None`; no existing call site needs to change. |
| `src/agent_kernel/kernel.py` | Adds `Kernel.list_capabilities()` accessor — used by the adapters; generally useful for tooling that needs to enumerate the registry without keyword search. |
| `src/agent_kernel/errors.py` | New `AdapterParseError(AgentKernelError)` raised by adapter parse and validation helpers (replaces bare `ValueError` per AGENTS.md). Also covers capability-ID validation (e.g. IDs containing the reserved `__` namespace separator). |
| `src/agent_kernel/__init__.py` | Top-level re-exports for the two middleware classes, `ToolHints`, and `AdapterParseError`. |
| `tests/test_adapters.py` | New (1096 lines, 60 tests) — schema conversion (both OpenAI shapes + Anthropic), round-trip preservation, full kernel-pipeline integration, hook ordering (sync + async + mixed), abort semantics, justification injection (batch + per-call + hook), error-as-result coverage (`PolicyDenied`, `CapabilityNotFound`, `DriverError`, `AdapterParseError`, pydantic `ValidationError`), namespace collision rejection, OpenAI strict mode with both default-bearing and `Optional[T]` fields. |
| `pyproject.toml` | Adds `pydantic>=2` runtime dependency. Justification: schema generation, argument validation, and consistent JSON Schema emission across both vendors. Imported only by the adapters package — kernel code outside `adapters/` does not load pydantic. |
| `docs/integrations.md` | New "LLM tool-format adapters" section: usage examples for both providers, namespace mapping table (with the collision rejection rule), strict mode (with the pydantic-default-field caveat and `Optional[T]` escape hatch), `cache_control`, hooks, error-as-result. |
| `docs/architecture.md` | Adapters bullet under Components. |
| `AGENTS.md` | Dep list updated to httpx, pydantic. |
| `CHANGELOG.md` | [Unreleased] entries under Added + Changed. |
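The namespace helpers listed for `_base.py` above can be sketched roughly as follows — a toy version using a plain `Exception` stand-in for the PR's `AdapterParseError`; the shipped implementations differ:

```python
class AdapterParseError(Exception):
    """Stand-in for the PR's AdapterParseError(AgentKernelError)."""


def make_namespace_safe_name(capability_id: str) -> str:
    """Map a dotted capability ID to an OpenAI-safe tool name."""
    if "__" in capability_id:
        # "a__b" and "a.b" would both map to "a__b" — reject rather than
        # silently emit a colliding tool definition.
        raise AdapterParseError(
            f"capability ID {capability_id!r} contains the reserved '__' separator; "
            "rename the capability to avoid an ambiguous tool name."
        )
    # OpenAI tool names accept "__" but not "."; swap the separator.
    return capability_id.replace(".", "__")


def restore_namespace(tool_name: str) -> str:
    """Inverse mapping: the vendor echoes the safe name back in tool calls."""
    return tool_name.replace("__", ".")
```

The rejection makes the mapping a true bijection over the IDs that pass it, which is what lets `restore_namespace` be a blind string replace.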

Design decisions

  • Schema source is parameters_model (pydantic), with parameters_schema as the raw-dict escape hatch. allowed_fields stays as the output-redaction control the firewall already consumes — using it for input schema was the previous foot-gun this PR explicitly removes. Capabilities without either model or schema fall back to a permissive {"type": "object", "additionalProperties": true} so existing capabilities keep working.
  • Adapter parse failures raise AdapterParseError(AgentKernelError), not bare ValueError. All six raise sites in the adapter parse helpers (_extract_name_and_call_id, _parse_arguments, tool_use_to_request, make_namespace_safe_name) use the new class. The two handle_tool_calls / handle_tool_uses dispatch loops catch AdapterParseError and convert it into a tool-result error — is_error: true for Anthropic, error: true payload for OpenAI — so the surrounding agent loop never crashes on malformed input.
  • Namespace separator __ is rejected at adapter-emit time. The OpenAI tool name field accepts __ but doesn't accept ., so capability IDs are mapped with __ as the separator. A capability ID that contains __ would collide ambiguously (a__b and a.b would both map to a__b), so make_namespace_safe_name rejects them with a clear AdapterParseError rather than emitting a colliding tool definition.
  • Two OpenAI shapes are supported, auto-detected on input. Default output is Responses API (function_call_output envelopes, flat tool definitions); opt-in to Chat Completions via format="chat_completions". Input detection works regardless of output format, so handle_tool_calls accepts either shape.
  • Hooks are sync-or-async, dispatched in registration order. intercept_tool_call runs before kernel invocation and may mutate event.args, inject event.justification (the per-call justification path WRITE/DESTRUCTIVE capabilities need), or set event.aborted = True (the approval-gate path). intercept_tool_result runs after the kernel returns and may replace event.frame. Pre-hook exceptions become tool-result errors (the surrounding loop survives); post-hook exceptions are logged at WARNING and the batch continues.
  • Per-call tokens, not cached. Each tool call mints a fresh token via grant_capability and never reuses it. Reusing tokens across calls would invite I-06 violations when middleware instances are accidentally shared.
  • No openai / anthropic SDK runtime dependency. Both vendors accept plain dicts; the adapter is pure dict-in / dict-out. Adding the SDKs would buy IDE autocomplete that callers can get themselves by importing those SDKs at the call site.
  • OpenAI strict mode is per-capability via ToolHints(strict=True). The adapter normalises the pydantic-emitted schema (forces every property required, sets additionalProperties: false recursively); a documented caveat in docs/integrations.md explains the Optional[T] = None escape hatch for truly-optional fields (which pydantic emits with anyOf + null, accepted by OpenAI strict).
  • Justification flow has three layers. Batch-level handle_tool_calls(..., justification=""); per-call override via args["_justification"] (popped before the kernel sees the args); hook-injected via event.justification. READ-only batches get away with no justification; WRITE/DESTRUCTIVE batches can supply one through whichever layer fits the agent harness.
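The strict-mode normalisation described above — every property forced into `required`, `additionalProperties: false` applied recursively, pydantic's `anyOf` + `null` output left intact — can be sketched as a single recursive pass. This is a simplified stand-in for `normalize_for_openai_strict`, not the shipped code:

```python
from copy import deepcopy
from typing import Any


def normalize_for_openai_strict(schema: dict[str, Any]) -> dict[str, Any]:
    """Enforce OpenAI strict-mode requirements on a JSON Schema dict (sketch)."""

    def walk(node: Any) -> None:
        if isinstance(node, dict):
            if node.get("type") == "object":
                # Strict mode: no extra keys, and every declared property is required.
                node["additionalProperties"] = False
                node["required"] = list(node.get("properties", {}))
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            # Covers anyOf / allOf arrays; {"type": "null"} branches pass through.
            for item in node:
                walk(item)

    out = deepcopy(schema)  # never mutate the caller's schema
    walk(out)
    return out
```

The `Optional[T] = None` escape hatch works precisely because the null branch lives inside `anyOf`, which this pass recurses into but never rewrites.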

Scope

In scope (delivered):

  • OpenAI Responses + Chat Completions tool formats, with auto-detection
  • Anthropic Messages tool format with cache_control (per-capability + middleware-default)
  • Shared BaseToolMiddleware with hooks (sync or async, abort, mutable event)
  • Pydantic-driven schema generation and input validation
  • Namespace mapping with explicit collision rejection
  • AdapterParseError for all adapter parse / validation failures
  • Kernel.list_capabilities() accessor
  • Docs (usage, namespace, strict mode caveats, hooks, error handling)

Out of scope (deferred or rejected by design):

Testing

ruff format --check src/ tests/ examples/  →  45 files already formatted
ruff check src/ tests/ examples/           →  All checks passed
mypy src/                                  →  Success: no issues found in 27 source files
pytest -q --cov=agent_kernel               →  367 passed in 5.47s, 96% total coverage
pytest --cov=agent_kernel.adapters         →  60 passed, 98% adapter coverage
PYTHONIOENCODING=utf-8 python examples/{basic_cli,billing_demo,http_driver_demo}.py  → ✓
CI matrix (3.10/3.11/3.12 + weaver-spec conformance stub) → 4/4 pass

The new tests are organised into:

  • Capability model extensions — backward-compat (defaults are None); ToolHints defaults.
  • Schema helpers — build_input_schema resolution order, deep-copy isolation, normalize_for_openai_strict recursive enforcement, validate_input pass-through vs. coercion vs. rejection.
  • Namespace round-trip — including __ collision rejection (test_namespace_rejects_capability_id_with_reserved_separator, test_namespace_collision_surfaces_via_capabilities_to_tools).
  • OpenAI — both schema shapes, strict mode (including the Optional[T] escape hatch via test_openai_strict_with_optional_field_preserves_nullable), tool_call_to_request for both Chat Completions and Responses, full middleware flow, error paths (PolicyDenied, unknown capability, invalid JSON), hook ordering (sync + async + abort + justification injection), per-call override.
  • Anthropic — schema preservation of dotted IDs, cache_control precedence, tool_use_to_request, full middleware flow, error paths, hook ordering, parse-error → is_error block.
  • Shared base — pre-hook exception → tool error, post-hook exception logged, driver error surfacing, pydantic argument validation, frame-to-payload shape.
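The error-as-result contract these tests exercise reduces to one conversion step around the kernel call. A hedged sketch — payload keys follow the PR's prose description (`is_error` for Anthropic tool_result blocks, an `error` flag for OpenAI), not necessarily the exact shipped shapes:

```python
from typing import Any, Callable


class AgentKernelError(Exception):
    """Stand-in for the kernel's base error class."""


def call_as_tool_result(
    call_id: str,
    invoke: Callable[[], dict[str, Any]],
    *,
    vendor: str,
) -> dict[str, Any]:
    """Run one tool invocation and convert failures into a vendor-shaped error result."""
    try:
        payload = invoke()
        failed = False
    except AgentKernelError as exc:
        # The agent loop never crashes: kernel errors become data the LLM can read.
        payload = {"error": type(exc).__name__, "message": str(exc)}
        failed = True
    if vendor == "anthropic":
        return {
            "type": "tool_result",
            "tool_use_id": call_id,
            "content": payload,
            "is_error": failed,
        }
    return {
        "type": "function_call_output",
        "call_id": call_id,
        "output": payload,
        "error": failed,
    }
```

Only `AgentKernelError` subclasses are converted; anything else is a genuine bug and should propagate.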

Risks

  • Pydantic JSON Schema dialect drift. Pydantic v2 emits Draft 2020-12 schemas, which OpenAI strict mode mostly accepts. normalize_for_openai_strict handles the two reliable gotchas (additionalProperties, required); if pydantic adds a feature OpenAI strict eventually rejects, normalisation falls back to non-strict with a warnings.warn(...) (currently uncovered defensively — no realistic input triggers it).
  • At-most-once delivery. The middleware never retries kernel invocations. A DriverError (after the kernel's own driver fallback) becomes a tool-result error, not a retried call. WRITE/DESTRUCTIVE callers should already be idempotent at the driver layer.
  • Hook concurrency. The hook lists are not lock-protected. The documented usage pattern is one middleware instance per principal, hooks registered at setup time — matching the OpenAIMiddleware(kernel, principal) constructor shape. Sharing a middleware across concurrent batches while mutating its hooks would have undefined ordering.

Documentation

  • docs/integrations.md — new "LLM tool-format adapters" section with end-to-end OpenAI and Anthropic examples, namespace mapping table (and the collision rejection rule), strict mode (and the Optional[T] = None escape hatch caveat), cache_control, hook usage, error-as-tool-result contract.
  • docs/architecture.md — adapters listed as an architecture component, with a pointer to docs/integrations.md for usage.
  • AGENTS.md — dep list updated to httpx, pydantic.
  • CHANGELOG.md — [Unreleased] entries under Added (adapters, Capability fields, ToolHints, Kernel.list_capabilities(), AdapterParseError) and Changed (pydantic runtime dep).

AI agent instruction files reviewed

  • AGENTS.md — dep list updated; no convention changes.
  • docs/agent-context/invariants.md — adapters consume Frame post-firewall and route every call through kernel.invoke(), so I-01, I-02, I-06 remain enforced by existing code paths. No change needed.
  • docs/agent-context/review-checklist.md, lessons-learned.md, workflows.md — no change needed.
  • .github/copilot-instructions.md, .claude/CLAUDE.md — no change needed.

Checklist

  • make ci passes locally (fmt → lint → mypy strict → pytest → examples)
  • CI green on Python 3.10 / 3.11 / 3.12 + weaver-spec conformance stub
  • Docstrings match the final implementation
  • No dead code (all new parameters, helpers, and types exercised by tests)
  • Naming consistent: capability, principal, grant, Frame throughout
  • Backward-compat: new Capability fields default to None; no existing test required updates
  • No bare ValueError / KeyError to callers (per AGENTS.md) — adapter parse errors use AdapterParseError
  • CHANGELOG.md updated under [Unreleased]
  • Updated canonical docs (docs/architecture.md, docs/integrations.md, AGENTS.md) in the same PR

🤖 Generated with Claude Code

…50, #40)

Adds `agent_kernel.adapters` with two drop-in middleware classes that
translate Capability objects into vendor tool schemas, route tool calls
through the full kernel pipeline (grant → invoke → firewall → trace), and
return vendor-shaped tool-result objects. Both share a `BaseToolMiddleware`
that owns hook registration, error-as-result conversion, and the canonical
Frame → JSON payload shape.

OpenAIMiddleware emits Responses-API tools by default (also supports Chat
Completions via `format=chat_completions`), with dotted capability IDs
mapped to `namespace__function` form and OpenAI `strict` mode opt-in via
`Capability.tool_hints`. AnthropicMiddleware emits Anthropic Messages tools
with optional `cache_control` (per-capability or middleware default) and
preserves dotted capability IDs. Both auto-detect Chat/Responses shape on
input regardless of configured output format.

Capability gains three optional fields: `parameters_model` (pydantic model
used for JSON-Schema generation and input validation), `parameters_schema`
(raw JSON Schema escape hatch), and `tool_hints` (ToolHints — vendor flags).
All default to None, preserving backward compat. Kernel gains a small
`list_capabilities()` accessor.

Adds `pydantic>=2` as a runtime dep (justified by the new adapters; only
used inside the adapters package). No `openai` / `anthropic` SDK
dependency — every adapter function is a pure dict transform.

PolicyDenied, CapabilityNotFound, DriverError, argument-validation failures,
and hook abort signals all surface as tool-result errors rather than raised
exceptions so the LLM can react.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 13, 2026 10:49

Copilot AI left a comment


Pull request overview

This PR introduces a new agent_kernel.adapters package providing OpenAI and Anthropic “tool-format” adapters plus middleware that routes vendor tool calls through the kernel’s full pipeline (grant → invoke → firewall → trace), with schema generation/validation support via Pydantic.

Changes:

  • Added OpenAI + Anthropic adapter modules and a shared BaseToolMiddleware (hooks, dispatch, vendor-shape formatting, schema helpers).
  • Extended Capability with optional parameters_model, parameters_schema, and tool_hints (ToolHints) to drive tool schemas and optional strict/cache settings.
  • Added Kernel.list_capabilities() and updated docs/tests/changelog and runtime deps (pydantic>=2).

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.

Show a summary per file

| File | Description |
| --- | --- |
| tests/test_adapters.py | New test suite covering schema conversion, middleware flow, hooks, aborts, and error-as-result behavior. |
| src/agent_kernel/models.py | Adds ToolHints and new optional Capability fields for adapter schema/validation/hints. |
| src/agent_kernel/kernel.py | Adds Kernel.list_capabilities() to enumerate registered capabilities. |
| src/agent_kernel/adapters/_base.py | New shared middleware base, hook/event types, schema helpers, payload helpers, namespace helpers. |
| src/agent_kernel/adapters/openai.py | New OpenAI tool schema conversion + middleware supporting Responses + Chat Completions formats. |
| src/agent_kernel/adapters/anthropic.py | New Anthropic tool schema conversion + middleware with optional cache_control. |
| src/agent_kernel/adapters/\_\_init\_\_.py | Public exports for adapter layer. |
| src/agent_kernel/\_\_init\_\_.py | Re-exports middlewares and ToolHints at top level. |
| pyproject.toml | Adds runtime dependency on pydantic>=2. |
| docs/integrations.md | Adds "LLM tool-format adapters" documentation and usage examples. |
| docs/architecture.md | Documents adapters as an architecture component. |
| AGENTS.md | Updates minimal dependency list to include pydantic. |
| CHANGELOG.md | Adds [Unreleased] entries describing the new adapter feature set and dependency change. |
Comments suppressed due to low confidence (2)

src/agent_kernel/adapters/openai.py:197

  • Same as above: _parse_arguments raises ValueError for invalid argument types/JSON. For consistency with the repo’s error-contract rule in AGENTS.md, map these parse failures to a custom AgentKernelError subclass so callers can reliably catch agent-kernel errors (and so exception types are part of the contract).
    if not isinstance(raw, str):
        raise ValueError(
            f"OpenAI tool_call 'arguments' must be a JSON string or dict, got {type(raw).__name__}."
        )

src/agent_kernel/adapters/anthropic.py:128

  • Same issue here: raising ValueError for non-dict input violates the repo’s “no bare ValueError to callers” rule. If you add a custom adapter parse/validation exception, use it consistently for all adapter-facing shape errors.
    if raw_input is None:
        raw_input = {}
    if not isinstance(raw_input, dict):
        raise ValueError(
            f"Anthropic tool_use 'input' must be an object (got {type(raw_input).__name__})."
        )

Comment thread src/agent_kernel/adapters/_base.py
Comment thread src/agent_kernel/adapters/openai.py
Comment thread src/agent_kernel/adapters/openai.py
Comment thread src/agent_kernel/adapters/anthropic.py
dgenio and others added 2 commits May 14, 2026 07:37
…e namespace collisions

Addresses Copilot review feedback on PR #69:

1. Adds AdapterParseError(AgentKernelError) in errors.py. The OpenAI and
   Anthropic adapter parse helpers (tool_call_to_request, tool_use_to_request,
   _extract_name_and_call_id, _parse_arguments) previously raised bare
   ValueError on malformed input, violating AGENTS.md's "no bare
   ValueError/KeyError to callers" rule. All 6 raise sites now raise
   AdapterParseError; the two handle_tool_calls / handle_tool_uses dispatch
   loops catch the new exception type and convert it to a tool-result error
   as before.

2. make_namespace_safe_name now rejects capability IDs containing the
   reserved "__" separator at adapter-emit time. Previously "a__b" and
   "a.b" would both map to OpenAI tool name "a__b", a silent collision; the
   new AdapterParseError surfaces the issue with a clear remediation
   message. capabilities_to_tools and OpenAIMiddleware.get_tools()
   propagate the error.

3. Fixes a docstring contradiction in OpenAIMiddleware.handle_tool_calls:
   the Args section claimed non-function items were "passed through
   unchanged", but the Returns section and the code both said/did "skip".
   Docstring now consistently reflects the skip behavior, with an
   explanation of why (caller stitches results back into the conversation
   alongside the original items).

Test changes:
- Updated 6 pytest.raises(ValueError, ...) sites to AdapterParseError.
- Added 2 new tests covering the namespace collision rejection path.
- Total: 366 tests pass, 96% coverage (was 364, 96%).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…er adapter error placeholder

Addresses three findings from the audit pass on PR #69:

1. docs(integrations): adds a "Strict mode caveats" subsection to the OpenAI
   strict-mode docs. Explains that the normaliser forces every property
   into `required` (per OpenAI's contract) and that pydantic fields with
   non-`None` defaults are not exempt. Shows the `Optional[T] = None`
   pattern as the escape hatch for truly-optional fields under strict mode
   — pydantic emits `anyOf`+`null` which OpenAI strict accepts.

2. test(adapters): adds `test_openai_strict_with_optional_field_preserves_nullable`
   asserting (a) the Optional field lands in the strict-mode `required`
   list, and (b) the `anyOf`+`null` representation survives normalisation.
   Locks the documented strict-mode escape hatch into CI.

3. fix(adapters): replaces the `"<unknown>"` placeholder used in
   parse-error tool-result payloads with `"(unresolved)"` in both
   `openai.py` and `anthropic.py`. Angle-bracket sentinels read as HTML
   or magic placeholders to some LLMs; the new label is plain text.

The audit-flagged module-size delta (`_base.py` 459, `openai.py` 355) and
three minor nits were deferred per audit response choices (recommended
defaults). 367 tests pass, 96% total coverage. The two existing tests
that asserted error-result behaviour don't check the `capability_id`
field directly, so no test updates were needed for the placeholder swap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dgenio dgenio merged commit 68f5691 into main May 14, 2026
4 checks passed
@dgenio dgenio deleted the feat/llm-adapter-middleware branch May 14, 2026 07:30

Development

Successfully merging this pull request may close these issues.

OpenAI tool-format adapter & middleware
