feat: OpenAI and Anthropic tool-format adapters with middleware (#55, #50, #40) #69
Merged
Conversation
…50, #40)

Adds `agent_kernel.adapters` with two drop-in middleware classes that translate Capability objects into vendor tool schemas, route tool calls through the full kernel pipeline (grant → invoke → firewall → trace), and return vendor-shaped tool-result objects.

- Both share a `BaseToolMiddleware` that owns hook registration, error-as-result conversion, and the canonical Frame → JSON payload shape.
- `OpenAIMiddleware` emits Responses-API tools by default (also supports Chat Completions via `format=chat_completions`), with dotted capability IDs mapped to `namespace__function` form and OpenAI `strict` mode opt-in via `Capability.tool_hints`.
- `AnthropicMiddleware` emits Anthropic Messages tools with optional `cache_control` (per-capability or middleware default) and preserves dotted capability IDs.
- Both auto-detect Chat/Responses shape on input regardless of configured output format.
- `Capability` gains three optional fields: `parameters_model` (pydantic model used for JSON-Schema generation and input validation), `parameters_schema` (raw JSON Schema escape hatch), and `tool_hints` (`ToolHints` — vendor flags). All default to `None`, preserving backward compat.
- `Kernel` gains a small `list_capabilities()` accessor.
- Adds `pydantic>=2` as a runtime dep (justified by the new adapters; only used inside the adapters package). No `openai` / `anthropic` SDK dependency — every adapter function is a pure dict transform.
- `PolicyDenied`, `CapabilityNotFound`, `DriverError`, argument-validation failures, and hook abort signals all surface as tool-result errors rather than raised exceptions, so the LLM can react.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
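The two emit paths described above can be sketched as pure dict transforms. This is an illustrative stand-in, not the shipped code: the helper names and exact field choices here are assumptions, following the PR's description (flat Responses-API tool definitions; dotted IDs preserved for Anthropic).

```python
def capability_to_openai_tool(cap_id: str, description: str, schema: dict) -> dict:
    # Responses-API tool definitions are flat: type/name/description/parameters.
    # OpenAI tool names reject '.', so dotted IDs map to namespace__function form.
    return {
        "type": "function",
        "name": cap_id.replace(".", "__"),
        "description": description,
        "parameters": schema,
    }


def capability_to_anthropic_tool(cap_id: str, description: str, schema: dict) -> dict:
    # Anthropic Messages tools accept '.' in names, so the dotted ID survives.
    return {
        "name": cap_id,
        "description": description,
        "input_schema": schema,
    }
```

Because both vendors accept plain dicts, no SDK import is needed on either path, which is the property the PR leans on to avoid an `openai` / `anthropic` dependency.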
Pull request overview
This PR introduces a new agent_kernel.adapters package providing OpenAI and Anthropic “tool-format” adapters plus middleware that routes vendor tool calls through the kernel’s full pipeline (grant → invoke → firewall → trace), with schema generation/validation support via Pydantic.
Changes:
- Added OpenAI + Anthropic adapter modules and a shared `BaseToolMiddleware` (hooks, dispatch, vendor-shape formatting, schema helpers).
- Extended `Capability` with optional `parameters_model`, `parameters_schema`, and `tool_hints` (`ToolHints`) to drive tool schemas and optional strict/cache settings.
- Added `Kernel.list_capabilities()` and updated docs/tests/changelog and runtime deps (`pydantic>=2`).
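The resolution order behind the new optional fields can be sketched with simplified stand-ins for `Capability` and `ToolHints` (hypothetical shapes for illustration; the real dataclasses live in `src/agent_kernel/models.py`):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolHints:
    strict: bool = False                   # OpenAI strict-mode opt-in
    cache_control: Optional[dict] = None   # Anthropic cache_control block


@dataclass
class Capability:
    id: str
    description: str = ""
    parameters_model: Optional[type] = None   # pydantic model class, if any
    parameters_schema: Optional[dict] = None  # raw JSON Schema escape hatch
    tool_hints: Optional[ToolHints] = None


def build_input_schema(cap: Capability) -> dict:
    # Sketch of the documented resolution order: pydantic model first,
    # raw schema second, permissive fallback last.
    if cap.parameters_model is not None:
        return cap.parameters_model.model_json_schema()  # pydantic v2 API
    if cap.parameters_schema is not None:
        return dict(cap.parameters_schema)  # copy so callers can't mutate the original
    return {"type": "object", "additionalProperties": True}
```

All three fields default to `None`, so a `Capability` built without them falls through to the permissive schema, which is how existing call sites keep working unchanged.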
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_adapters.py | New test suite covering schema conversion, middleware flow, hooks, aborts, and error-as-result behavior. |
| src/agent_kernel/models.py | Adds ToolHints and new optional Capability fields for adapter schema/validation/hints. |
| src/agent_kernel/kernel.py | Adds Kernel.list_capabilities() to enumerate registered capabilities. |
| src/agent_kernel/adapters/_base.py | New shared middleware base, hook/event types, schema helpers, payload helpers, namespace helpers. |
| src/agent_kernel/adapters/openai.py | New OpenAI tool schema conversion + middleware supporting Responses + Chat Completions formats. |
| src/agent_kernel/adapters/anthropic.py | New Anthropic tool schema conversion + middleware with optional cache_control. |
| `src/agent_kernel/adapters/__init__.py` | Public exports for adapter layer. |
| `src/agent_kernel/__init__.py` | Re-exports middlewares and ToolHints at top level. |
| pyproject.toml | Adds runtime dependency on pydantic>=2. |
| docs/integrations.md | Adds “LLM tool-format adapters” documentation and usage examples. |
| docs/architecture.md | Documents adapters as an architecture component. |
| AGENTS.md | Updates minimal dependency list to include pydantic. |
| CHANGELOG.md | Adds [Unreleased] entries describing the new adapter feature set and dependency change. |
Comments suppressed due to low confidence (2)
src/agent_kernel/adapters/openai.py:197
- Same as above: `_parse_arguments` raises `ValueError` for invalid argument types/JSON. For consistency with the repo's error-contract rule in `AGENTS.md`, map these parse failures to a custom `AgentKernelError` subclass so callers can reliably catch agent-kernel errors (and so exception types are part of the contract).
```python
if not isinstance(raw, str):
    raise ValueError(
        f"OpenAI tool_call 'arguments' must be a JSON string or dict, got {type(raw).__name__}."
    )
```
src/agent_kernel/adapters/anthropic.py:128
- Same issue here: raising `ValueError` for non-dict `input` violates the repo's "no bare ValueError to callers" rule. If you add a custom adapter parse/validation exception, use it consistently for all adapter-facing shape errors.
```python
if raw_input is None:
    raw_input = {}
if not isinstance(raw_input, dict):
    raise ValueError(
        f"Anthropic tool_use 'input' must be an object (got {type(raw_input).__name__})."
    )
```
…e namespace collisions

Addresses Copilot review feedback on PR #69:

1. Adds `AdapterParseError(AgentKernelError)` in errors.py. The OpenAI and Anthropic adapter parse helpers (`tool_call_to_request`, `tool_use_to_request`, `_extract_name_and_call_id`, `_parse_arguments`) previously raised bare `ValueError` on malformed input, violating AGENTS.md's "no bare ValueError/KeyError to callers" rule. All 6 raise sites now raise `AdapterParseError`; the two `handle_tool_calls` / `handle_tool_uses` dispatch loops catch the new exception type and convert it to a tool-result error as before.
2. `make_namespace_safe_name` now rejects capability IDs containing the reserved `"__"` separator at adapter-emit time. Previously `"a__b"` and `"a.b"` would both map to OpenAI tool name `"a__b"`, a silent collision; the new `AdapterParseError` surfaces the issue with a clear remediation message. `capabilities_to_tools` and `OpenAIMiddleware.get_tools()` propagate the error.
3. Fixes a docstring contradiction in `OpenAIMiddleware.handle_tool_calls`: the Args section claimed non-function items were "passed through unchanged", but the Returns section and the code both said/did "skip". The docstring now consistently reflects the skip behavior, with an explanation of why (the caller stitches results back into the conversation alongside the original items).

Test changes:

- Updated 6 `pytest.raises(ValueError, ...)` sites to `AdapterParseError`.
- Added 2 new tests covering the namespace collision rejection path.
- Total: 366 tests pass, 96% coverage (was 364, 96%).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
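The collision rule in item 2 comes down to a pair of pure string helpers, sketched here with a local stand-in for the exception class (illustrative, not the shipped code):

```python
class AdapterParseError(Exception):
    """Local stand-in for the class added in errors.py."""


RESERVED = "__"


def make_namespace_safe_name(capability_id: str) -> str:
    # Map a dotted capability ID to an OpenAI-safe tool name. IDs that
    # already contain the reserved '__' separator are rejected, because
    # 'a__b' and 'a.b' would otherwise both emit the tool name 'a__b'.
    if RESERVED in capability_id:
        raise AdapterParseError(
            f"capability ID {capability_id!r} contains the reserved '__' "
            f"separator; rename it so the OpenAI tool name is unambiguous"
        )
    return capability_id.replace(".", RESERVED)


def restore_namespace(tool_name: str) -> str:
    # Inverse mapping: OpenAI tool name back to the dotted capability ID.
    return tool_name.replace(RESERVED, ".")
```

Rejecting at emit time means the ambiguity surfaces when tools are built, not later when a tool call with an un-invertible name comes back.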
…er adapter error placeholder

Addresses three findings from the audit pass on PR #69:

1. docs(integrations): adds a "Strict mode caveats" subsection to the OpenAI strict-mode docs. Explains that the normaliser forces every property into `required` (per OpenAI's contract) and that pydantic fields with non-`None` defaults are not exempt. Shows the `Optional[T] = None` pattern as the escape hatch for truly-optional fields under strict mode — pydantic emits `anyOf` + `null`, which OpenAI strict accepts.
2. test(adapters): adds `test_openai_strict_with_optional_field_preserves_nullable`, asserting (a) the Optional field lands in the strict-mode `required` list, and (b) the `anyOf` + `null` representation survives normalisation. Locks the documented strict-mode escape hatch into CI.
3. fix(adapters): replaces the `"<unknown>"` placeholder used in parse-error tool-result payloads with `"(unresolved)"` in both `openai.py` and `anthropic.py`. Angle-bracket sentinels read as HTML or magic placeholders to some LLMs; the new label is plain text.

The audit-flagged module-size delta (`_base.py` 459, `openai.py` 355) and three minor nits were deferred per audit response choices (recommended defaults). 367 tests pass, 96% total coverage. The two existing tests that asserted error-result behaviour don't check the `capability_id` field directly, so no test updates were needed for the placeholder swap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
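The normaliser's job in item 1 can be sketched as a recursive dict transform. This is a simplified stand-in for the PR's `normalize_for_openai_strict`; the real helper also handles the non-strict fallback path.

```python
def normalize_for_openai_strict(schema: dict) -> dict:
    # Force the two OpenAI strict-mode requirements onto object schemas:
    # every property listed in 'required', and additionalProperties: false,
    # applied recursively to nested object schemas.
    out = dict(schema)
    if out.get("type") == "object":
        props = {
            name: normalize_for_openai_strict(sub) if isinstance(sub, dict) else sub
            for name, sub in out.get("properties", {}).items()
        }
        out["properties"] = props
        out["required"] = sorted(props)
        out["additionalProperties"] = False
    if "anyOf" in out:
        # Optional[T] fields survive as anyOf + {"type": "null"}: still
        # listed in 'required', but the model is allowed to pass null.
        out["anyOf"] = [
            normalize_for_openai_strict(sub) if isinstance(sub, dict) else sub
            for sub in out["anyOf"]
        ]
    return out
```

Note the caveat from the docs: a field with a non-`None` default still lands in `required`; only the `Optional[T] = None` pattern yields a nullable field under strict mode.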
Summary
Closes #55, #50, #40.
Adds a vendor-agnostic LLM tool-format adapter layer to the kernel so callers can hand `Capability` objects to OpenAI or Anthropic clients without writing schema-translation glue. The middleware classes also route the vendor's tool-call objects back through the full kernel pipeline (grant → invoke → firewall → trace), preserving every weaver-spec invariant (I-01 firewall mediation, I-02 audited authorization, I-06 per-principal tokens).

The kernel previously required users to hand-build JSON Schema from `allowed_fields` (which is an output-redaction control, not an input schema), hand-translate between `Capability` and each vendor's tool shape, and stitch the call/result loop through `grant_capability` / `invoke` themselves. The Weaver-spec ecosystem markets the kernel as a security layer between LLMs and tools — but without a drop-in pipeline integration, every consumer had to write the same ~200 lines of boilerplate, and the most common pattern (`allowed_fields` → tool schema) silently advertised wrong information to the LLM.

What changed
- `src/agent_kernel/adapters/__init__.py` — `OpenAIMiddleware`, `AnthropicMiddleware`, `BaseToolMiddleware`, event types.
- `src/agent_kernel/adapters/_base.py` — `BaseToolMiddleware` (hook registration + dispatch, request/grant/invoke flow, error-as-result conversion), `ToolCallEvent` / `ToolResultEvent` / `PreparedCall` dataclasses, pydantic-driven schema generation (`build_input_schema`, `normalize_for_openai_strict`, `validate_input`), namespace helpers (`make_namespace_safe_name`, `restore_namespace`), canonical `frame_to_payload` / `error_to_payload`.
- `src/agent_kernel/adapters/openai.py` — `OpenAIMiddleware` plus the public `capabilities_to_tools`, `tool_call_to_request`, `format_result` helpers. Supports both Responses API (default) and Chat Completions; auto-detects input shape per call regardless of configured output format. Dotted capability IDs ↔ `namespace__function` form with explicit collision rejection.
- `src/agent_kernel/adapters/anthropic.py` — `AnthropicMiddleware` plus matching helpers. Per-capability and middleware-default `cache_control` support. Preserves dotted capability IDs (Anthropic accepts `.` in tool names).
- `src/agent_kernel/models.py` — `ToolHints` dataclass (`cache_control`, `strict`) and three optional fields on `Capability`: `parameters_model: type[pydantic.BaseModel] | None`, `parameters_schema: dict | None`, `tool_hints: ToolHints | None`. All default to `None`; no existing call site needs to change.
- `src/agent_kernel/kernel.py` — `Kernel.list_capabilities()` accessor — used by the adapters; generally useful for tooling that needs to enumerate the registry without keyword search.
- `src/agent_kernel/errors.py` — `AdapterParseError(AgentKernelError)`, raised by adapter parse and validation helpers (replaces bare `ValueError` per `AGENTS.md`). Also covers capability-ID validation (e.g. IDs containing the reserved `__` namespace separator).
- `src/agent_kernel/__init__.py` — top-level re-exports now include `ToolHints` and `AdapterParseError`.
- `tests/test_adapters.py` — error paths (`PolicyDenied`, `CapabilityNotFound`, `DriverError`, `AdapterParseError`, pydantic `ValidationError`), namespace collision rejection, OpenAI strict mode with both default-bearing and `Optional[T]` fields.
- `pyproject.toml` — `pydantic>=2` runtime dependency. Justification: schema generation, argument validation, and consistent JSON Schema emission across both vendors. Imported only by the adapters package — kernel code outside `adapters/` does not load pydantic.
- `docs/integrations.md` — strict mode (incl. the `Optional[T]` escape hatch), `cache_control`, hooks, error-as-result.
- `docs/architecture.md` — adapters documented as an architecture component.
- `AGENTS.md` — dependency list updated to `httpx`, `pydantic`.
- `CHANGELOG.md` — `[Unreleased]` entries under Added + Changed.

Design decisions
- Schema source: `parameters_model` (pydantic), with `parameters_schema` as the raw-dict escape hatch. `allowed_fields` stays as the output-redaction control the firewall already consumes — using it for input schema was the previous foot-gun this PR explicitly removes. Capabilities without either model or schema fall back to a permissive `{"type": "object", "additionalProperties": true}` so existing capabilities keep working.
- Error contract: `AdapterParseError(AgentKernelError)`, not bare `ValueError`. All six raise sites in the adapter parse helpers (`_extract_name_and_call_id`, `_parse_arguments`, `tool_use_to_request`, `make_namespace_safe_name`) use the new class. The two `handle_tool_calls` / `handle_tool_uses` dispatch loops catch `AdapterParseError` and convert it into a tool-result error — `is_error: true` for Anthropic, `error: true` payload for OpenAI — so the surrounding agent loop never crashes on malformed input.
- Reserved separator: `__` is rejected at adapter-emit time. The OpenAI tool name field accepts `__` but doesn't accept `.`, so capability IDs are mapped with `__` as the separator. A capability ID that contains `__` would collide ambiguously (`a__b` and `a.b` would both map to `a__b`), so `make_namespace_safe_name` rejects them with a clear `AdapterParseError` rather than emitting a colliding tool definition.
- Responses API by default (`function_call_output` envelopes, flat tool definitions); opt-in to Chat Completions via `format="chat_completions"`. Input detection works regardless of output format, so `handle_tool_calls` accepts either shape.
- Hooks: `intercept_tool_call` runs before kernel invocation and may mutate `event.args`, inject `event.justification` (the per-call justification path WRITE/DESTRUCTIVE capabilities need), or set `event.aborted = True` (the approval-gate path). `intercept_tool_result` runs after the kernel returns and may replace `event.frame`. Pre-hook exceptions become tool-result errors (the surrounding loop survives); post-hook exceptions are logged at WARNING and the batch continues.
- Tokens: each call gets a fresh token from `grant_capability` and never reuses it. Reusing tokens across calls would invite I-06 violations when middleware instances are accidentally shared.
- No `openai` / `anthropic` SDK runtime dependency. Both vendors accept plain dicts; the adapter is pure dict-in / dict-out. Adding the SDKs would buy IDE autocomplete that callers can get themselves by importing those SDKs at the call site.
- Strict mode is opt-in via `ToolHints(strict=True)`. The adapter normalises the pydantic-emitted schema (forces every property required, sets `additionalProperties: false` recursively); a documented caveat in `docs/integrations.md` explains the `Optional[T] = None` escape hatch for truly-optional fields (which pydantic emits with `anyOf` + `null`, accepted by OpenAI strict).
- Justification precedence: batch-level default via `handle_tool_calls(..., justification="")`; per-call override via `args["_justification"]` (popped before the kernel sees the args); hook-injected via `event.justification`. READ-only batches get away with no justification; WRITE/DESTRUCTIVE batches can supply one through whichever layer fits the agent harness.

Scope
In scope (delivered):
- `cache_control` (per-capability + middleware-default)
- `BaseToolMiddleware` with hooks (sync or async, abort, mutable event)
- `AdapterParseError` for all adapter parse / validation failures
- `Kernel.list_capabilities()` accessor

Out of scope (deferred or rejected by design):

- `openai` / `anthropic` SDK optional extras — both vendors accept dicts; not worth the version-pinning maintenance.
- Converting `Capability` to be a `pydantic.BaseModel` — too invasive for this PR; would touch dozens of files.
- Splitting `_base.py` into `_base.py` + `_helpers.py` to fit AGENTS.md's 300-line guideline — `_base.py` is 459 lines, `openai.py` is 355. The existing repo has three modules over budget (per "[policy/kernel] Tech debt: decompose policy_dsl.py and broaden dry-run driver test coverage" #68); happy to factor this out on request.

Testing
The new tests are organised into:
- Models: new optional fields (default `None`); `ToolHints` defaults.
- Schema helpers: `build_input_schema` resolution order, deep-copy isolation, `normalize_for_openai_strict` recursive enforcement, `validate_input` pass-through vs. coercion vs. rejection.
- Namespace: `__` collision rejection (`test_namespace_rejects_capability_id_with_reserved_separator`, `test_namespace_collision_surfaces_via_capabilities_to_tools`).
- OpenAI: strict mode (incl. the `Optional[T]` escape hatch via `test_openai_strict_with_optional_field_preserves_nullable`), `tool_call_to_request` for both Chat Completions and Responses, full middleware flow, error paths (PolicyDenied, unknown capability, invalid JSON), hook ordering (sync + async + abort + justification injection), per-call override.
- Anthropic: `cache_control` precedence, `tool_use_to_request`, full middleware flow, error paths, hook ordering, parse-error → `is_error` block.

Risks
- Strict-mode drift: `normalize_for_openai_strict` handles the two reliable gotchas (`additionalProperties`, `required`); if pydantic adds a feature OpenAI strict eventually rejects, normalisation falls back to non-strict with a `warnings.warn(...)` (currently uncovered defensively — no realistic input triggers it).
- No retries: `DriverError` (after the kernel's own driver fallback) becomes a tool-result error, not a retried call. WRITE/DESTRUCTIVE callers should already be idempotent at the driver layer.
- Concurrency: the `OpenAIMiddleware(kernel, principal)` constructor shape — sharing a middleware across concurrent batches while mutating its hooks would have undefined ordering.

Documentation
- `docs/integrations.md` — new "LLM tool-format adapters" section with end-to-end OpenAI and Anthropic examples, namespace mapping table (and the collision rejection rule), strict mode (and the `Optional[T] = None` escape hatch caveat), `cache_control`, hook usage, error-as-tool-result contract.
- `docs/architecture.md` — adapters listed as an architecture component, with a pointer to `docs/integrations.md` for usage.
- `AGENTS.md` — dep list updated to `httpx`, `pydantic`.
- `CHANGELOG.md` — `[Unreleased]` entries under Added (adapters, `Capability` fields, `ToolHints`, `Kernel.list_capabilities()`, `AdapterParseError`) and Changed (pydantic runtime dep).

AI agent instruction files reviewed
- `AGENTS.md` — dep list updated; no convention changes.
- `docs/agent-context/invariants.md` — adapters consume `Frame` post-firewall and route every call through `kernel.invoke()`, so I-01, I-02, I-06 remain enforced by existing code paths. No change needed.
- `docs/agent-context/review-checklist.md`, `lessons-learned.md`, `workflows.md` — no change needed.
- `.github/copilot-instructions.md`, `.claude/CLAUDE.md` — no change needed.

Checklist
- `make ci` passes locally (fmt → lint → mypy strict → pytest → examples)
- `capability`, `principal`, `grant`, `Frame` vocabulary used throughout
- New `Capability` fields default to `None`; no existing test required updates
- No bare `ValueError` / `KeyError` to callers (per `AGENTS.md`) — adapter parse errors use `AdapterParseError`
- `CHANGELOG.md` updated under `[Unreleased]`
- Docs updated (`docs/architecture.md`, `docs/integrations.md`, `AGENTS.md`) in the same PR

🤖 Generated with Claude Code