feat(examples,adapters,docs): code-review bot + voice agent architectures + FastMCP CodeMode hooks (#87, #204, #205) by dgenio · Pull Request #233 · dgenio/contextweaver

dgenio · 2026-05-16T18:31:21Z

Lands the 3-issue "adoption demonstrations" group selected by the triage
pass. Single combined PR (Mode B, owner-authorised). Same blast radius:
two new reference architectures under examples/architectures/ + two new
adapter callable factories in adapters/fastmcp.py + paired docs and tests.
Zero changes to the context or routing core pipelines.

#87 — FastMCP CodeMode discovery / context hooks

adapters/fastmcp.py grows two factories that return plain
Callable[[str], list[dict]] and Callable[[str, str], str]
respectively, matching the FastMCP CodeMode hook contract
(PrefectHQ/fastmcp#3365) while staying framework-agnostic: neither
captures any FastMCP reference at runtime.
make_discovery_tool(router, catalog, *, top_k=None) wraps Router +
Catalog into a "given a query, return a shortlist of tools" callable;
graph-only nodes are skipped silently.
make_context_hook(context_manager, *, firewall_threshold=2000) wraps
the firewall as a "(query, raw_result) → summary" callable; raw bytes
park in the artifact store, query is recorded as item.metadata for
trace correlation.
examples/fastmcp_discovery_demo.py demos a 22-tool catalog → 3-tool
shortlist with ~86% token reduction.
12 unit tests under tests/test_adapters.py + 2 real-FastMCP integration
tests in tests/test_adapters_fastmcp_discovery.py (spins up
fastmcp.FastMCP in-memory server, round-trips through the hook).
fastmcp>=2.0 moved from [fastmcp] runtime extra into [dev] so the
integration test runs on every CI matrix cell.
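
For readers who want the shape of the discovery factory without opening the diff, here is a minimal sketch. StubRouter, StubCatalog, StubTool and make_discovery_sketch are hypothetical stand-ins, not contextweaver classes; the real factory takes a Router and a Catalog and catches the library's own not-found errors rather than KeyError.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class StubTool:
    """Hypothetical stand-in for a hydrated catalog entry."""
    name: str
    description: str
    args_schema: dict[str, Any] = field(default_factory=dict)


class StubCatalog:
    """Hypothetical catalog keyed by tool id."""

    def __init__(self, tools: dict[str, StubTool]) -> None:
        self._tools = tools

    def hydrate(self, tool_id: str) -> StubTool:
        # Raises KeyError for ids that are not real tools.
        return self._tools[tool_id]


class StubRouter:
    """Hypothetical router that returns a fixed ranking for any query."""

    def __init__(self, ranked_ids: list[str]) -> None:
        self._ranked_ids = ranked_ids

    def route(self, query: str) -> list[str]:
        return list(self._ranked_ids)


def make_discovery_sketch(
    router: StubRouter, catalog: StubCatalog, *, top_k: Optional[int] = None
) -> Callable[[str], list[dict[str, Any]]]:
    """Return a plain callable mapping a query to a tool shortlist."""
    if top_k is not None and top_k < 0:
        raise ValueError(f"top_k must be >= 0, got {top_k}")

    def discover(query: str) -> list[dict[str, Any]]:
        ids = router.route(query)
        if top_k is not None:
            ids = ids[:top_k]
        out: list[dict[str, Any]] = []
        for tid in ids:
            try:
                tool = catalog.hydrate(tid)
            except KeyError:
                # Graph-only node (e.g. a category label): skip silently.
                continue
            out.append(
                {
                    "name": tool.name,
                    "description": tool.description,
                    "input_schema": dict(tool.args_schema),
                }
            )
        return out

    return discover


catalog = StubCatalog(
    {
        "refund_order": StubTool("refund_order", "Refund a paid order", {"order_id": "string"}),
        "lookup_invoice": StubTool("lookup_invoice", "Fetch an invoice"),
    }
)
router = StubRouter(["category:billing", "refund_order", "lookup_invoice"])
discover = make_discovery_sketch(router, catalog, top_k=2)
shortlist = discover("refund my order")
```

Because the slice happens before hydration, a graph-only id inside the top_k window shrinks the shortlist rather than being backfilled; the sketch mirrors the diff hunk in this PR on that point.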

#204 — Code-review bot reference architecture

examples/architectures/code_review_bot/ with main.py + catalog.yaml
(24 tools across grep/git/lint/typecheck/test/review) + README.md +
OUTPUT.md + __init__.py.
Six-step PR review walking a regression in payments/charge.py. The
firewall is the load-bearing pattern: synthetic ~28 KB diff dump and
~2.5 KB grep result both compact to ~500-char summaries while raw
bytes stay addressable.
10 smoke tests in tests/test_architectures_code_review.py pin
deterministic invariants (catalog size, intent matches, firewall
fires=2/6, artifact count=6, fact keys).
docs/architectures/code_review_bot.md is the public docs page.
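
The firewall behaviour described above can be illustrated with a small threshold-based compaction sketch. This is not contextweaver's firewall: make_context_hook_sketch, the dict-backed artifact store, the character-based threshold, and the preview format are all assumptions made for illustration, and unlike the real example this sketch stores an artifact only when the threshold is exceeded.

```python
import hashlib
from typing import Callable


def make_context_hook_sketch(
    artifact_store: dict[str, str],
    *,
    firewall_threshold: int = 2000,
    preview_chars: int = 200,
) -> Callable[[str, str], str]:
    """Return a (query, raw_result) -> summary callable (hypothetical)."""
    if firewall_threshold < 0:
        raise ValueError(
            f"firewall_threshold must be >= 0, got {firewall_threshold}"
        )

    def hook(query: str, raw_result: str) -> str:
        # Small results pass through untouched.
        if len(raw_result) <= firewall_threshold:
            return raw_result
        # Large results are parked under a content-derived handle so the
        # raw bytes stay addressable after compaction.
        digest = hashlib.sha256(raw_result.encode("utf-8")).hexdigest()[:12]
        handle = f"artifact:{digest}"
        artifact_store[handle] = raw_result
        preview = raw_result[:preview_chars]
        return f"[{handle}] {len(raw_result)} chars; preview: {preview}"

    return hook


store: dict[str, str] = {}
hook = make_context_hook_sketch(store, firewall_threshold=2000)
small_out = hook("lint status", "ok")
big_diff = "+ changed line\n" * 2000  # roughly 30 KB of synthetic diff
summary = hook("show the diff", big_diff)
```

The summary carries the handle, so a later step can fetch the full diff from the store only if the model actually asks for it.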

#205 — Voice agent reference architecture (Pipecat)

examples/architectures/voice_agent/ with main.py + catalog.yaml
(18 tools across support/orders/shipping/account/callback) +
README.md + OUTPUT.md + __init__.py.
Canonical worked example for docs/integration_pipecat.md: every
context build runs via asyncio.to_thread(mgr.build_sync, ...);
ContextBudget(route=200, call=500, interpret=400, answer=1000)
enforces sub-300 ms TTS-friendly answer prompts (max 200 tokens at
five turns).
Pipecat optional: example runs end-to-end without pipecat-ai.
pyproject.toml grows a [voice] extra (pipecat-ai>=0.0.50) for users
who want the real FrameProcessor.
10 smoke tests in tests/test_architectures_voice.py pin the
async-build marker, intent matches, fact keys, and the 400-token
answer-prompt ceiling. Wall-clock timings are not asserted on.
docs/architectures/voice_agent.md is the public docs page.
docs/integration_pipecat.md gains a "Canonical worked example"
callout pointing back at the architecture.
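
The asyncio.to_thread pattern mentioned above keeps the event loop responsive while a synchronous build runs. A minimal sketch, assuming a hypothetical blocking build_sync function in place of mgr.build_sync and a heartbeat coroutine standing in for audio frame handling:

```python
import asyncio
import time


def build_sync(query: str) -> str:
    """Hypothetical blocking context build (stand-in for mgr.build_sync)."""
    time.sleep(0.05)  # simulate CPU or I/O work that would block the loop
    return f"prompt for: {query}"


async def heartbeat(ticks: list[float]) -> None:
    """Stand-in for other loop work, e.g. streaming audio frames."""
    for _ in range(3):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


async def handle_turn(query: str) -> tuple[str, int]:
    ticks: list[float] = []
    # The blocking build runs in a worker thread, so heartbeat() keeps
    # ticking on the event loop while the build is in progress.
    prompt, _ = await asyncio.gather(
        asyncio.to_thread(build_sync, query),
        heartbeat(ticks),
    )
    return prompt, len(ticks)


prompt, tick_count = asyncio.run(handle_turn("where is my order?"))
```

Calling build_sync directly inside the coroutine would stall every other task for the duration of the build, which is the failure mode a voice pipeline can least afford.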

Shared bookkeeping

Makefile architectures target runs all three architectures.
mkdocs.yml + docs/architectures/index.md link the two new pages.
CHANGELOG.md gains three bullets under [Unreleased].
tests/test_architectures_slack.py refactored from sys.path injection
to importlib.util.spec_from_file_location so the three architecture
test files can coexist in one pytest run (a bare `import main` from
sys.path collides across architectures).
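
The collision and its fix can be reproduced with the standard library alone. In this sketch the directory names, module names, and the load_main helper are hypothetical; only the importlib.util calls are real APIs.

```python
import importlib.util
import tempfile
from pathlib import Path
from types import ModuleType


def load_main(path: Path, unique_name: str) -> ModuleType:
    """Load a file as a module under a caller-chosen, unique name."""
    spec = importlib.util.spec_from_file_location(unique_name, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Two architectures that both ship a file called main.py.
    for arch in ("slack_bot", "voice_agent"):
        (root / arch).mkdir()
        (root / arch / "main.py").write_text(f"NAME = {arch!r}\n")

    slack = load_main(root / "slack_bot" / "main.py", "arch_slack_main")
    voice = load_main(root / "voice_agent" / "main.py", "arch_voice_main")
```

With sys.path injection, the second `import main` would return the first module cached in sys.modules; loading by file location under distinct names sidesteps the cache entirely.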

Module-size note
adapters/fastmcp.py lands at 428 lines, over the soft 300-line guide,
in line with adapters/mcp.py (401) and adapters/proxy_runtime.py (462)
precedent. Mode B authorised the modest overrun rather than splitting
one adapter across two files.

Verification
ruff format --check src/ tests/ examples/ scripts/ → 145 files clean
ruff check src/ tests/ examples/ scripts/ → All checks passed
mypy src/ → 0 issues / 64 files
pytest --cov=contextweaver -q → 985 passed, 5 skipped
(+34 new tests)
make example → all 14 scripts ran
make demo → clean
make scorecard-check → clean (no benchmark drift)
make llms-check → up to date

Closes #87
Closes #204
Closes #205

https://claude.ai/code/session_01JiR8ZGtuwn7Cv2ahHMwLhL


github-actions · 2026-05-16T18:35:05Z

Benchmark delta (vs `main`)

Soft regression feedback only — this comment never blocks the PR.
Latency budget: ⚠️ when head > base × 1.3. Accuracy budget: ⚠️ when head < base - 1pp.
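
The two budget rules above amount to a pair of comparisons. A minimal sketch, assuming the accuracy metrics are on a 0 to 1 scale so that one percentage point equals 0.01; latency_ok and accuracy_ok are hypothetical names, not functions from the benchmark scripts:

```python
def latency_ok(head_ms: float, base_ms: float, factor: float = 1.3) -> bool:
    """True while head latency stays within base * factor."""
    return head_ms <= base_ms * factor


def accuracy_ok(head: float, base: float, tolerance_pp: float = 1.0) -> bool:
    """True while head accuracy is no more than tolerance_pp points below base."""
    return head >= base - tolerance_pp / 100.0


# Values taken from the size-50 row of the routing summary in this comment.
size_50_latency = latency_ok(0.570, 0.463)
size_50_recall = accuracy_ok(0.5649, 0.5649)
```

For the size-50 row the latency ceiling is 0.463 × 1.3 ≈ 0.602 ms, so the measured 0.570 ms stays inside the budget even though it is slower than base.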

Routing summary (single backend × catalog sizes)

size	recall@k (head Δ vs base)	MRR (head Δ vs base)	p99 (ms)
50	✅ 0.5649 (+0.0000)	✅ 0.4978 (+0.0000)	✅ 0.570 (base 0.463)
83	✅ 0.3825 (+0.0000)	✅ 0.3242 (+0.0000)	✅ 0.693 (base 0.876)
1000	✅ 0.1475 (+0.0000)	✅ 0.1456 (+0.0000)	✅ 36.009 (base 31.897)

Per-backend × per-size matrix

backend	size	recall@k (Δ)	MRR (Δ)	p99 (ms)
bm25	100	✅ 0.3825 (+0.0000)	✅ 0.3399 (+0.0000)	✅ 5.875 (base 5.642)
bm25	500	✅ 0.2250 (+0.0000)	✅ 0.2165 (+0.0000)	✅ 31.481 (base 27.538)
bm25	1000	✅ 0.1575 (+0.0000)	✅ 0.1525 (+0.0000)	✅ 85.894 (base 78.368)
fuzzy	100	✅ 0.0000 (+0.0000)	✅ 0.0000 (+0.0000)	✅ 0.000 (base 0.000)
fuzzy	500	✅ 0.0000 (+0.0000)	✅ 0.0000 (+0.0000)	✅ 0.000 (base 0.000)
fuzzy	1000	✅ 0.0000 (+0.0000)	✅ 0.0000 (+0.0000)	✅ 0.000 (base 0.000)
tfidf	100	✅ 0.3825 (+0.0000)	✅ 0.3220 (+0.0000)	✅ 0.995 (base 0.872)
tfidf	500	✅ 0.2325 (+0.0000)	✅ 0.2314 (+0.0000)	✅ 9.136 (base 8.660)
tfidf	1000	✅ 0.1475 (+0.0000)	✅ 0.1456 (+0.0000)	✅ 35.933 (base 30.071)

Context pipeline (per scenario)

scenario	tokens	dropped	dedup
large_catalog	1514 (base 1514, Δ+0)	0 (base 0, Δ+0)	0 (base 0, Δ+0)
long_conversation	2548 (base 2548, Δ+0)	0 (base 0, Δ+0)	0 (base 0, Δ+0)
short_conversation	496 (base 496, Δ+0)	0 (base 0, Δ+0)	0 (base 0, Δ+0)
stress_conversation	6651 (base 6651, Δ+0)	7 (base 7, Δ+0)	4 (base 4, Δ+0)

Numbers come from make benchmark / make benchmark-matrix.
Latency is hardware-dependent — treat the markers as a rough guide.
See benchmarks/scorecard.md for the full picture.

Copilot

Pull request overview

Adds three “adoption demonstration” deliverables to the repo: (1) FastMCP CodeMode hook factories in the FastMCP adapter, and (2) two new runnable reference architectures (code-review bot + voice agent) with smoke tests and public docs, all wired into the examples/architectures and docs navigation.

Changes:

Add make_discovery_tool() and make_context_hook() factories to contextweaver.adapters.fastmcp and re-export them from contextweaver.adapters.
Introduce two new reference architectures under examples/architectures/ (code_review_bot, voice_agent) with deterministic outputs, tests, and docs pages.
Wire the new examples/docs into Makefile, mkdocs.yml, docs/architectures/index.md, pyproject.toml extras, and CHANGELOG.md.

Reviewed changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 6 comments.


File	Description
tests/test_architectures_voice.py	Adds smoke tests for the new voice agent architecture by running the script and asserting deterministic invariants.
tests/test_architectures_slack.py	Refactors Slack architecture smoke test import to use `importlib.util.spec_from_file_location` to avoid module-name collisions.
tests/test_architectures_code_review.py	Adds smoke tests for the new code-review bot architecture by running the script and asserting deterministic invariants.
tests/test_adapters.py	Adds unit tests for FastMCP CodeMode hook factories (`make_discovery_tool`, `make_context_hook`).
tests/test_adapters_fastmcp_discovery.py	Adds real FastMCP integration coverage for the discovery hook against an in-memory FastMCP server.
src/contextweaver/adapters/fastmcp.py	Adds CodeMode-style hook factories for tool discovery + firewall-based context compaction.
src/contextweaver/adapters/__init__.py	Re-exports the new FastMCP hook factories from the adapters package.
pyproject.toml	Adds `fastmcp` to `[dev]` and introduces a new `[voice]` optional extra for `pipecat-ai`.
mkdocs.yml	Adds the new architectures pages to the docs nav.
Makefile	Adds the FastMCP demo to `make example` and the two new architectures to `make architectures`.
examples/fastmcp_discovery_demo.py	Adds a standalone demo showing catalog → shortlist compression and firewall compaction via the new hook callables.
examples/architectures/voice_agent/README.md	Documents the voice agent architecture, its budgets, and the async build pattern (`asyncio.to_thread`).
examples/architectures/voice_agent/OUTPUT.md	Captured deterministic output for the voice agent architecture.
examples/architectures/voice_agent/main.py	Implements the voice agent reference architecture (mock tools, async builds, tight budgets, fact persistence).
examples/architectures/voice_agent/catalog.yaml	Provides the 18-tool catalog used by the voice agent architecture.
examples/architectures/voice_agent/__init__.py	Adds a minimal package marker for the voice agent architecture directory.
examples/architectures/code_review_bot/README.md	Documents the code-review bot architecture emphasizing firewall + artifact store behavior.
examples/architectures/code_review_bot/OUTPUT.md	Captured deterministic output for the code-review bot architecture.
examples/architectures/code_review_bot/main.py	Implements the code-review bot reference architecture (mock tools, firewall-heavy steps, fact persistence).
examples/architectures/code_review_bot/catalog.yaml	Provides the 24-tool catalog used by the code-review bot architecture.
examples/architectures/code_review_bot/__init__.py	Adds a minimal package marker for the code-review bot architecture directory.
docs/integration_pipecat.md	Adds a callout pointing readers to the voice agent as the canonical worked example.
docs/architectures/voice_agent.md	Adds the public docs page for the voice agent architecture.
docs/architectures/index.md	Updates the architectures index to list Slack ops bot, code-review bot, and voice agent.
docs/architectures/code_review_bot.md	Adds the public docs page for the code-review bot architecture.
CHANGELOG.md	Documents the additions (FastMCP hooks + two architectures) under Unreleased.

Comments suppressed due to low confidence (2)

src/contextweaver/adapters/fastmcp.py:410

The factories raise bare ValueError for invalid firewall_threshold, but AGENTS.md requires using contextweaver.exceptions for errors (no bare ValueError/RuntimeError). Consider raising ConfigError (or another ContextWeaverError subclass) instead and updating the corresponding tests.

    if firewall_threshold < 0:
        raise ValueError(f"firewall_threshold must be >= 0, got {firewall_threshold}")
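
One way the suggested change could look, as a sketch only: the two exception classes below are local stand-ins so the snippet runs on its own, and whether contextweaver.exceptions exposes them with this hierarchy is an assumption based on the comment above. validate_firewall_threshold is a hypothetical helper name.

```python
class ContextWeaverError(Exception):
    """Local stand-in for the project's base exception (assumed name)."""


class ConfigError(ContextWeaverError):
    """Local stand-in for a configuration error subclass (assumed name)."""


def validate_firewall_threshold(firewall_threshold: int) -> int:
    """Reject negative thresholds with a project-specific error."""
    if firewall_threshold < 0:
        raise ConfigError(
            f"firewall_threshold must be >= 0, got {firewall_threshold}"
        )
    return firewall_threshold


checked = validate_firewall_threshold(2000)
```

Keeping the message text unchanged means an existing `match="firewall_threshold must be >= 0"` assertion would only need its exception type updated.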

tests/test_adapters.py:1347

These tests assert ValueError for invalid firewall_threshold, but AGENTS.md establishes a repo convention to raise ContextWeaverError subclasses (no bare ValueError/RuntimeError). Once make_context_hook switches to ConfigError (or similar), update this expectation accordingly.

def test_context_hook_negative_threshold_raises() -> None:
    """Negative thresholds are rejected at factory time."""
    mgr = ContextManager()
    with pytest.raises(ValueError, match="firewall_threshold must be >= 0"):
        make_context_hook(mgr, firewall_threshold=-1)

+    if top_k is not None and top_k < 0:
+        raise ValueError(f"top_k must be >= 0, got {top_k}")
+


+    def discover(query: str) -> list[dict[str, Any]]:
+        result = router.route(query)
+        ids = result.candidate_ids
+        if top_k is not None:
+            ids = ids[:top_k]
+        out: list[dict[str, Any]] = []
+        for tid in ids:
+            try:
+                hydrated = catalog.hydrate(tid)
+            except (ItemNotFoundError, CatalogError):
+                # Candidate id not in catalog (graph-only node, e.g. category
+                # label).  Skip rather than fail — the discovery hook must
+                # never reject a valid query just because one node is
+                # virtual.
+                continue
+            out.append(


+                {
+                    "name": hydrated.item.name,
+                    "description": hydrated.item.description,
+                    "input_schema": dict(hydrated.args_schema),
+                }
+            )


+        A pure callable ``(query, raw_result) -> str``.  *query* is recorded
+        as the parent user-turn id for dependency closure; *raw_result* is
+        the verbatim tool output.  Returns the firewall summary (or the raw
+        result if below threshold).


+# Module-level importorskip: when the optional ``fastmcp`` extra is not
+# installed (matrix runners that only pull ``[dev]``), skip this entire
+# file rather than fail at collection.  The unit tests in test_adapters.py
+# still run, so coverage of the adapter does not regress.


+def test_discovery_tool_negative_top_k_raises() -> None:
+    """Negative ceilings are rejected at factory time, not at call time."""
+    catalog = _build_test_catalog()
+    router = _build_test_router(catalog)
+    with pytest.raises(ValueError, match="top_k must be >= 0"):
+        make_discovery_tool(router, catalog, top_k=-1)
+


Copilot AI review requested due to automatic review settings May 16, 2026 18:31

Copilot started reviewing on behalf of dgenio May 16, 2026 18:31

Copilot AI reviewed May 16, 2026

dgenio wants to merge 1 commit into main from claude/triage-issues-uF9JN
