feat(examples,adapters,docs): code-review bot + voice agent architectures + FastMCP CodeMode hooks (#87, #204, #205) #233
dgenio wants to merge 1 commit into
…ures + FastMCP CodeMode hooks (#87, #204, #205)

Lands the 3-issue "adoption demonstrations" group selected by the triage pass. Single combined PR (Mode B, owner-authorised). Same blast radius: two new reference architectures under examples/architectures/ + two new adapter callable factories in adapters/fastmcp.py + paired docs and tests. Zero changes to the context or routing core pipelines.

#87: FastMCP CodeMode discovery / context hooks
- adapters/fastmcp.py grows two factories that return plain Callable[[str], list[dict]] and Callable[[str, str], str] respectively, matching the FastMCP CodeMode hook contract (PrefectHQ/fastmcp#3365) but framework-agnostic: neither captures any FastMCP reference at runtime.
- make_discovery_tool(router, catalog, *, top_k=None) wraps Router + Catalog into a "given a query, return a shortlist of tools" callable; graph-only nodes are skipped silently.
- make_context_hook(context_manager, *, firewall_threshold=2000) wraps the firewall as a "(query, raw_result) → summary" callable; raw bytes park in the artifact store, query is recorded as item.metadata for trace correlation.
- examples/fastmcp_discovery_demo.py demos a 22-tool catalog → 3-tool shortlist with ~86% token reduction.
- 12 unit tests under tests/test_adapters.py + 2 real-FastMCP integration tests in tests/test_adapters_fastmcp_discovery.py (spins up fastmcp.FastMCP in-memory server, round-trips through the hook).
- fastmcp>=2.0 moved from [fastmcp] runtime extra into [dev] so the integration test runs on every CI matrix cell.

#204: Code-review bot reference architecture
- examples/architectures/code_review_bot/ with main.py + catalog.yaml (24 tools across grep/git/lint/typecheck/test/review) + README.md + OUTPUT.md + __init__.py.
- Six-step PR review walking a regression in payments/charge.py. The firewall is the load-bearing pattern: synthetic ~28 KB diff dump and ~2.5 KB grep result both compact to ~500-char summaries while raw bytes stay addressable.
- 10 smoke tests in tests/test_architectures_code_review.py pin deterministic invariants (catalog size, intent matches, firewall fires=2/6, artifact count=6, fact keys).
- docs/architectures/code_review_bot.md is the public docs page.

#205: Voice agent reference architecture (Pipecat)
- examples/architectures/voice_agent/ with main.py + catalog.yaml (18 tools across support/orders/shipping/account/callback) + README.md + OUTPUT.md + __init__.py.
- Canonical worked example for docs/integration_pipecat.md: every context build runs via asyncio.to_thread(mgr.build_sync, ...); ContextBudget(route=200, call=500, interpret=400, answer=1000) enforces sub-300 ms TTS-friendly answer prompts (max 200 tokens at five turns).
- Pipecat optional: example runs end-to-end without pipecat-ai. pyproject.toml grows a [voice] extra (pipecat-ai>=0.0.50) for users who want the real FrameProcessor.
- 10 smoke tests in tests/test_architectures_voice.py pin the async-build marker, intent matches, fact keys, and the 400-token answer-prompt ceiling. Wall-clock timings are not asserted on.
- docs/architectures/voice_agent.md is the public docs page.
- docs/integration_pipecat.md gains a "Canonical worked example" callout pointing back at the architecture.

Shared bookkeeping
- Makefile architectures target runs all three architectures.
- mkdocs.yml + docs/architectures/index.md link the two new pages.
- CHANGELOG.md gains three bullets under [Unreleased].
- tests/test_architectures_slack.py refactored from sys.path injection to importlib.util.spec_from_file_location so the three architecture test files can coexist in one pytest run (a bare ``import main`` from sys.path collides across architectures).

Module-size note
adapters/fastmcp.py lands at 428 lines, over the soft 300-line guide, in line with adapters/mcp.py (401) and adapters/proxy_runtime.py (462) precedent. Mode B authorised the modest overrun rather than splitting one adapter across two files.

Verification
    ruff format --check src/ tests/ examples/ scripts/ → 145 files clean
    ruff check src/ tests/ examples/ scripts/ → All checks passed
    mypy src/ → 0 issues / 64 files
    pytest --cov=contextweaver -q → 985 passed, 5 skipped (+34 new tests)
    make example → all 14 scripts ran
    make demo → clean
    make scorecard-check → clean (no benchmark drift)
    make llms-check → up to date

Closes #87
Closes #204
Closes #205

https://claude.ai/code/session_01JiR8ZGtuwn7Cv2ahHMwLhL
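The "Shared bookkeeping" note about replacing sys.path injection with importlib.util.spec_from_file_location can be illustrated with a small sketch. The helper name load_script, the module names, and the temporary directory layout are hypothetical and stand in for the real test files, which may be organised differently.

```python
import importlib.util
import tempfile
from pathlib import Path
from types import ModuleType


def load_script(path: Path, module_name: str) -> ModuleType:
    """Load a file as a module under an explicit, unique name (hypothetical helper)."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Two stand-in architectures that both ship a file called main.py.
with tempfile.TemporaryDirectory() as tmp:
    loaded = {}
    for arch in ("code_review_bot", "voice_agent"):
        script = Path(tmp) / arch / "main.py"
        script.parent.mkdir(parents=True)
        script.write_text(f"NAME = {arch!r}\n")
        # A distinct module name per file avoids the shared "main" name.
        loaded[arch] = load_script(script, f"arch_{arch}_main")

print(loaded["code_review_bot"].NAME, loaded["voice_agent"].NAME)
```

Because each file is loaded under its own module name and nothing is looked up through sys.path, two files named main.py no longer compete for the same import name.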
Benchmark delta (vs
| size | recall@k (head Δ vs base) | MRR (head Δ vs base) | p99 (ms) |
|---|---|---|---|
| 50 | ✅ 0.5649 (+0.0000) | ✅ 0.4978 (+0.0000) | ✅ 0.570 (base 0.463) |
| 83 | ✅ 0.3825 (+0.0000) | ✅ 0.3242 (+0.0000) | ✅ 0.693 (base 0.876) |
| 1000 | ✅ 0.1475 (+0.0000) | ✅ 0.1456 (+0.0000) | ✅ 36.009 (base 31.897) |
Per-backend × per-size matrix
| backend | size | recall@k (Δ) | MRR (Δ) | p99 (ms) |
|---|---|---|---|---|
| bm25 | 100 | ✅ 0.3825 (+0.0000) | ✅ 0.3399 (+0.0000) | ✅ 5.875 (base 5.642) |
| bm25 | 500 | ✅ 0.2250 (+0.0000) | ✅ 0.2165 (+0.0000) | ✅ 31.481 (base 27.538) |
| bm25 | 1000 | ✅ 0.1575 (+0.0000) | ✅ 0.1525 (+0.0000) | ✅ 85.894 (base 78.368) |
| fuzzy | 100 | ✅ 0.0000 (+0.0000) | ✅ 0.0000 (+0.0000) | ✅ 0.000 (base 0.000) |
| fuzzy | 500 | ✅ 0.0000 (+0.0000) | ✅ 0.0000 (+0.0000) | ✅ 0.000 (base 0.000) |
| fuzzy | 1000 | ✅ 0.0000 (+0.0000) | ✅ 0.0000 (+0.0000) | ✅ 0.000 (base 0.000) |
| tfidf | 100 | ✅ 0.3825 (+0.0000) | ✅ 0.3220 (+0.0000) | ✅ 0.995 (base 0.872) |
| tfidf | 500 | ✅ 0.2325 (+0.0000) | ✅ 0.2314 (+0.0000) | ✅ 9.136 (base 8.660) |
| tfidf | 1000 | ✅ 0.1475 (+0.0000) | ✅ 0.1456 (+0.0000) | ✅ 35.933 (base 30.071) |
Context pipeline (per scenario)
| scenario | tokens | dropped | dedup |
|---|---|---|---|
| large_catalog | 1514 (base 1514, Δ+0) | 0 (base 0, Δ+0) | 0 (base 0, Δ+0) |
| long_conversation | 2548 (base 2548, Δ+0) | 0 (base 0, Δ+0) | 0 (base 0, Δ+0) |
| short_conversation | 496 (base 496, Δ+0) | 0 (base 0, Δ+0) | 0 (base 0, Δ+0) |
| stress_conversation | 6651 (base 6651, Δ+0) | 7 (base 7, Δ+0) | 4 (base 4, Δ+0) |
Numbers come from make benchmark / make benchmark-matrix.
Latency is hardware-dependent — treat the markers as a rough guide.
See benchmarks/scorecard.md for the full picture.
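For readers unfamiliar with the two quality columns, here is a minimal sketch of how recall@k and MRR are conventionally computed. It follows the textbook definitions and is not necessarily the exact formula used by the repo's benchmark harness; the tool ids and relevance sets are invented for illustration.

```python
def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant ids that appear in the top-k ranked results."""
    if not relevant:
        return 0.0
    hits = sum(1 for tid in ranked[:k] if tid in relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is found."""
    for rank, tid in enumerate(ranked, start=1):
        if tid in relevant:
            return 1.0 / rank
    return 0.0


# Invented queries: (ranked tool ids, ground-truth relevant ids).
queries = [
    (["git.diff", "grep.search", "lint.run"], {"grep.search", "test.run"}),
    (["lint.run", "git.log", "git.blame"], {"lint.run"}),
]

mean_recall = sum(recall_at_k(r, rel, k=3) for r, rel in queries) / len(queries)
mrr = sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries)
print(f"recall@3={mean_recall:.4f} MRR={mrr:.4f}")
```

Both metrics depend only on the ranking, which is consistent with the tables showing zero quality delta while p99 latency moves with the hardware.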
Pull request overview
Adds three “adoption demonstration” deliverables to the repo: FastMCP CodeMode hook factories in the FastMCP adapter, plus two new runnable reference architectures (code-review bot and voice agent) with smoke tests and public docs, all wired into the examples/architectures and docs navigation.
Changes:
- Add `make_discovery_tool()` and `make_context_hook()` factories to `contextweaver.adapters.fastmcp` and re-export them from `contextweaver.adapters`.
- Introduce two new reference architectures under `examples/architectures/` (code_review_bot, voice_agent) with deterministic outputs, tests, and docs pages.
- Wire the new examples/docs into `Makefile`, `mkdocs.yml`, `docs/architectures/index.md`, `pyproject.toml` extras, and `CHANGELOG.md`.
Reviewed changes
Copilot reviewed 26 out of 26 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| tests/test_architectures_voice.py | Adds smoke tests for the new voice agent architecture by running the script and asserting deterministic invariants. |
| tests/test_architectures_slack.py | Refactors Slack architecture smoke test import to use importlib.util.spec_from_file_location to avoid module-name collisions. |
| tests/test_architectures_code_review.py | Adds smoke tests for the new code-review bot architecture by running the script and asserting deterministic invariants. |
| tests/test_adapters.py | Adds unit tests for FastMCP CodeMode hook factories (make_discovery_tool, make_context_hook). |
| tests/test_adapters_fastmcp_discovery.py | Adds real FastMCP integration coverage for the discovery hook against an in-memory FastMCP server. |
| src/contextweaver/adapters/fastmcp.py | Adds CodeMode-style hook factories for tool discovery + firewall-based context compaction. |
| src/contextweaver/adapters/__init__.py | Re-exports the new FastMCP hook factories from the adapters package. |
| pyproject.toml | Adds fastmcp to [dev] and introduces a new [voice] optional extra for pipecat-ai. |
| mkdocs.yml | Adds the new architectures pages to the docs nav. |
| Makefile | Adds the FastMCP demo to make example and the two new architectures to make architectures. |
| examples/fastmcp_discovery_demo.py | Adds a standalone demo showing catalog → shortlist compression and firewall compaction via the new hook callables. |
| examples/architectures/voice_agent/README.md | Documents the voice agent architecture, its budgets, and the async build pattern (asyncio.to_thread). |
| examples/architectures/voice_agent/OUTPUT.md | Captured deterministic output for the voice agent architecture. |
| examples/architectures/voice_agent/main.py | Implements the voice agent reference architecture (mock tools, async builds, tight budgets, fact persistence). |
| examples/architectures/voice_agent/catalog.yaml | Provides the 18-tool catalog used by the voice agent architecture. |
| examples/architectures/voice_agent/__init__.py | Adds a minimal package marker for the voice agent architecture directory. |
| examples/architectures/code_review_bot/README.md | Documents the code-review bot architecture emphasizing firewall + artifact store behavior. |
| examples/architectures/code_review_bot/OUTPUT.md | Captured deterministic output for the code-review bot architecture. |
| examples/architectures/code_review_bot/main.py | Implements the code-review bot reference architecture (mock tools, firewall-heavy steps, fact persistence). |
| examples/architectures/code_review_bot/catalog.yaml | Provides the 24-tool catalog used by the code-review bot architecture. |
| examples/architectures/code_review_bot/__init__.py | Adds a minimal package marker for the code-review bot architecture directory. |
| docs/integration_pipecat.md | Adds a callout pointing readers to the voice agent as the canonical worked example. |
| docs/architectures/voice_agent.md | Adds the public docs page for the voice agent architecture. |
| docs/architectures/index.md | Updates the architectures index to list Slack ops bot, code-review bot, and voice agent. |
| docs/architectures/code_review_bot.md | Adds the public docs page for the code-review bot architecture. |
| CHANGELOG.md | Documents the additions (FastMCP hooks + two architectures) under Unreleased. |
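The async build pattern listed for the voice agent (running a synchronous context build through asyncio.to_thread) can be sketched as below. The build_sync function here is a hypothetical stand-in that only sleeps; it is not the real ContextManager.build_sync, and the queries are invented.

```python
import asyncio
import threading
import time


def build_sync(query: str) -> dict[str, str]:
    """Stand-in for a blocking context build (hypothetical, not the real API)."""
    time.sleep(0.05)  # simulate work that would otherwise stall the event loop
    return {"query": query, "thread": threading.current_thread().name}


async def handle_turn(query: str) -> dict[str, str]:
    # Off-load the blocking call to a worker thread so the loop stays responsive.
    return await asyncio.to_thread(build_sync, query)


async def main() -> list[dict[str, str]]:
    # gather() returns results in the order the awaitables were passed in.
    return await asyncio.gather(
        handle_turn("where is my order"),
        handle_turn("schedule a callback"),
    )


results = asyncio.run(main())
main_thread = threading.main_thread().name
print([r["query"] for r in results])
```

In a voice pipeline the point of the off-load is that audio frames can keep flowing on the event loop while the build runs in a worker thread.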
Comments suppressed due to low confidence (2)
src/contextweaver/adapters/fastmcp.py:410
- The factories raise bare ValueError for invalid firewall_threshold, but AGENTS.md requires using contextweaver.exceptions for errors (no bare ValueError/RuntimeError). Consider raising ConfigError (or another ContextWeaverError subclass) instead and updating the corresponding tests.
    if firewall_threshold < 0:
        raise ValueError(f"firewall_threshold must be >= 0, got {firewall_threshold}")
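One way the suggested change could look is sketched below. The exception classes are local stand-ins that only mirror the names mentioned in the comment (ContextWeaverError, ConfigError); the real hierarchy in contextweaver.exceptions may differ, and validate_threshold is a hypothetical helper, not part of the PR.

```python
class ContextWeaverError(Exception):
    """Local stand-in for the library's base exception (assumed shape)."""


class ConfigError(ContextWeaverError):
    """Local stand-in for a configuration error subclass (assumed shape)."""


def validate_threshold(firewall_threshold: int) -> int:
    """Reject negative thresholds with a library-specific error (hypothetical helper)."""
    if firewall_threshold < 0:
        raise ConfigError(f"firewall_threshold must be >= 0, got {firewall_threshold}")
    return firewall_threshold


# Callers can catch the base class and still see the original message.
try:
    validate_threshold(-1)
except ContextWeaverError as exc:
    message = str(exc)

print(message)
```

Keeping the message text unchanged means a test that matches on "firewall_threshold must be >= 0" would only need its expected exception type updated.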
tests/test_adapters.py:1347
- These tests assert ValueError for invalid firewall_threshold, but AGENTS.md establishes a repo convention to raise ContextWeaverError subclasses (no bare ValueError/RuntimeError). Once make_context_hook switches to ConfigError (or similar), update this expectation accordingly.
    def test_context_hook_negative_threshold_raises() -> None:
        """Negative thresholds are rejected at factory time."""
        mgr = ContextManager()
        with pytest.raises(ValueError, match="firewall_threshold must be >= 0"):
            make_context_hook(mgr, firewall_threshold=-1)
src/contextweaver/adapters/fastmcp.py

    if top_k is not None and top_k < 0:
        raise ValueError(f"top_k must be >= 0, got {top_k}")

    def discover(query: str) -> list[dict[str, Any]]:
        result = router.route(query)
        ids = result.candidate_ids
        if top_k is not None:
            ids = ids[:top_k]
        out: list[dict[str, Any]] = []
        for tid in ids:
            try:
                hydrated = catalog.hydrate(tid)
            except (ItemNotFoundError, CatalogError):
                # Candidate id not in catalog (graph-only node, e.g. category
                # label). Skip rather than fail; the discovery hook must
                # never reject a valid query just because one node is
                # virtual.
                continue
            out.append(
                {
                    "name": hydrated.item.name,
                    "description": hydrated.item.description,
                    "input_schema": dict(hydrated.args_schema),
                }
            )
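The excerpt above depends on the library's Router and Catalog, so here is a self-contained sketch of the same shape that runs on its own. StubTool, StubCatalog, StubRouter, and make_discovery are hypothetical stand-ins (the stub catalog raises KeyError where the real one raises ItemNotFoundError or CatalogError), and the keyword-overlap routing is a placeholder, not the library's ranking.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class StubTool:
    """Hypothetical tool record; the real catalog items may look different."""

    name: str
    description: str
    args_schema: dict[str, Any] = field(default_factory=dict)


class StubCatalog:
    """Hypothetical catalog keyed by tool id."""

    def __init__(self, tools: list[StubTool]) -> None:
        self._tools = {tool.name: tool for tool in tools}

    def hydrate(self, tool_id: str) -> StubTool:
        return self._tools[tool_id]  # KeyError for ids with no backing tool


class StubRouter:
    """Hypothetical router: keeps ids that share a word with the query."""

    def __init__(self, node_ids: list[str]) -> None:
        self._node_ids = node_ids

    def route(self, query: str) -> list[str]:
        words = set(query.lower().split())
        return [nid for nid in self._node_ids if words & set(nid.split("."))]


def make_discovery(
    router: StubRouter, catalog: StubCatalog, *, top_k: Optional[int] = None
) -> Callable[[str], list[dict[str, Any]]]:
    # Validate once, when the callable is built, not on every query.
    if top_k is not None and top_k < 0:
        raise ValueError(f"top_k must be >= 0, got {top_k}")

    def discover(query: str) -> list[dict[str, Any]]:
        ids = router.route(query)
        if top_k is not None:
            ids = ids[:top_k]
        out: list[dict[str, Any]] = []
        for tid in ids:
            try:
                tool = catalog.hydrate(tid)
            except KeyError:
                continue  # graph-only node (e.g. a category label): skip it
            out.append(
                {
                    "name": tool.name,
                    "description": tool.description,
                    "input_schema": dict(tool.args_schema),
                }
            )
        return out

    return discover


catalog = StubCatalog(
    [StubTool("git.diff", "Show a diff"), StubTool("git.log", "Show history")]
)
router = StubRouter(["git", "git.diff", "git.log"])  # "git" is a category label
discover = make_discovery(router, catalog, top_k=2)
shortlist = discover("show git diff")
print(shortlist)
```

As in the excerpt, truncation to top_k happens before virtual ids are skipped, so the shortlist can hold fewer than top_k entries; here top_k=2 yields a single tool because the first candidate is the "git" label.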
src/contextweaver/adapters/fastmcp.py

    A pure callable ``(query, raw_result) -> str``. *query* is recorded
    as the parent user-turn id for dependency closure; *raw_result* is
    the verbatim tool output. Returns the firewall summary (or the raw
    result if below threshold).
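The contract described in that docstring can be sketched without the library as follows. Everything here is an assumption made for illustration: make_hook is a hypothetical name, the threshold is measured in characters, the artifact store is a plain dict, and the "summary" is simple truncation rather than the real firewall's summarisation.

```python
import hashlib
from typing import Callable


def make_hook(
    store: dict[str, str], *, threshold: int = 2000, summary_chars: int = 500
) -> Callable[[str, str], str]:
    """Build a (query, raw_result) -> str callable (hypothetical stand-in)."""
    if threshold < 0:
        raise ValueError(f"threshold must be >= 0, got {threshold}")

    def hook(query: str, raw_result: str) -> str:
        if len(raw_result) <= threshold:
            return raw_result  # small results pass through untouched
        # Park the raw payload under a content-derived handle so it stays addressable.
        digest = hashlib.sha256(raw_result.encode("utf-8")).hexdigest()[:12]
        handle = f"artifact:{digest}"
        store[handle] = raw_result
        head = raw_result[:summary_chars]
        return f"[{handle}] {len(raw_result)} chars for {query!r}; head: {head}"

    return hook


store: dict[str, str] = {}
hook = make_hook(store, threshold=100, summary_chars=20)
small = hook("status", "ok")
big_raw = "x" * 500
big = hook("grep charge", big_raw)
print(small)
print(big)
```

The compact string carries a handle back to the stored payload, which is the property the PR description relies on when it says raw bytes stay addressable after compaction.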
tests/test_adapters_fastmcp_discovery.py

    # Module-level importorskip: when the optional ``fastmcp`` extra is not
    # installed (matrix runners that only pull ``[dev]``), skip this entire
    # file rather than fail at collection. The unit tests in test_adapters.py
    # still run, so coverage of the adapter does not regress.
tests/test_adapters.py

    def test_discovery_tool_negative_top_k_raises() -> None:
        """Negative ceilings are rejected at factory time, not at call time."""
        catalog = _build_test_catalog()
        router = _build_test_router(catalog)
        with pytest.raises(ValueError, match="top_k must be >= 0"):
            make_discovery_tool(router, catalog, top_k=-1)