feat(examples,adapters,docs): code-review bot + voice agent architectures + FastMCP CodeMode hooks (#87, #204, #205)#233

Open
dgenio wants to merge 1 commit into main from claude/triage-issues-uF9JN

@dgenio dgenio commented May 16, 2026

Lands the 3-issue "adoption demonstrations" group selected by the triage
pass. Single combined PR (Mode B, owner-authorised). Same blast radius:
two new reference architectures under examples/architectures/ + two new
adapter callable factories in adapters/fastmcp.py + paired docs and tests.
Zero changes to the context or routing core pipelines.

#87 — FastMCP CodeMode discovery / context hooks

  • adapters/fastmcp.py grows two factories that return plain
    Callable[[str], list[dict]] and Callable[[str, str], str]
    respectively, matching the FastMCP CodeMode hook contract
    (PrefectHQ/fastmcp#3365, "Coordination: CodeMode vs external
    context/routing strategies (contextweaver)") but framework-agnostic:
    neither captures any FastMCP reference at runtime.
  • make_discovery_tool(router, catalog, *, top_k=None) wraps Router +
    Catalog into a "given a query, return a shortlist of tools" callable;
    graph-only nodes are skipped silently.
  • make_context_hook(context_manager, *, firewall_threshold=2000) wraps
    the firewall as a "(query, raw_result) → summary" callable; raw bytes
    park in the artifact store, query is recorded as item.metadata for
    trace correlation.
  • examples/fastmcp_discovery_demo.py demos a 22-tool catalog → 3-tool
    shortlist with ~86% token reduction.
  • 12 unit tests under tests/test_adapters.py + 2 real-FastMCP integration
    tests in tests/test_adapters_fastmcp_discovery.py (spins up
    fastmcp.FastMCP in-memory server, round-trips through the hook).
  • fastmcp>=2.0 moved from [fastmcp] runtime extra into [dev] so the
    integration test runs on every CI matrix cell.
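
The discovery factory described above can be sketched without contextweaver or FastMCP installed. ToyRouter, ToyCatalog and make_discovery_sketch are hypothetical stand-ins, not the project's API; only the Callable[[str], list[dict]] return shape, the top_k ceiling and the silent skip of ids missing from the catalog are taken from the description.

```python
from typing import Any, Callable, Optional


class ToyCatalog:
    """Hypothetical stand-in: maps tool ids to plain tool descriptions."""

    def __init__(self, tools: dict[str, dict[str, Any]]) -> None:
        self._tools = tools

    def hydrate(self, tool_id: str) -> dict[str, Any]:
        # Raises KeyError for ids that have no catalog entry.
        return self._tools[tool_id]


class ToyRouter:
    """Hypothetical stand-in: ranks ids by words shared with the query."""

    def __init__(self, keywords: dict[str, set[str]]) -> None:
        self._keywords = keywords

    def route(self, query: str) -> list[str]:
        words = set(query.lower().split())
        scored = [(len(words & kws), tid) for tid, kws in self._keywords.items()]
        hits = [pair for pair in scored if pair[0] > 0]
        # Highest overlap first; ties broken by id so output is deterministic.
        hits.sort(key=lambda pair: (-pair[0], pair[1]))
        return [tid for _, tid in hits]


def make_discovery_sketch(
    router: ToyRouter, catalog: ToyCatalog, *, top_k: Optional[int] = None
) -> Callable[[str], list[dict[str, Any]]]:
    """Return a plain callable: query in, shortlist of tool dicts out."""
    if top_k is not None and top_k < 0:
        raise ValueError(f"top_k must be >= 0, got {top_k}")

    def discover(query: str) -> list[dict[str, Any]]:
        ids = router.route(query)
        if top_k is not None:
            ids = ids[:top_k]
        shortlist: list[dict[str, Any]] = []
        for tid in ids:
            try:
                tool = catalog.hydrate(tid)
            except KeyError:
                # Graph-only node (for example a category label): skip it.
                continue
            shortlist.append({"name": tool["name"], "description": tool["description"]})
        return shortlist

    return discover


catalog = ToyCatalog(
    {
        "git_diff": {"name": "git_diff", "description": "Show the diff for a revision"},
        "grep_search": {"name": "grep_search", "description": "Search files for a pattern"},
    }
)
router = ToyRouter(
    {
        "git_diff": {"git", "diff"},
        "grep_search": {"grep", "search"},
        "category:git": {"git"},  # routed, but has no catalog entry
    }
)
discover = make_discovery_sketch(router, catalog)
shortlist = discover("show git diff")
```

The returned closure holds only the router and catalog it was given, which is what lets a host framework treat it as an ordinary function.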

#204 — Code-review bot reference architecture

  • examples/architectures/code_review_bot/ with main.py + catalog.yaml
    (24 tools across grep/git/lint/typecheck/test/review) + README.md +
    OUTPUT.md + __init__.py.
  • Six-step PR review walking a regression in payments/charge.py. The
    firewall is the load-bearing pattern: synthetic ~28 KB diff dump and
    ~2.5 KB grep result both compact to ~500-char summaries while raw
    bytes stay addressable.
  • 10 smoke tests in tests/test_architectures_code_review.py pin
    deterministic invariants (catalog size, intent matches, firewall
    fires=2/6, artifact count=6, fact keys).
  • docs/architectures/code_review_bot.md is the public docs page.
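
A minimal sketch of the firewall pattern this architecture leans on, assuming a character-count threshold and plain truncation in place of a real summarizer. FirewallSketch and its artifact handles are hypothetical and are not contextweaver's implementation; the 2000 default mirrors firewall_threshold=2000 from the hook factory, with the unit being an assumption.

```python
import hashlib


class FirewallSketch:
    """Hypothetical stand-in: park raw output, hand back a short summary."""

    def __init__(self, threshold: int = 2000, summary_chars: int = 500) -> None:
        self.threshold = threshold
        self.summary_chars = summary_chars
        self.artifacts: dict[str, str] = {}

    def process(self, raw: str) -> tuple[str, str]:
        """Store raw output and return (artifact_handle, text_for_prompt)."""
        digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()[:12]
        handle = f"artifact:{digest}"
        # Every result is stored, so the raw bytes stay addressable.
        self.artifacts[handle] = raw
        if len(raw) <= self.threshold:
            # Small results pass through unchanged.
            return handle, raw
        # Large results are compacted; truncation stands in for summarization.
        head = raw[: self.summary_chars]
        return handle, f"{head}... [{len(raw)} chars total, see {handle}]"


firewall = FirewallSketch()
big_diff = "diff line\n" * 2800  # synthetic payload of 28,000 characters
big_handle, big_text = firewall.process(big_diff)
small_handle, small_text = firewall.process("2 files changed")
```

Both calls leave an entry in the artifact store, but only the large one is compacted, which matches the shape of the "fires=2/6, artifact count=6" invariant the smoke tests pin.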

#205 — Voice agent reference architecture (Pipecat)

  • examples/architectures/voice_agent/ with main.py + catalog.yaml
    (18 tools across support/orders/shipping/account/callback) +
    README.md + OUTPUT.md + __init__.py.
  • Canonical worked example for docs/integration_pipecat.md: every
    context build runs via asyncio.to_thread(mgr.build_sync, ...);
    ContextBudget(route=200, call=500, interpret=400, answer=1000)
    enforces sub-300 ms TTS-friendly answer prompts (max 200 tokens at
    five turns).
  • Pipecat optional: example runs end-to-end without pipecat-ai.
    pyproject.toml grows a [voice] extra (pipecat-ai>=0.0.50) for users
    who want the real FrameProcessor.
  • 10 smoke tests in tests/test_architectures_voice.py pin the
    async-build marker, intent matches, fact keys, and the 400-token
    answer-prompt ceiling. Wall-clock timings are not asserted on.
  • docs/architectures/voice_agent.md is the public docs page.
  • docs/integration_pipecat.md gains a "Canonical worked example"
    callout pointing back at the architecture.
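
The async build pattern named above can be illustrated with the standard library alone. build_sync here is a hypothetical blocking function standing in for mgr.build_sync, and the queries are made up; only the use of asyncio.to_thread comes from the description.

```python
import asyncio
import time


def build_sync(query: str) -> str:
    """Hypothetical stand-in for a blocking context build."""
    time.sleep(0.05)  # simulate CPU or I/O work that would stall the event loop
    return f"prompt for: {query}"


async def handle_turn(query: str) -> str:
    # Run the blocking build on a worker thread so the event loop stays
    # free to keep servicing audio frames while the prompt is assembled.
    return await asyncio.to_thread(build_sync, query)


async def run_turns() -> list[str]:
    # Two turns are awaited together; gather preserves argument order.
    return await asyncio.gather(
        handle_turn("where is my order"),
        handle_turn("cancel it"),
    )


results = asyncio.run(run_turns())
```

asyncio.to_thread requires Python 3.9 or later.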

Shared bookkeeping

  • Makefile architectures target runs all three architectures.
  • mkdocs.yml + docs/architectures/index.md link the two new pages.
  • CHANGELOG.md gains three bullets under [Unreleased].
  • tests/test_architectures_slack.py refactored from sys.path injection
    to importlib.util.spec_from_file_location so the three architecture
    test files can coexist in one pytest run (a bare import main from
    sys.path collides across architectures).
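
To illustrate why the refactor avoids the collision: loading each main.py under its own module name keeps the files apart in sys.modules. The directory names, the arch_..._main naming scheme and the load_module helper below are hypothetical; the importlib.util calls are standard library.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path
from types import ModuleType


def load_module(path: Path, unique_name: str) -> ModuleType:
    """Load a file as a module under an explicit, collision-free name."""
    spec = importlib.util.spec_from_file_location(unique_name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    # Register before executing so the module can be found by its own name.
    sys.modules[unique_name] = module
    spec.loader.exec_module(module)
    return module


loaded: dict[str, ModuleType] = {}
with tempfile.TemporaryDirectory() as tmp:
    for arch in ("slack_bot", "voice_agent"):
        arch_dir = Path(tmp) / arch
        arch_dir.mkdir()
        # Each architecture ships a file with the same name: main.py.
        (arch_dir / "main.py").write_text(f"NAME = {arch!r}\n")
        loaded[arch] = load_module(arch_dir / "main.py", f"arch_{arch}_main")
```

With sys.path injection, the second `import main` would return the first module cached under the name "main".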

Module-size note
adapters/fastmcp.py lands at 428 lines, over the soft 300-line guide,
in line with adapters/mcp.py (401) and adapters/proxy_runtime.py (462)
precedent. Mode B authorised the modest overrun rather than splitting
one adapter across two files.

Verification
ruff format --check src/ tests/ examples/ scripts/ → 145 files clean
ruff check src/ tests/ examples/ scripts/ → All checks passed
mypy src/ → 0 issues / 64 files
pytest --cov=contextweaver -q → 985 passed, 5 skipped (+34 new tests)
make example → all 14 scripts ran
make demo → clean
make scorecard-check → clean (no benchmark drift)
make llms-check → up to date

Closes #87
Closes #204
Closes #205

https://claude.ai/code/session_01JiR8ZGtuwn7Cv2ahHMwLhL

Copilot AI review requested due to automatic review settings May 16, 2026 18:31
@github-actions

Benchmark delta (vs main)

Soft regression feedback only — this comment never blocks the PR.
Latency budget: ⚠️ when head > base × 1.3. Accuracy budget: ⚠️ when head < base - 1pp.
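
The two budget rules can be read as the following sketch. The function names and the "ok"/"warn" return values are hypothetical, and treating 1pp as 0.01 assumes the accuracy metrics are on a 0 to 1 scale, as the reported values suggest.

```python
def latency_marker(head_ms: float, base_ms: float, factor: float = 1.3) -> str:
    """Warn when head latency exceeds base latency times the factor."""
    return "warn" if head_ms > base_ms * factor else "ok"


def accuracy_marker(head: float, base: float, tolerance: float = 0.01) -> str:
    """Warn when head accuracy drops more than one percentage point below base."""
    return "warn" if head < base - tolerance else "ok"


# Values taken from the routing summary: size 50, p99 and recall@k.
p99_status = latency_marker(head_ms=0.570, base_ms=0.463)
recall_status = accuracy_marker(head=0.5649, base=0.5649)
```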

Routing summary (single backend × catalog sizes)

| size | recall@k (head, Δ vs base) | MRR (head, Δ vs base) | p99 (ms) |
|------|----------------------------|-----------------------|----------|
| 50 | ✅ 0.5649 (+0.0000) | ✅ 0.4978 (+0.0000) | ✅ 0.570 (base 0.463) |
| 83 | ✅ 0.3825 (+0.0000) | ✅ 0.3242 (+0.0000) | ✅ 0.693 (base 0.876) |
| 1000 | ✅ 0.1475 (+0.0000) | ✅ 0.1456 (+0.0000) | ✅ 36.009 (base 31.897) |

Per-backend × per-size matrix

| backend | size | recall@k (Δ) | MRR (Δ) | p99 (ms) |
|---------|------|--------------|---------|----------|
| bm25 | 100 | ✅ 0.3825 (+0.0000) | ✅ 0.3399 (+0.0000) | ✅ 5.875 (base 5.642) |
| bm25 | 500 | ✅ 0.2250 (+0.0000) | ✅ 0.2165 (+0.0000) | ✅ 31.481 (base 27.538) |
| bm25 | 1000 | ✅ 0.1575 (+0.0000) | ✅ 0.1525 (+0.0000) | ✅ 85.894 (base 78.368) |
| fuzzy | 100 | ✅ 0.0000 (+0.0000) | ✅ 0.0000 (+0.0000) | ✅ 0.000 (base 0.000) |
| fuzzy | 500 | ✅ 0.0000 (+0.0000) | ✅ 0.0000 (+0.0000) | ✅ 0.000 (base 0.000) |
| fuzzy | 1000 | ✅ 0.0000 (+0.0000) | ✅ 0.0000 (+0.0000) | ✅ 0.000 (base 0.000) |
| tfidf | 100 | ✅ 0.3825 (+0.0000) | ✅ 0.3220 (+0.0000) | ✅ 0.995 (base 0.872) |
| tfidf | 500 | ✅ 0.2325 (+0.0000) | ✅ 0.2314 (+0.0000) | ✅ 9.136 (base 8.660) |
| tfidf | 1000 | ✅ 0.1475 (+0.0000) | ✅ 0.1456 (+0.0000) | ✅ 35.933 (base 30.071) |

Context pipeline (per scenario)

| scenario | tokens | dropped | dedup |
|----------|--------|---------|-------|
| large_catalog | 1514 (base 1514, Δ+0) | 0 (base 0, Δ+0) | 0 (base 0, Δ+0) |
| long_conversation | 2548 (base 2548, Δ+0) | 0 (base 0, Δ+0) | 0 (base 0, Δ+0) |
| short_conversation | 496 (base 496, Δ+0) | 0 (base 0, Δ+0) | 0 (base 0, Δ+0) |
| stress_conversation | 6651 (base 6651, Δ+0) | 7 (base 7, Δ+0) | 4 (base 4, Δ+0) |

Numbers come from make benchmark / make benchmark-matrix.
Latency is hardware-dependent — treat the markers as a rough guide.
See benchmarks/scorecard.md for the full picture.

Copilot AI left a comment

Pull request overview

Adds three “adoption demonstration” deliverables to the repo: (1) FastMCP CodeMode hook factories in the FastMCP adapter, and (2) two new runnable reference architectures (code-review bot + voice agent) with smoke tests and public docs, all wired into the examples/architectures and docs navigation.

Changes:

  • Add make_discovery_tool() and make_context_hook() factories to contextweaver.adapters.fastmcp and re-export them from contextweaver.adapters.
  • Introduce two new reference architectures under examples/architectures/ (code_review_bot, voice_agent) with deterministic outputs, tests, and docs pages.
  • Wire the new examples/docs into Makefile, mkdocs.yml, docs/architectures/index.md, pyproject.toml extras, and CHANGELOG.md.

Reviewed changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 6 comments.

| File | Description |
|------|-------------|
| tests/test_architectures_voice.py | Adds smoke tests for the new voice agent architecture by running the script and asserting deterministic invariants. |
| tests/test_architectures_slack.py | Refactors Slack architecture smoke test import to use importlib.util.spec_from_file_location to avoid module-name collisions. |
| tests/test_architectures_code_review.py | Adds smoke tests for the new code-review bot architecture by running the script and asserting deterministic invariants. |
| tests/test_adapters.py | Adds unit tests for FastMCP CodeMode hook factories (make_discovery_tool, make_context_hook). |
| tests/test_adapters_fastmcp_discovery.py | Adds real FastMCP integration coverage for the discovery hook against an in-memory FastMCP server. |
| src/contextweaver/adapters/fastmcp.py | Adds CodeMode-style hook factories for tool discovery + firewall-based context compaction. |
| src/contextweaver/adapters/__init__.py | Re-exports the new FastMCP hook factories from the adapters package. |
| pyproject.toml | Adds fastmcp to [dev] and introduces a new [voice] optional extra for pipecat-ai. |
| mkdocs.yml | Adds the new architectures pages to the docs nav. |
| Makefile | Adds the FastMCP demo to make example and the two new architectures to make architectures. |
| examples/fastmcp_discovery_demo.py | Adds a standalone demo showing catalog → shortlist compression and firewall compaction via the new hook callables. |
| examples/architectures/voice_agent/README.md | Documents the voice agent architecture, its budgets, and the async build pattern (asyncio.to_thread). |
| examples/architectures/voice_agent/OUTPUT.md | Captured deterministic output for the voice agent architecture. |
| examples/architectures/voice_agent/main.py | Implements the voice agent reference architecture (mock tools, async builds, tight budgets, fact persistence). |
| examples/architectures/voice_agent/catalog.yaml | Provides the 18-tool catalog used by the voice agent architecture. |
| examples/architectures/voice_agent/__init__.py | Adds a minimal package marker for the voice agent architecture directory. |
| examples/architectures/code_review_bot/README.md | Documents the code-review bot architecture emphasizing firewall + artifact store behavior. |
| examples/architectures/code_review_bot/OUTPUT.md | Captured deterministic output for the code-review bot architecture. |
| examples/architectures/code_review_bot/main.py | Implements the code-review bot reference architecture (mock tools, firewall-heavy steps, fact persistence). |
| examples/architectures/code_review_bot/catalog.yaml | Provides the 24-tool catalog used by the code-review bot architecture. |
| examples/architectures/code_review_bot/__init__.py | Adds a minimal package marker for the code-review bot architecture directory. |
| docs/integration_pipecat.md | Adds a callout pointing readers to the voice agent as the canonical worked example. |
| docs/architectures/voice_agent.md | Adds the public docs page for the voice agent architecture. |
| docs/architectures/index.md | Updates the architectures index to list Slack ops bot, code-review bot, and voice agent. |
| docs/architectures/code_review_bot.md | Adds the public docs page for the code-review bot architecture. |
| CHANGELOG.md | Documents the additions (FastMCP hooks + two architectures) under Unreleased. |

Comments suppressed due to low confidence (2)

src/contextweaver/adapters/fastmcp.py:410

  • The factories raise bare ValueError for invalid firewall_threshold, but AGENTS.md requires using contextweaver.exceptions for errors (no bare ValueError/RuntimeError). Consider raising ConfigError (or another ContextWeaverError subclass) instead and updating the corresponding tests.
    if firewall_threshold < 0:
        raise ValueError(f"firewall_threshold must be >= 0, got {firewall_threshold}")

tests/test_adapters.py:1347

  • These tests assert ValueError for invalid firewall_threshold, but AGENTS.md establishes a repo convention to raise ContextWeaverError subclasses (no bare ValueError/RuntimeError). Once make_context_hook switches to ConfigError (or similar), update this expectation accordingly.
def test_context_hook_negative_threshold_raises() -> None:
    """Negative thresholds are rejected at factory time."""
    mgr = ContextManager()
    with pytest.raises(ValueError, match="firewall_threshold must be >= 0"):
        make_context_hook(mgr, firewall_threshold=-1)
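
The change suggested in the two suppressed comments might look like the sketch below. ContextWeaverError and ConfigError are defined locally as stand-ins because the real classes in contextweaver.exceptions are not shown in this thread, and validate_firewall_threshold is a hypothetical helper, not a function from the PR.

```python
class ContextWeaverError(Exception):
    """Local stand-in for the project's base exception."""


class ConfigError(ContextWeaverError):
    """Local stand-in for the project's configuration error."""


def validate_firewall_threshold(firewall_threshold: int) -> int:
    """Reject negative thresholds with a project-specific exception."""
    if firewall_threshold < 0:
        raise ConfigError(
            f"firewall_threshold must be >= 0, got {firewall_threshold}"
        )
    return firewall_threshold


checked = validate_firewall_threshold(2000)
```

Keeping the message text unchanged means a test that matches on "firewall_threshold must be >= 0" would only need its expected exception type updated.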

Comment on lines +348 to +350
    if top_k is not None and top_k < 0:
        raise ValueError(f"top_k must be >= 0, got {top_k}")

Comment on lines +351 to +366
    def discover(query: str) -> list[dict[str, Any]]:
        result = router.route(query)
        ids = result.candidate_ids
        if top_k is not None:
            ids = ids[:top_k]
        out: list[dict[str, Any]] = []
        for tid in ids:
            try:
                hydrated = catalog.hydrate(tid)
            except (ItemNotFoundError, CatalogError):
                # Candidate id not in catalog (graph-only node, e.g. category
                # label). Skip rather than fail — the discovery hook must
                # never reject a valid query just because one node is
                # virtual.
                continue
            out.append(

Comment on lines +367 to +372
                {
                    "name": hydrated.item.name,
                    "description": hydrated.item.description,
                    "input_schema": dict(hydrated.args_schema),
                }
            )

Comment on lines +400 to +403
    A pure callable ``(query, raw_result) -> str``. *query* is recorded
    as the parent user-turn id for dependency closure; *raw_result* is
    the verbatim tool output. Returns the firewall summary (or the raw
    result if below threshold).

Comment on lines +21 to +24
    # Module-level importorskip: when the optional ``fastmcp`` extra is not
    # installed (matrix runners that only pull ``[dev]``), skip this entire
    # file rather than fail at collection. The unit tests in test_adapters.py
    # still run, so coverage of the adapter does not regress.

Comment thread tests/test_adapters.py
Comment on lines +1256 to +1262
    def test_discovery_tool_negative_top_k_raises() -> None:
        """Negative ceilings are rejected at factory time, not at call time."""
        catalog = _build_test_catalog()
        router = _build_test_router(catalog)
        with pytest.raises(ValueError, match="top_k must be >= 0"):
            make_discovery_tool(router, catalog, top_k=-1)
