feat(routing,context,adapters): explicit pipeline + embedding backend + history-aware routing (#8, #27, #56) #236
dgenio wants to merge 1 commit into
… + history-aware routing (#8, #27, #56)

#56 — Routing pipeline decomposition
- New routing/pipeline.py composer with explicit stages: retrieve -> rerank -> navigate -> pack
- New Navigator + CardPacker protocols in protocols.py
- BeamSearchNavigator lifted verbatim from router.py (byte-identical default behaviour; verified by the existing 50+ router regression tests and `make scorecard-check`)
- DefaultCardPacker wraps make_choice_cards with a soft budget_tokens cap
- Router.route() now delegates to RoutingPipeline.navigate(); the public API surface and every RouteResult field are preserved

#8 — Optional embedding-based retrieval backend
- New EmbeddingBackend protocol in protocols.py
- New [embeddings] extra (`pip install 'contextweaver[embeddings]'`)
- New extras/embeddings.py: SentenceTransformerBackend + the HybridEmbeddingRetriever (70/30 embedding+TF-IDF weighted sum so lexical exact-id / exact-tag hits keep their floor)
- Router(embedding_backend=...) constructs the hybrid retriever via a lazy import so the core install never pulls torch
- Mock HashEmbeddingBackend in tests provides a deterministic default-install test path; real sentence-transformers integration test is gated behind pytest.importorskip

#27 — History-aware re-routing + tool-dependency metadata
- Phase 1: new routing/history.py with RouteHistory dataclass + adjust_scores helper. Router.route(..., history=RouteHistory(...)) applies a repeat penalty to already-called tools and boosts candidates whose description resembles last_result_summary (computed via the router's fitted retriever, so the boost is in the same scoring space as the primary query). Per-item deltas surface on the new RouteResult.history_adjustments field + trace.extra.
- Phase 2: SelectableItem gains optional depends_on / provides / requires fields; adjust_scores applies a satisfaction boost when a candidate's `requires` are fully covered by `provides` of already-called tools, and a penalty when `depends_on` references an uncalled tool. All three default to None and round-trip omitted when unset.
- ContextManager.build_route_prompt auto-constructs a RouteHistory from the event log (tools whose tool_result is in the log). Set history_from_log=False or pass history=... explicitly to opt out / override.
- Catalog.validate_dependencies() returns human-readable warnings for depends_on entries pointing at unknown tool ids.

Schemas + extras
- schemas/catalog.schema.json + docs/schemas/v0/catalog.schema.json regenerated (3 new nullable array fields on items).
- pyproject.toml adds [embeddings] extra and a sentence_transformers mypy override.

Verification
ruff format --check src/ tests/ examples/ scripts/ -> clean
ruff check src/ tests/ examples/ scripts/ -> clean
mypy src/ -> 0 issues / 79 files
pytest -q -> 1111 passed, 6 skipped (+70 new tests)
scripts/gen_schemas.py --check -> schemas up to date
scripts/render_scorecard.py --check -> exit 0 (no drift)
make example -> all scripts clean
make demo -> Demo complete
make llms-check -> up to date

Closes #8
Closes #27
Closes #56

https://claude.ai/code/session_017YLnTSUmEXLXV85JC29oYf
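The Phase 1 and Phase 2 rules described above can be sketched roughly as follows. This is a minimal illustration, not the code in routing/history.py: the `Tool` and `History` classes, the `adjust` function, and every weight constant are hypothetical stand-ins, and the real `adjust_scores` signature and default weights may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Illustrative weights only (assumptions, not contextweaver's defaults).
REPEAT_PENALTY = 0.5
SIMILARITY_WEIGHT = 0.2
SATISFIED_BOOST = 0.1
UNMET_DEPENDENCY_PENALTY = 0.3


@dataclass
class Tool:
    """Hypothetical stand-in for SelectableItem with the three new fields."""

    id: str
    depends_on: list[str] | None = None
    provides: list[str] | None = None
    requires: list[str] | None = None


@dataclass
class History:
    """Hypothetical stand-in for RouteHistory."""

    called_tool_ids: list[str] = field(default_factory=list)


def adjust(scored, tools, history, similarity=None):
    """Return (re-ranked scores, per-item deltas) for a scored shortlist."""
    called = set(history.called_tool_ids)
    # Everything the already-called tools claim to provide.
    provided = {
        p for tid in called if tid in tools for p in (tools[tid].provides or [])
    }
    deltas = {}
    for item_id, _ in scored:
        tool = tools[item_id]
        delta = 0.0
        if item_id in called:
            delta -= REPEAT_PENALTY  # repeat penalty
        if similarity is not None:
            delta += SIMILARITY_WEIGHT * similarity.get(item_id, 0.0)
        if tool.requires and set(tool.requires) <= provided:
            delta += SATISFIED_BOOST  # all requirements are covered
        if tool.depends_on and not set(tool.depends_on) <= called:
            delta -= UNMET_DEPENDENCY_PENALTY  # a dependency was never called
        deltas[item_id] = delta
    adjusted = [(item_id, score + deltas[item_id]) for item_id, score in scored]
    # Deterministic order: best score first, ties broken by id.
    adjusted.sort(key=lambda pair: (-pair[1], pair[0]))
    return adjusted, deltas
```

Returning the deltas alongside the adjusted ranking mirrors the idea of surfacing per-item adjustments for tracing, so a caller can see why a candidate moved.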
Benchmark delta (vs
| size | recall@k (head Δ vs base) | MRR (head Δ vs base) | p99 (ms) |
|---|---|---|---|
| 50 | ✅ 0.5649 (+0.0000) | ✅ 0.4978 (+0.0000) | ✅ 0.503 (base 0.463) |
| 83 | ✅ 0.3825 (+0.0000) | ✅ 0.3242 (+0.0000) | ✅ 0.723 (base 0.876) |
| 1000 | ✅ 0.1475 (+0.0000) | ✅ 0.1456 (+0.0000) | ✅ 38.163 (base 31.897) |
Per-backend × per-size matrix
| backend | size | recall@k (Δ) | MRR (Δ) | p99 (ms) |
|---|---|---|---|---|
| bm25 | 100 | ✅ 0.3825 (+0.0000) | ✅ 0.3399 (+0.0000) | ✅ 6.386 (base 5.642) |
| bm25 | 500 | ✅ 0.2250 (+0.0000) | ✅ 0.2165 (+0.0000) | ✅ 31.224 (base 27.538) |
| bm25 | 1000 | ✅ 0.1575 (+0.0000) | ✅ 0.1525 (+0.0000) | ✅ 86.143 (base 78.368) |
| fuzzy | 100 | ✅ 0.0000 (+0.0000) | ✅ 0.0000 (+0.0000) | ✅ 0.000 (base 0.000) |
| fuzzy | 500 | ✅ 0.0000 (+0.0000) | ✅ 0.0000 (+0.0000) | ✅ 0.000 (base 0.000) |
| fuzzy | 1000 | ✅ 0.0000 (+0.0000) | ✅ 0.0000 (+0.0000) | ✅ 0.000 (base 0.000) |
| tfidf | 100 | ✅ 0.3825 (+0.0000) | ✅ 0.3220 (+0.0000) | ✅ 1.055 (base 0.872) |
| tfidf | 500 | ✅ 0.2325 (+0.0000) | ✅ 0.2314 (+0.0000) | ✅ 9.815 (base 8.660) |
| tfidf | 1000 | ✅ 0.1475 (+0.0000) | ✅ 0.1456 (+0.0000) | ✅ 38.649 (base 30.071) |
Context pipeline (per scenario)
| scenario | tokens | dropped | dedup |
|---|---|---|---|
| large_catalog | 1514 (base 1514, Δ+0) | 0 (base 0, Δ+0) | 0 (base 0, Δ+0) |
| long_conversation | 2548 (base 2548, Δ+0) | 0 (base 0, Δ+0) | 0 (base 0, Δ+0) |
| short_conversation | 496 (base 496, Δ+0) | 0 (base 0, Δ+0) | 0 (base 0, Δ+0) |
| stress_conversation | 6651 (base 6651, Δ+0) | 7 (base 7, Δ+0) | 4 (base 4, Δ+0) |
Numbers come from make benchmark / make benchmark-matrix.
Latency is hardware-dependent — treat the markers as a rough guide.
See benchmarks/scorecard.md for the full picture.
Pull request overview
This PR advances the routing engine by making routing stages explicit and swappable (pipeline composer + navigator + packer), adding an optional embedding-based retriever behind an extra, and introducing history-aware routing adjustments plus dependency metadata on SelectableItem. It also wires ContextManager.build_route_prompt() to optionally auto-construct routing history from the event log and updates schemas/docs/tests accordingly.
Changes:
- Introduce explicit routing pipeline components (`RoutingPipeline`, `BeamSearchNavigator`, `DefaultCardPacker`) and refactor `Router` to delegate navigation via the pipeline.
- Add optional embedding retrieval via `[embeddings]` extra (`SentenceTransformerBackend`, `HybridEmbeddingRetriever`) and expose an `EmbeddingBackend` protocol.
- Add history-aware score adjustments (`RouteHistory`, `adjust_scores`) plus dependency metadata fields (`depends_on` / `provides` / `requires`) and catalog validation warnings.
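To make the "explicit and swappable stages" idea concrete, here is a small sketch of a four-stage composer. It is an assumption-laden illustration: the `Pipeline` class, its `run` method, and the toy catalog are hypothetical and do not reproduce the `RoutingPipeline` API added in this PR.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

Scored = list[tuple[str, float]]


def no_op_rerank(query: str, shortlist: Scored) -> Scored:
    """Default rerank stage: leave the shortlist order untouched."""
    return shortlist


@dataclass
class Pipeline:
    """Hypothetical composer; every stage is a plain, replaceable callable."""

    retrieve: Callable[[str], Scored]
    navigate: Callable[[str, Scored], list[str]]
    pack: Callable[[list[str]], list[str]]
    rerank: Callable[[str, Scored], Scored] = no_op_rerank

    def run(self, query: str) -> list[str]:
        # Stage order: retrieve -> rerank -> navigate -> pack.
        shortlist = self.retrieve(query)
        shortlist = self.rerank(query, shortlist)
        chosen = self.navigate(query, shortlist)
        return self.pack(chosen)


# Toy catalog used only for this example.
CATALOG = {
    "db_read": "read rows from database",
    "send_email": "send an email message",
}


def keyword_retrieve(query: str) -> Scored:
    """Score each tool by the number of query words found in its description."""
    words = set(query.lower().split())
    scored = [
        (tool_id, float(len(words & set(description.split()))))
        for tool_id, description in CATALOG.items()
    ]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


pipeline = Pipeline(
    retrieve=keyword_retrieve,
    navigate=lambda query, shortlist: [t for t, s in shortlist if s > 0][:1],
    pack=lambda ids: [f"{tool_id}: {CATALOG[tool_id]}" for tool_id in ids],
)
cards = pipeline.run("read database rows")
```

Swapping a stage means passing a different callable to the constructor; the other stages are unaffected.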
Reviewed changes
Copilot reviewed 26 out of 26 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| tests/test_types.py | Adds round-trip + default/omit tests for new dependency metadata fields on SelectableItem. |
| tests/test_router.py | Adds regression tests for pipeline delegation and history-aware routing adjustments. |
| tests/test_pipeline.py | Adds tests for RoutingPipeline construction and stage-level entry points. |
| tests/test_packer.py | Adds tests for DefaultCardPacker ordering and budget behavior. |
| tests/test_navigator.py | Adds tests for BeamSearchNavigator determinism and eligibility behavior. |
| tests/test_manager.py | Adds tests for ContextManager.build_route_prompt() history-from-log behavior and overrides. |
| tests/test_history.py | Adds unit tests for RouteHistory serialization and adjust_scores rules. |
| tests/test_extras_embeddings.py | Adds deterministic mock backend tests and gated sentence-transformers integration tests. |
| tests/test_catalog.py | Adds tests for Catalog.validate_dependencies() warnings. |
| src/contextweaver/types.py | Extends SelectableItem with optional dependency metadata + serialization behavior. |
| src/contextweaver/routing/router.py | Adds pipeline support, embedding backend option, and history-aware adjustments to routing results. |
| src/contextweaver/routing/pipeline.py | Introduces RoutingPipeline composer + from_config factory and stage entry points. |
| src/contextweaver/routing/packer.py | Introduces DefaultCardPacker and a soft token-budget cap for card lists. |
| src/contextweaver/routing/navigator.py | Extracts beam-search navigation into a standalone, protocol-driven navigator. |
| src/contextweaver/routing/history.py | Adds RouteHistory dataclass and deterministic history-based score adjustment logic. |
| src/contextweaver/routing/catalog.py | Adds dependency reference validation (depends_on) and warnings. |
| src/contextweaver/protocols.py | Adds Navigator, CardPacker, EmbeddingBackend protocols + NavigationResult. |
| src/contextweaver/extras/embeddings.py | Implements optional sentence-transformers backend + hybrid embedding/TF-IDF retriever. |
| `src/contextweaver/extras/__init__.py` | Documents the new embeddings extra in the extras package overview. |
| src/contextweaver/context/manager.py | Adds history / history_from_log to build_route_prompt() + constructs history from the event log. |
| `src/contextweaver/__init__.py` | Re-exports new routing/public surface types (pipeline, navigator, packer, history, embedding protocol). |
| schemas/catalog.schema.json | Regenerates catalog schema with new nullable dependency metadata fields. |
| docs/schemas/v0/catalog.schema.json | Regenerates published v0 schema mirror with dependency metadata fields. |
| pyproject.toml | Adds [embeddings] optional dependency group and mypy override for sentence-transformers. |
| CHANGELOG.md | Documents the new pipeline, embeddings backend, and history/dependency routing features. |
| AGENTS.md | Updates module map and routing pipeline description to include the new routing modules/features. |
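The dependency validation listed for src/contextweaver/routing/catalog.py could look something like the sketch below. It assumes a simplified input (a mapping from tool id to its `depends_on` list) and a made-up warning format; the real `Catalog.validate_dependencies()` operates on catalog items and may word its warnings differently.

```python
from __future__ import annotations


def validate_dependencies(items: dict[str, list[str] | None]) -> list[str]:
    """Return one warning per depends_on entry that names an unknown tool id.

    `items` maps tool id -> depends_on list (None when the field is unset).
    """
    warnings: list[str] = []
    for tool_id in sorted(items):  # sorted for deterministic output
        for dep in items[tool_id] or []:
            if dep not in items:
                warnings.append(
                    f"{tool_id}: depends_on references unknown tool id {dep!r}"
                )
    return warnings
```

Returning warnings instead of raising keeps a catalog with stale references loadable while still making the problem visible.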

    items = self._event_log.all()
    tool_results = [i for i in items if i.kind == ItemKind.tool_result]
    if not tool_results:
        return None
    called_ids: list[str] = []
    seen: set[str] = set()
    for item in tool_results:
        tid = item.parent_id or item.id
        if tid in seen:
            continue
        seen.add(tid)
        called_ids.append(tid)
    last = tool_results[-1]
    summary = (last.text or "")[:500] or None
    return _RouteHistory(
        called_tool_ids=called_ids,
        last_result_summary=summary,
        step_number=len(called_ids) + 1,
    )

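The excerpt above can be exercised in isolation with a small stand-in. This is only a sketch: `LogItem` is a hypothetical, trimmed-down substitute for `ContextItem`, string kinds replace `ItemKind`, and `called_tools` is an invented name. Note that, as in the excerpt, the result only contains tool ids if `parent_id` carries the tool id; when `parent_id` is unset the fallback is the result item's own id.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LogItem:
    """Hypothetical, trimmed-down stand-in for ContextItem."""

    id: str
    kind: str
    text: str = ""
    parent_id: str | None = None


def called_tools(log: list[LogItem]) -> tuple[list[str], str | None]:
    """Return (de-duplicated called ids in first-seen order, last result summary)."""
    results = [item for item in log if item.kind == "tool_result"]
    if not results:
        return [], None
    # dict.fromkeys removes duplicates while preserving insertion order.
    called = list(dict.fromkeys(item.parent_id or item.id for item in results))
    # Cap the summary at 500 characters; an empty text becomes None.
    summary = results[-1].text[:500] or None
    return called, summary
```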

    log.append(
        ContextItem(
            id="tc1",
            parent_id=None,
            kind=ItemKind.tool_call,
            text="db_read invoked",
        )
    )
    # The tool_result.parent_id is the tool call's id but we expose called
    # tool ids via that — see the _build_route_history_from_log helper.
    log.append(
        ContextItem(
            id="tr1",
            parent_id="db_read",
            kind=ItemKind.tool_result,
            text="rows: id, name, email",
        )
    )


    @@ -342,6 +367,8 @@ def __init__(
             routing_config: RoutingConfig | None = None,
             retriever: Retriever | None = None,
             engine_registry: EngineRegistry | None = None,
    +        embedding_backend: EmbeddingBackend | None = None,
    +        pipeline: RoutingPipeline | None = None,
         ) -> None:
             if routing_config is not None:
                 beam_width = routing_config.beam_width
    @@ -355,6 +382,12 @@ def __init__(
                     f"Unknown scorer_backend {scorer_backend!r}; "
                     f"valid options: {sorted(_SCORER_BACKENDS)}"
                 )
    +        if embedding_backend is not None and retriever is not None:
    +            raise ConfigError(
    +                "Pass either retriever= or embedding_backend=, not both. "
    +                "Construct an embedding-aware Retriever and pass it via retriever= "
    +                "if you need both signals combined under a custom policy."
    +            )
             self._graph = graph
             self._beam_width = beam_width
             self._max_depth = max_depth
    @@ -366,6 +399,14 @@ def __init__(
             if retriever is not None:
                 self._retriever: Retriever = retriever
                 self._retriever_engine_name = self._engine_registry.default_for("retriever") or "tfidf"
    +        elif embedding_backend is not None:
    +            # Late import keeps the core install free of any sentence-
    +            # transformers / hnswlib / torch dependency. Importing the
    +            # adapter only happens when a backend is actually supplied.
    +            from contextweaver.extras.embeddings import HybridEmbeddingRetriever
    +
    +            self._retriever = HybridEmbeddingRetriever(embedding_backend)
    +            self._retriever_engine_name = "embedding+tfidf"
             elif scorer is not None:
                 self._retriever = _ScorerRetriever(scorer)
                 self._retriever_engine_name = "tfidf"
    @@ -377,9 +418,42 @@ def __init__(
                 self._retriever_engine_name = self._engine_registry.default_for("retriever") or "tfidf"
             self._indexed = False
             self._doc_id_to_idx: dict[str, int] = {}
    +        self._pipeline = self._build_pipeline(pipeline)
             if items is not None:
                 self.set_items(items)

    +    def _build_pipeline(self, override: RoutingPipeline | None) -> RoutingPipeline:
    +        """Construct the routing pipeline (issue #56).
    +
    +        When *override* is supplied, its navigator / packer / reranker
    +        replace the bundled defaults; the retriever is always set to the
    +        one this :class:`Router` already resolved so corpus indexing has
    +        a single source of truth.
    +        """
    +        navigator = BeamSearchNavigator(
    +            beam_width=self._beam_width,
    +            max_depth=self._max_depth,
    +            top_k=self._top_k,
    +            confidence_gap=self._confidence_gap,
    +        )
    +        if override is None:
    +            return RoutingPipeline(
    +                retriever=self._retriever,
    +                reranker=None,
    +                navigator=navigator,
    +            )
    +        return RoutingPipeline(
    +            retriever=self._retriever,
    +            reranker=override.reranker,
    +            navigator=override.navigator or navigator,
    +            packer=override.packer,
    +        )


    +    def _result_similarity_map(
             self,
    -        collected: dict[str, tuple[float, list[str]]],
    -        active_items: dict[str, SelectableItem],
    -    ) -> list[tuple[str, tuple[float, list[str]]]]:
    -        """Return *collected* sorted by ``(-score, id)``, untrimmed.
    -
    -        Truncation to ``self._top_k`` is the caller's responsibility so
    -        ambiguity / runner-up reads can use the full ranking even when
    -        ``top_k=1`` (issue #14).
    +        history: RouteHistory,
    +        scored: list[tuple[str, float]],
    +    ) -> dict[str, float] | None:
    +        """Per-candidate similarity to ``history.last_result_summary``.
    +
    +        Reuses the router's fitted retriever so the boost is computed in
    +        the same scoring space as the primary query. Returns ``None`` when
    +        the history has no summary so :func:`adjust_scores` can skip the
    +        boost stage entirely.
             """
    -        return sorted(
    -            (entry for entry in collected.items() if entry[0] in active_items),
    -            key=lambda x: (-x[1][0], x[0]),
    -        )
    -
    -    def _expand_subtree(
    -        self,
    -        query: str,
    -        node_id: str,
    -        base_score: float,
    -        base_path: list[str],
    -        active_items: dict[str, SelectableItem],
    -        eligible_internals: set[str],
    -        *,
    -        max_depth: int | None = None,
    -    ) -> dict[str, tuple[float, list[str]]]:
    -        """Expand children of *node_id* recursively, collecting items.
    -
    -        Children outside *active_items* (leaves) or *eligible_internals*
    -        (internals) are skipped before scoring so excluded subtrees do
    -        not consume backtracking work (issue #112 / #22).
    -        """
    -        depth_limit = max_depth if max_depth is not None else self._max_depth
    -        result: dict[str, tuple[float, list[str]]] = {}
    -        stack: list[tuple[float, str, list[str], int]] = [(base_score, node_id, base_path, 0)]
    -        while stack:
    -            score, nid, path, depth = stack.pop()
    -            children = self._graph.successors(nid)
    -            if not children or depth >= depth_limit:
    -                if nid in active_items:
    -                    result[nid] = (score, path[1:])
    +        summary = history.last_result_summary
    +        if not summary:
    +            return None
    +        sims: dict[str, float] = {}
    +        for item_id, _ in scored:
    +            idx = self._doc_id_to_idx.get(item_id)
    +            if idx is None:
                     continue
    -            for child in sorted(children):
    -                if not self._is_eligible_child(child, active_items, eligible_internals):
    -                    continue
    -                s = self._score_node(query, child)
    -                new_path = path + [child]
    -                if child in self._items:
    -                    result[child] = (score + s, new_path[1:])
    -                else:
    -                    stack.append((score + s, child, new_path, depth + 1))
    -        return result
    +            sims[item_id] = self._retriever.score_one(summary, idx)
    +        return sims


    ``embedding_backend=`` argument is supplied. Importing this module
    without the ``sentence-transformers`` dependency raises ``ImportError``
    with the exact install hint above — matching the convention used by
    :mod:`contextweaver.extras.otel`.

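The "raise `ImportError` with an install hint" convention mentioned in this docstring is a common pattern for optional extras. A generic sketch follows; `require_extra` is a hypothetical helper name and the message wording is an assumption, not the text used in contextweaver.

```python
import importlib
from types import ModuleType


def require_extra(
    module_name: str, extra: str, package: str = "contextweaver"
) -> ModuleType:
    """Import an optional dependency or fail with an actionable install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise with the pip command the user needs, keeping the original
        # exception chained for debugging.
        raise ImportError(
            f"{module_name!r} is required for this feature; "
            f"install it with: pip install '{package}[{extra}]'"
        ) from exc
```

Calling the helper inside the function or constructor that needs the dependency, instead of at module import time, keeps the core install importable without the extra.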

    def __init__(
        self,
        backend: EmbeddingBackend,
        *,
        embedding_weight: float = 0.7,
    ) -> None:
        if not 0.0 <= embedding_weight <= 1.0:
            raise ValueError(f"embedding_weight must be in [0.0, 1.0], got {embedding_weight}")
        self._backend = backend

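The weighted sum behind `embedding_weight` can be illustrated with a toy, dependency-free example. Everything here is a sketch under assumptions: `hash_embed`, `cosine`, and `hybrid_score` are hypothetical helpers, the hashing scheme is not the one used by the test suite's mock backend, and the real retriever may normalise scores differently before combining them.

```python
from __future__ import annotations

import hashlib
import math


def hash_embed(text: str, dim: int = 16) -> list[float]:
    """Deterministic toy embedding: hash each token into one of `dim` buckets."""
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.sha256(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_score(
    embedding_sim: float, lexical_sim: float, embedding_weight: float = 0.7
) -> float:
    """Convex combination of an embedding score and a lexical (TF-IDF) score."""
    if not 0.0 <= embedding_weight <= 1.0:
        raise ValueError(
            f"embedding_weight must be in [0.0, 1.0], got {embedding_weight}"
        )
    return embedding_weight * embedding_sim + (1.0 - embedding_weight) * lexical_sim
```

With the default weight, an item that matches only lexically (embedding similarity 0, lexical similarity 1) still scores about 0.3, which is the "floor" the description refers to.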

    2. ``rerank`` — :class:`~contextweaver.protocols.Reranker` re-orders the
       shortlist. Defaults to :class:`NoOpReranker` which leaves order


    """Pluggable card-rendering stage of the routing pipeline.

    A packer turns a ranked list of items into :class:`ChoiceCard` instances
    (and optionally a rendered text block) within a token budget. The
    bundled default :class:`~contextweaver.routing.packer.DefaultCardPacker`
    wraps :func:`contextweaver.routing.cards.make_choice_cards` +
    :func:`contextweaver.routing.cards.render_cards_text`.

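One way a "soft" token budget could behave is sketched below. This is an interpretation, not the behaviour of `DefaultCardPacker`: `pack_cards` is a hypothetical function, cards are plain strings instead of `ChoiceCard` objects, the 4-characters-per-token estimate is a crude assumption, and "soft" is taken to mean that the top-ranked card is always kept even if it alone exceeds the budget.

```python
from __future__ import annotations


def pack_cards(cards: list[str], budget_tokens: int | None = None) -> list[str]:
    """Keep cards in rank order until the estimated token total would overflow."""
    if budget_tokens is None:
        return list(cards)  # no cap requested
    kept: list[str] = []
    used = 0
    for card in cards:
        cost = max(1, len(card) // 4)  # rough estimate: ~4 characters per token
        # Soft cap: never drop the first card, stop at the first overflow after it.
        if kept and used + cost > budget_tokens:
            break
        kept.append(card)
        used += cost
    return kept
```

Stopping at the first overflow, instead of skipping the card and trying smaller ones, preserves rank order in the packed output.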