
feat(cli,context,routing,observability,contracts): adopter-inspection surfaces (#106, #221, #224, #225, #226) #231

Merged
dgenio merged 4 commits into main from claude/triage-issues-DqkjR on May 17, 2026

@dgenio (Owner) commented May 16, 2026

Lands the 5-issue "decision-surface inspectability" group selected by the
triage pass. Shared blast radius: envelope.py, __main__.py,
routing/router.py + new explanation.py, extras/otel.py, new schemas/
directory. Owner-mode (Mode B) authorised; bumps version to 0.5.0 with
documented public-API deltas under CHANGELOG ## [Unreleased].

#221 — argparse → Typer + Rich rewrite of __main__.py

  • typer>=0.9 + rich>=13.0 promoted from [cli] extra to core deps
  • [cli] extra kept as empty alias for one cycle (removal: v0.6)
  • 7 existing subcommands preserved verbatim (build, route, demo,
    print-tree, init, ingest, replay); 1 new subcommand (stats, #106)
  • tests/test_cli.py exit-code-0 assertion relaxed to accept Typer's
    code-2 no-args convention; all other golden assertions unchanged

#106 — BuildStats diagnostic report

  • BuildStats.prompt_tokens @property (single source of truth; replaces
    sum(tokens_per_section.values()) + header_footer_tokens across 6+
    inline call sites in extras/otel.py, __main__.py, metrics.py)
  • BuildStats.report(format='text'|'rich', *, phase, budget) returns
    deterministic, paste-friendly diagnostic string with sections,
    drop-reasons, and budget-utilisation recommendations
  • BuildStats.report_dict(...) returns versioned ({"version": 1, ...})
    structured payload for programmatic consumers
  • contextweaver stats CLI subcommand renders against ingested session
    JSON; --format {rich,text}, --phase, --budget flags
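To make the prompt_tokens consolidation and the versioned report_dict() payload concrete, here is a minimal sketch. It is not contextweaver's implementation: BuildStatsSketch, the dropped mapping, and the budget/utilisation keys are hypothetical; only tokens_per_section, header_footer_tokens, and the {"version": 1, ...} envelope come from the notes above.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class BuildStatsSketch:
    """Hypothetical stand-in for BuildStats; fields beyond the PR text are assumed."""

    tokens_per_section: dict[str, int] = field(default_factory=dict)
    header_footer_tokens: int = 0
    # Hypothetical: dropped item id -> drop reason.
    dropped: dict[str, str] = field(default_factory=dict)

    @property
    def prompt_tokens(self) -> int:
        # One place for the formula that was previously inlined at several call sites.
        return sum(self.tokens_per_section.values()) + self.header_footer_tokens

    def report_dict(self, *, budget: Optional[int] = None) -> dict[str, Any]:
        payload: dict[str, Any] = {
            "version": 1,  # lets consumers branch on the payload shape
            "prompt_tokens": self.prompt_tokens,
            # Sorted keys keep the payload deterministic across insertion orders.
            "sections": dict(sorted(self.tokens_per_section.items())),
            "dropped": dict(sorted(self.dropped.items())),
        }
        if budget is not None and budget > 0:
            payload["budget"] = budget
            payload["utilisation"] = round(self.prompt_tokens / budget, 4)
        return payload
```

Keeping the sum behind one property means the OTel exporter, the CLI, and the metrics module all report the same number; sorting section keys is one simple way to get deterministic output.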

#226 — RouteResult.explanation()

  • New routing/explanation.py module keeps router.py under soft cap
  • RouteResult.explanation(format='md'|'dict') overload pair
  • Markdown: top-k table, confidence-gap line, ambiguity flag +
    clarifying question, context-hints / filters sections
  • Dict: versioned schema; safe for OTel span attributes
  • Privacy: never emits args_schema or full descriptions
  • docs/troubleshooting.md gains paste-ready example
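A rough sketch of what such a renderer involves: a top-k table, a confidence gap, an ambiguity flag with a clarifying question, and a versioned dict that carries only names and scores. The function explain, the fmt keyword (the real method uses format), and the 0.05 ambiguity threshold are assumptions for illustration, not contextweaver's code.

```python
from typing import Any, Union


def explain(
    candidates: list[tuple[str, float]],
    *,
    top_k: int = 3,
    ambiguity_gap: float = 0.05,
    fmt: str = "md",
) -> Union[str, dict[str, Any]]:
    """Hypothetical renderer: (tool name, score) pairs in, Markdown or dict out."""
    # Highest score first; ties broken by name so the output is deterministic.
    ranked = sorted(candidates, key=lambda c: (-c[1], c[0]))[:top_k]
    gap = ranked[0][1] - ranked[1][1] if len(ranked) > 1 else None
    ambiguous = gap is not None and gap < ambiguity_gap
    if fmt == "dict":
        # Only names and scores leave this function: no args schemas, no descriptions.
        return {
            "version": 1,
            "top_k": [{"name": name, "score": score} for name, score in ranked],
            "confidence_gap": None if gap is None else round(gap, 4),
            "ambiguous": ambiguous,
        }
    if fmt != "md":
        raise ValueError(f"unsupported format: {fmt!r}")
    lines = ["| rank | tool | score |", "|---:|---|---:|"]
    lines += [f"| {i} | {name} | {score:.3f} |" for i, (name, score) in enumerate(ranked, 1)]
    if gap is not None:
        lines += ["", f"Confidence gap (top-1 minus top-2): {gap:.3f}"]
    if ambiguous:
        lines.append(f"Ambiguous: did you mean `{ranked[0][0]}` or `{ranked[1][0]}`?")
    return "\n".join(lines)
```

Keeping the dict variant to names, scores, and flags is what makes it reasonable to copy into span attributes.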

#225 — JSON Schemas + drift gate (closes #196)

  • 6 schemas under schemas/ and docs/schemas/v0/ (mkdocs-published $id
    URLs): catalog, choice_card, result_envelope, route_trace,
    build_stats, graph_manifest
  • src/contextweaver/_schema_gen.py — stdlib-only dataclass → Draft
    2020-12 generator; deterministic byte-stable output
  • ChoiceCard.kind tightened to Literal[...]; __post_init__ enforces
    gateway-spec §2 size bounds (name ≤64, ≤5 tags each ≤24 chars)
    on every code path including from_dict
  • make schemas / make schemas-check (gating in make ci); CI workflow
    runs --check after the scorecard drift gate
  • new docs/contracts.md; examples/sample_catalog.yaml gets the
    # yaml-language-server: $schema= header
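The stdlib-only generator idea can be sketched with dataclasses.fields and typing.get_type_hints. This is not _schema_gen.py: dataclass_schema, render, and is_up_to_date are hypothetical names, and unions, nested dataclasses, and per-type extras are left out for brevity. It shows how Literal[...] becomes a JSON Schema enum, and how sorted keys plus a fixed indent give byte-stable output that a --check drift gate can compare against the committed file.

```python
import dataclasses
import json
from pathlib import Path
from typing import Any, Literal, get_args, get_origin, get_type_hints

# Python scalar type -> JSON Schema type name.
_SCALARS = {str: "string", int: "integer", float: "number", bool: "boolean"}


def _type_schema(tp: Any) -> dict[str, Any]:
    origin = get_origin(tp)
    if origin is Literal:
        return {"enum": list(get_args(tp))}
    if origin is list:
        return {"type": "array", "items": _type_schema(get_args(tp)[0])}
    if origin is dict:
        return {"type": "object", "additionalProperties": _type_schema(get_args(tp)[1])}
    if tp in _SCALARS:
        return {"type": _SCALARS[tp]}
    return {}  # unsupported in this sketch: leave unconstrained


def dataclass_schema(cls: type, schema_id: str) -> dict[str, Any]:
    hints = get_type_hints(cls)
    properties: dict[str, Any] = {}
    required: list[str] = []
    for f in dataclasses.fields(cls):
        properties[f.name] = _type_schema(hints[f.name])
        # Fields without any default are required.
        if f.default is dataclasses.MISSING and f.default_factory is dataclasses.MISSING:
            required.append(f.name)
    return {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "$id": schema_id,
        "title": cls.__name__,
        "type": "object",
        "properties": properties,
        "required": required,
    }


def render(schema: dict[str, Any]) -> str:
    # Sorted keys, fixed indent, trailing newline: same input, same bytes.
    return json.dumps(schema, indent=2, sort_keys=True, ensure_ascii=False) + "\n"


def is_up_to_date(path: Path, schema: dict[str, Any]) -> bool:
    # Drift gate: the committed file must equal a fresh render byte for byte.
    return path.exists() and path.read_text(encoding="utf-8") == render(schema)
```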

#224 — OTel GenAI semantic conventions

  • extras/otel.py rewritten on top of opentelemetry.semconv._incubating
    .attributes.gen_ai_attributes (opentelemetry-api>=1.27 floor +
    opentelemetry-semantic-conventions>=0.48b0)
  • Span shapes: invoke_agent for build(), execute_tool for route()
  • Stable attrs: gen_ai.system='contextweaver', gen_ai.operation.name,
    gen_ai.usage.input_tokens, gen_ai.tool.name
  • Engine-specific telemetry under contextweaver.* namespace
  • Token-usage histogram renamed to canonical gen_ai.client.token.usage
  • otel_emit_experimental flag (default False) gates PII-prone attrs
  • tests/test_otel.py uses InMemorySpanExporter for deterministic
    SemConv-name assertions
  • new docs/integration_otel.md (Laminar + Phoenix worked examples,
    PII-safety guidance)
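The attribute layout and the otel_emit_experimental gate can be illustrated without an OpenTelemetry dependency by building the attribute mapping as a plain dict. The gen_ai.* keys and the invoke_agent/execute_tool operation names come from the notes above; span_attributes and the contextweaver.prompt key are hypothetical.

```python
from typing import Any, Optional

# Operation names per the PR: invoke_agent for build(), execute_tool for route().
_OPERATIONS = ("invoke_agent", "execute_tool")


def span_attributes(
    operation: str,
    *,
    input_tokens: int,
    tool_name: Optional[str] = None,
    prompt: Optional[str] = None,
    emit_experimental: bool = False,
) -> dict[str, Any]:
    if operation not in _OPERATIONS:
        raise ValueError(f"unexpected operation: {operation!r}")
    attrs: dict[str, Any] = {
        "gen_ai.system": "contextweaver",
        "gen_ai.operation.name": operation,
        "gen_ai.usage.input_tokens": input_tokens,
    }
    if tool_name is not None:
        attrs["gen_ai.tool.name"] = tool_name
    # PII-prone content is attached only when the caller opts in (default off).
    if emit_experimental and prompt is not None:
        attrs["contextweaver.prompt"] = prompt  # hypothetical key name
    return attrs
```

In an exporter the mapping would be attached with OpenTelemetry's Span.set_attributes(...); keeping the gate in one function makes the default-off behaviour easy to assert in tests.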

Verification (all green on v0.5 branch):
  ruff format --check src/ tests/ examples/ scripts/   → clean
  ruff check src/ tests/ examples/ scripts/            → clean
  mypy src/                                            → 0 issues / 66 files
  pytest --cov=contextweaver -q                        → 995 passed, 2 skipped
                                                         (+41 new tests over baseline)
  python scripts/gen_schemas.py --check                → schemas up to date
  python -m contextweaver demo                         → completes
  make example                                         → all examples clean

Copilot AI review requested due to automatic review settings May 16, 2026 16:13

Copilot AI left a comment


Pull request overview

Lands the five "decision-surface inspectability" issues (#106, #221, #224, #225, #226) in a single drop: BuildStats diagnostic reports + stats CLI, RouteResult.explanation(), JSON-Schema publishing with a CI drift gate, an argparse→Typer/Rich CLI rewrite, and an OTel rewrite to the GenAI semantic conventions. Bumps to 0.5.0. Promotes typer/rich from [cli] extra to core deps (and keeps [cli] as an empty alias for one cycle). The OTel and CLI changes are user-visible breaking changes documented under CHANGELOG [Unreleased].

Changes:

  • BuildStats gains a prompt_tokens property plus report()/report_dict(), exposed through a new contextweaver stats Typer subcommand; RouteResult.explanation() renders Markdown/dict rationale (extracted into routing/explanation.py).
  • Six JSON Schemas committed under schemas/ and docs/schemas/v0/ generated by the new stdlib _schema_gen.py; make schemas/schemas-check wired into make ci and the GitHub Actions workflow; ChoiceCard.kind tightened to Literal[...] with __post_init__ size-bound enforcement.
  • extras/otel.py rewritten on opentelemetry.semconv._incubating.attributes.gen_ai_attributes (opentelemetry-api floor bumped to >=1.27, plus new opentelemetry-semantic-conventions>=0.48b0); spans renamed to invoke_agent/execute_tool; token-usage histogram renamed to gen_ai.client.token.usage.
  • __main__.py rewritten on Typer + Rich.

Reviewed changes

Copilot reviewed 35 out of 35 changed files in this pull request and generated no comments.

Summary per file:

| File | Description |
|---|---|
| `src/contextweaver/envelope.py` | Adds BuildStats.prompt_tokens, report()/report_dict(), and ChoiceCard size-bound enforcement + Literal kind. |
| `src/contextweaver/_schema_gen.py` | New stdlib dataclass→JSON-Schema generator with per-type extras and deterministic serialisation. |
| `src/contextweaver/routing/router.py` | Adds RouteResult.explanation() with overload pair, delegating to routing.explanation. |
| `src/contextweaver/routing/explanation.py` | New module rendering routing rationale as Markdown or versioned dict. |
| `src/contextweaver/extras/otel.py` | Rewrite onto GenAI SemConv; invoke_agent/execute_tool spans; engine-specific attrs under contextweaver.* namespace; otel_emit_experimental gate. |
| `src/contextweaver/__main__.py` | Argparse→Typer+Rich rewrite, adds stats subcommand, factored `_restore_manager_from_session` helper. |
| `src/contextweaver/metrics.py` | Switches to BuildStats.prompt_tokens. |
| `scripts/gen_schemas.py` | New regenerator/drift-checker for the six published schemas. |
| `schemas/*.schema.json`, `docs/schemas/v0/*.schema.json` | Six committed schemas mirrored under docs for $id publishing. |
| `pyproject.toml` | Version 0.5.0; typer/rich into core; [cli] emptied; OTel floor bumped; mypy carve-out for `__main__`. |
| `Makefile`, `.github/workflows/ci.yml` | Add schemas/schemas-check targets and wire the drift gate into CI. |
| `mkdocs.yml` | Adds Contracts + Observability nav and excludes schema JSONs from docs nav. |
| `examples/sample_catalog.yaml` | Adds `# yaml-language-server: $schema=...` header. |
| `docs/contracts.md`, `docs/integration_otel.md`, `docs/troubleshooting.md` | New contracts page, OTel guide with Laminar/Phoenix examples, troubleshooting addition for explanation(). |
| `AGENTS.md`, `CHANGELOG.md` | Updated module map + comprehensive [Unreleased] entries. |
| `tests/test_envelope.py`, `tests/test_router.py`, `tests/test_otel.py`, `tests/test_cli.py`, `tests/test_schema_gen.py` | New + updated tests covering report, explanation, OTel SemConv assertions, stats CLI, and schema round-trips/drift. |


github-actions Bot commented May 16, 2026

Benchmark delta (vs main)

Soft regression feedback only — this comment never blocks the PR.
Latency budget: ⚠️ when head > base × 1.3. Accuracy budget: ⚠️ when head < base - 1pp.
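The two budgets reduce to a pair of comparisons. A sketch, assuming recall@k and MRR are fractions in [0, 1] (so 1 pp is 0.01) and latency is compared as p99 milliseconds; the function names are hypothetical and this is not the workflow's actual script.

```python
def latency_marker(head_ms: float, base_ms: float, ratio: float = 1.3) -> str:
    # Warn only when head is more than 30% slower than base.
    return "⚠️" if head_ms > base_ms * ratio else "✅"


def accuracy_marker(head: float, base: float, tolerance_pp: float = 1.0) -> str:
    # Scores are fractions, so convert percentage points to a fraction first.
    return "⚠️" if head < base - tolerance_pp / 100 else "✅"
```

For example, the 1000-tool routing row (p99 33.863 ms vs base 31.897 ms) stays ✅ because 33.863 is below 31.897 × 1.3 ≈ 41.47.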

Routing summary (single backend × catalog sizes)

| size | recall@k, head (Δ vs base) | MRR, head (Δ vs base) | p99 (ms) |
|---:|---|---|---|
| 50 | ✅ 0.5649 (+0.0000) | ✅ 0.4978 (+0.0000) | ✅ 0.442 (base 0.463) |
| 83 | ✅ 0.3825 (+0.0000) | ✅ 0.3242 (+0.0000) | ✅ 0.720 (base 0.876) |
| 1000 | ✅ 0.1475 (+0.0000) | ✅ 0.1456 (+0.0000) | ✅ 33.863 (base 31.897) |

Per-backend × per-size matrix

| backend | size | recall@k (Δ) | MRR (Δ) | p99 (ms) |
|---|---:|---|---|---|
| bm25 | 100 | ✅ 0.3825 (+0.0000) | ✅ 0.3399 (+0.0000) | ✅ 5.939 (base 5.642) |
| bm25 | 500 | ✅ 0.2250 (+0.0000) | ✅ 0.2165 (+0.0000) | ✅ 28.543 (base 27.538) |
| bm25 | 1000 | ✅ 0.1575 (+0.0000) | ✅ 0.1525 (+0.0000) | ✅ 85.467 (base 78.368) |
| fuzzy | 100 | ✅ 0.0000 (+0.0000) | ✅ 0.0000 (+0.0000) | ✅ 0.000 (base 0.000) |
| fuzzy | 500 | ✅ 0.0000 (+0.0000) | ✅ 0.0000 (+0.0000) | ✅ 0.000 (base 0.000) |
| fuzzy | 1000 | ✅ 0.0000 (+0.0000) | ✅ 0.0000 (+0.0000) | ✅ 0.000 (base 0.000) |
| tfidf | 100 | ✅ 0.3825 (+0.0000) | ✅ 0.3220 (+0.0000) | ✅ 0.983 (base 0.872) |
| tfidf | 500 | ✅ 0.2325 (+0.0000) | ✅ 0.2314 (+0.0000) | ✅ 9.396 (base 8.660) |
| tfidf | 1000 | ✅ 0.1475 (+0.0000) | ✅ 0.1456 (+0.0000) | ✅ 33.875 (base 30.071) |

Context pipeline (per scenario)

| scenario | tokens | dropped | dedup |
|---|---|---|---|
| large_catalog | 1514 (base 1514, Δ+0) | 0 (base 0, Δ+0) | 0 (base 0, Δ+0) |
| long_conversation | 2548 (base 2548, Δ+0) | 0 (base 0, Δ+0) | 0 (base 0, Δ+0) |
| short_conversation | 496 (base 496, Δ+0) | 0 (base 0, Δ+0) | 0 (base 0, Δ+0) |
| stress_conversation | 6651 (base 6651, Δ+0) | 7 (base 7, Δ+0) | 4 (base 4, Δ+0) |

Numbers come from make benchmark / make benchmark-matrix.
Latency is hardware-dependent — treat the markers as a rough guide.
See benchmarks/scorecard.md for the full picture.

claude and others added 3 commits May 16, 2026 22:00
…umented enum

Addresses Phase 2 audit finding on PR #231 — `ChoiceCard.kind` was
tightened to `Literal["tool", "agent", "skill", "internal"]` so the
published `choice_card.schema.json` carries an enum constraint, but
`__post_init__` only enforced name/tags size bounds. The `Literal[...]`
annotation only constrains mypy: a Python caller (or `ChoiceCard.from_dict`
reading an external JSON payload) could still construct
`ChoiceCard(kind="bogus")` and produce an object that violates the
published schema — a contract leak between the type system and runtime.

This commit adds a runtime check in `__post_init__` that rejects any
value not in `CHOICE_CARD_KINDS`, mirroring the existing size-bound
enforcement. The check fires on every construction path including
`ChoiceCard.from_dict`, matching the PR description's claim of
"enforce ... on every code path including from_dict".
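A minimal sketch of the runtime check described above, assuming a plain dataclass. `ChoiceCardSketch` and the `MAX_*` constants are hypothetical; the kind values and size bounds (name ≤64, ≤5 tags, each ≤24 chars) come from the PR. Deriving `CHOICE_CARD_KINDS` from the `Literal` with `typing.get_args` is one way to keep the annotation, the runtime check, and the generated schema enum from drifting apart.

```python
from dataclasses import dataclass, field
from typing import Any, Literal, get_args

Kind = Literal["tool", "agent", "skill", "internal"]
# Single source for the enum: the runtime check reads the Literal's members.
CHOICE_CARD_KINDS = get_args(Kind)

MAX_NAME_LEN = 64
MAX_TAGS = 5
MAX_TAG_LEN = 24


@dataclass
class ChoiceCardSketch:
    name: str
    kind: Kind
    tags: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Literal[...] only constrains the type checker; enforce it at runtime too.
        if self.kind not in CHOICE_CARD_KINDS:
            raise ValueError(f"kind must be one of {CHOICE_CARD_KINDS}, got {self.kind!r}")
        if len(self.name) > MAX_NAME_LEN:
            raise ValueError(f"name exceeds {MAX_NAME_LEN} characters")
        if len(self.tags) > MAX_TAGS:
            raise ValueError(f"at most {MAX_TAGS} tags are allowed")
        for tag in self.tags:
            if len(tag) > MAX_TAG_LEN:
                raise ValueError(f"tag {tag!r} exceeds {MAX_TAG_LEN} characters")

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "ChoiceCardSketch":
        # Building through the constructor means __post_init__ runs on this path too.
        return cls(name=data["name"], kind=data["kind"], tags=list(data.get("tags", [])))
```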

Tests:
- test_choice_card_rejects_unknown_kind — direct ChoiceCard(kind="bogus")
- test_choice_card_from_dict_rejects_unknown_kind — via from_dict()
- test_choice_card_accepts_all_documented_kinds — pins the enum

Verification:
- ruff format/lint/mypy clean
- pytest -q: 995 passed, 5 skipped
- make schemas-check / scorecard-check / llms-check all clean
- make example + make demo clean
- otel.py: gate PII-prone prompt content behind otel_emit_experimental flag
  (was dead code — stored but never checked in on_context_built)
- otel.py: add version-guard comment for incubating SemConv import path
- __main__.py: add event-index context to _restore_manager_from_session errors
- _schema_gen.py: document 300-line soft cap exemption in module docstring
- test_otel.py: add test_experimental_flag_gates_prompt_in_span verifying
  the flag conditionally includes/excludes prompt in span attributes
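Attaching the event index to replay errors is a small wrapping pattern. A sketch, assuming session events are dicts applied one at a time; `restore_from_events` and its `apply` callback are hypothetical and only loosely mirror `_restore_manager_from_session`.

```python
from typing import Any, Callable, Iterable


def restore_from_events(
    events: Iterable[dict[str, Any]],
    apply: Callable[[dict[str, Any]], None],
) -> int:
    """Replay events in order; re-raise failures with the offending event's index."""
    count = 0
    for index, event in enumerate(events):
        try:
            apply(event)
        except (KeyError, TypeError, ValueError) as exc:
            # Chain the original exception so the root cause stays in the traceback.
            raise ValueError(
                f"session event #{index} (type={event.get('type', '?')!r}): {exc}"
            ) from exc
        count += 1
    return count
```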
@dgenio dgenio merged commit 95d2c88 into main May 17, 2026
4 checks passed
@dgenio dgenio deleted the claude/triage-issues-DqkjR branch May 17, 2026 07:12

Development

Successfully merging this pull request may close these issues.

[contracts] Publish JSON Schemas for catalog files, ChoiceCard, ResultEnvelope, RouteTrace

3 participants