release: v0.2.5808 — codedb_bundle schema fix + rerank-trace logger (#434, #436, #437) by justrach · Pull Request #439 · justrach/codedb

justrach · 2026-05-06T15:33:32Z

Summary

Bundles three PRs into the v0.2.5808 release:

#435 (Stage 1 of #434) — require arguments on bundle ops items so schema-greedy LLMs are forced to populate the wrapper.
#438 (Stage 2 of #437) — discriminated oneOf per sub-tool: each branch pins tool to a const and binds arguments to that sub-tool's actual inputSchema. Once a model picks tool: "codedb_outline" the only matching branch requires arguments.path:string — no schema-minimal escape.
#436 — opt-in JSONL rerank-trace logger for offline tuning experiments, gated by rerank_trace = true in .codedbrc. Plus a tiny fix to make rerankAndFinalize always score even at len 1 (so single-result entries don't log score=0.0).

#435, #436, and #438 are superseded by this PR and should close on merge.

Validation

513/513 tests pass on the merged branch (zig build test).
End-to-end Sonnet 4.6 test against the new bundle schema: prior bug (empty arguments payloads under no fix; wrong-keyname payloads under Stage 1 only) does not reproduce. Same task that previously emitted codedb_word with {"query": "..."} (failing) now emits {"word": "..."} (succeeding) — the discriminated branch's required: ["word"] constraint flows through to model output.
Bundle schema payload size doubled (~12KB → ~24KB) due to inlining 19 sub-tool schemas as oneOf branches. Acceptable cost for the constraint.
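The three schema stages described above can be compared with a small sketch. This is illustrative Python, not code from the repository: `is_valid` is a hypothetical hand-rolled checker covering only the JSON Schema keywords used here, and the two sub-tool schemas are reduced to the required fields the PR mentions (`path` for codedb_outline, `word` for codedb_word).

```python
def is_valid(value, schema):
    """Tiny checker for the handful of JSON Schema keywords used below."""
    if "const" in schema and value != schema["const"]:
        return False
    if schema.get("type") == "object" and not isinstance(value, dict):
        return False
    if schema.get("type") == "string" and not isinstance(value, str):
        return False
    if isinstance(value, dict):
        if any(key not in value for key in schema.get("required", [])):
            return False
        for key, sub in schema.get("properties", {}).items():
            if key in value and not is_valid(value[key], sub):
                return False
    if "oneOf" in schema:
        # oneOf: exactly one branch must accept the value.
        if sum(is_valid(value, b) for b in schema["oneOf"]) != 1:
            return False
    return True


def branch(tool, arguments_schema):
    """One discriminated branch: pins `tool` and binds `arguments`."""
    return {
        "type": "object",
        "required": ["tool", "arguments"],
        "properties": {"tool": {"const": tool}, "arguments": arguments_schema},
    }


# Ops item schema before the fix: only `tool` is required.
ORIGINAL_ITEM = {
    "type": "object",
    "required": ["tool"],
    "properties": {"tool": {"type": "string"}, "arguments": {"type": "object"}},
}
# Stage 1: `arguments` is required but still a bare object.
STAGE1_ITEM = {**ORIGINAL_ITEM, "required": ["tool", "arguments"]}
# Stage 2: one branch per sub-tool (two shown; the PR reports 19).
STAGE2_ITEM = {
    "type": "object",
    "oneOf": [
        branch("codedb_outline", {"type": "object", "required": ["path"],
                                  "properties": {"path": {"type": "string"}}}),
        branch("codedb_word", {"type": "object", "required": ["word"],
                               "properties": {"word": {"type": "string"}}}),
    ],
}

empty_args = {"tool": "codedb_outline", "arguments": {}}
wrong_key = {"tool": "codedb_word", "arguments": {"query": "foo"}}
right_key = {"tool": "codedb_word", "arguments": {"word": "foo"}}
```

Under this sketch `empty_args` passes the original and Stage 1 schemas but fails Stage 2, `wrong_key` passes Stage 1 but fails Stage 2, and only `right_key` passes Stage 2, which mirrors the behavior reported in the end-to-end test.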

Branch contents

d6ad9ca  release: v0.2.5808 — codedb_bundle schema fix + rerank-trace logger
d83c78a  Merge PR #436: rerank-trace logger for offline tuning
a757b7f  Merge PR #438 (incl. #435): codedb_bundle schema fix, Stage 1 + Stage 2
8c85c24  fix(mcp): discriminated oneOf on codedb_bundle ops items (#437)
15907ae  test(mcp): issue-437 failing test for bundle items oneOf
7fb1e87  fix(mcp): require arguments in codedb_bundle ops items schema (#434)
d470e5b  test(mcp): issue-434 failing test for codedb_bundle ops schema
d4391db  fix(explore): always score in rerankAndFinalize, not just when len > 1
54b6b72  feat(explore): opt-in rerank-trace logger for offline tuning experiments

Test plan

zig build test — 513/513 pass.
Stdio MCP probe against the freshly-built binary confirms the served codedb_bundle schema includes oneOf with 19 branches, each pinning tool to a const and binding arguments to the sub-tool's real schema.
Sonnet 4.6 end-to-end bundle call: arguments populated correctly per sub-op.
Local build + notarize (arm64 + x86_64) before publishing the GitHub release.

🤖 Generated with Claude Code

Adds a v0 trace logger that appends one JSON line per searchContent invocation when enabled via .codedbrc (rerank_trace = true). Captures {ts, query, results:[{path,line,score}]} so we can inspect the data and decide whether online learning-to-rank from agent traces is worth building.

Pure observation: does not change ranking behavior. Disabled by default. Caps query at 256 bytes, results at 50 entries, and rotates the file by truncate-clobber once it crosses 10 MB. All I/O errors are swallowed; logging never breaks a search.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
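A rough Python sketch of the logging behavior this commit message describes (the actual implementation is Zig in explore.zig). The function names, the `enabled` flag standing in for the .codedbrc setting, and the Unix-seconds `ts` format are assumptions made for illustration.

```python
import json
import os
import time

MAX_QUERY_BYTES = 256              # query cap from the commit message
MAX_RESULTS = 50                   # results cap from the commit message
MAX_FILE_BYTES = 10 * 1024 * 1024  # rotate once the file crosses 10 MB


def build_record(query, results):
    """Build one trace record; `results` is a list of (path, line, score)."""
    clipped = query.encode("utf-8")[:MAX_QUERY_BYTES]
    return {
        "ts": int(time.time()),  # assumed format: Unix seconds
        # errors="ignore" drops a multi-byte character cut in half by the cap
        "query": clipped.decode("utf-8", errors="ignore"),
        "results": [
            {"path": p, "line": ln, "score": s}
            for p, ln, s in results[:MAX_RESULTS]
        ],
    }


def log_trace(path, query, results, enabled=False,
              max_file_bytes=MAX_FILE_BYTES):
    """Append one JSON line per search; never raise on I/O failure."""
    if not enabled:  # disabled by default
        return
    try:
        size = os.path.getsize(path) if os.path.exists(path) else 0
        # Truncate-clobber rotation: start over once the cap is reached.
        mode = "w" if size >= max_file_bytes else "a"
        with open(path, mode, encoding="utf-8") as f:
            f.write(json.dumps(build_record(query, results)) + "\n")
    except OSError:
        pass  # logging must never break a search
```

This single-process sketch does no locking; concurrent writers need the extra care raised in the review comment on this PR.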

Pre-fix, the multi-signal scoring loop only ran when result_list had more than one item, a micro-optimization that skipped sorting a single result. With the rerank-trace logger added in 54b6b72, this made single-result entries log score=0.0, making them indistinguishable from genuinely zero-confidence matches in offline analysis.

The fix runs scoring unconditionally and keeps the sort guarded behind len > 1. Cost: a few µs per single-result search, which is negligible. Caught by end-to-end binary verification.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
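The control-flow change can be sketched in Python as follows. Both function names and the `score_fn` callback are hypothetical stand-ins for the multi-signal scoring in rerankAndFinalize; only the placement of the `len > 1` guard is taken from the commit message.

```python
def rerank_before_fix(results, score_fn):
    """Pre-fix shape: scoring and sorting both sit behind the guard."""
    if len(results) > 1:
        for r in results:
            r["score"] = score_fn(r)
        results.sort(key=lambda r: r["score"], reverse=True)
    return results


def rerank_after_fix(results, score_fn):
    """Post-fix shape: always score, only sort when there is something to order."""
    for r in results:
        r["score"] = score_fn(r)
    if len(results) > 1:
        results.sort(key=lambda r: r["score"], reverse=True)
    return results


def by_hits(result):
    """Illustrative scoring function: one signal instead of several."""
    return float(result["hits"])


single_before = rerank_before_fix(
    [{"path": "a.zig", "hits": 3, "score": 0.0}], by_hits)
single_after = rerank_after_fix(
    [{"path": "a.zig", "hits": 3, "score": 0.0}], by_hits)
```

With a single result, the pre-fix shape leaves the default score of 0.0 in place, while the post-fix shape records the real score, so a trace line for a lone match is no longer mistaken for a zero-confidence one.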

The bundle inputSchema advertises ops items with required: ["tool"] and arguments as a bare {type: "object"}, so function-calling LLMs emit {tool, arguments: {}} as the minimum-valid payload. This test asserts "arguments" is in items.required so models are forced to populate it.

Also exposes tools_list as pub for the test to introspect. Fails on main; fix follows in next commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Stage 1 of the bundle-schema fix.

Prior schema had ops items with required: ["tool"] and arguments as a bare {type: "object"}, so function-calling LLMs read {} as a valid arguments payload and emitted {tool, arguments: {}} as the minimum-valid call. The empty object then triggered the #424 inline-args fallback, which used the op object itself as the args bag and surfaced as "received keys: [tool, arguments]" from each sub-tool.

Adding "arguments" to items.required forces the model to populate it. The runtime inline-args fallback at mcp.zig:1948 stays as a backstop for non-conformant clients.

Stage 2 (discriminated oneOf over tool, binding arguments to each sub-tool's inputSchema) is a follow-up; it requires turning the hand-rolled tools_list literal into a builder.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
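The failure path described here can be pictured with a short Python sketch. It is a guess at the shape of the logic, not a transcription of mcp.zig: `resolve_args` and `call_outline` are hypothetical names, and the error wording is modeled loosely on the "received keys" message quoted above.

```python
def resolve_args(op):
    """Pick the args bag for one bundle op (assumed fallback behavior)."""
    args = op.get("arguments")
    if isinstance(args, dict) and args:
        return args
    # Inline-args fallback: treat the op object itself as the args bag,
    # which supports clients that send {"tool": ..., "path": ...} directly.
    return op


def call_outline(args):
    """Stand-in for a sub-tool that needs a `path` argument."""
    if "path" not in args:
        return "missing path; received keys: [" + ", ".join(args) + "]"
    return "outline of " + args["path"]


# A schema-minimal call: empty arguments fall through to the op itself.
minimal = call_outline(resolve_args({"tool": "codedb_outline", "arguments": {}}))
# An inline-args call, which the fallback exists to support.
inline = call_outline(resolve_args({"tool": "codedb_outline", "path": "src/mcp.zig"}))
```

In this sketch `minimal` reports the keys `tool` and `arguments`, because the op object was used as the args bag, while `inline` succeeds. That is why the fallback is worth keeping as a backstop even though the schema now requires `arguments`.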

Stage 2 of the bundle-schema fix. Stage 1 (#434) made `arguments` required at the items level, but the field is still a bare {type: "object"} so a schema-greedy model can satisfy `required` by emitting `arguments: {}`.

This test asserts the items schema contains a discriminated `oneOf` with one branch per dispatchable codedb_* sub-tool, each pinning `tool` to a const and `arguments` to that sub-tool's actual inputSchema.

Adds a stub `buildAugmentedToolsList` that returns the unaugmented schema so the test fails at runtime instead of as a compile error. The real builder lands in the fix commit. Fails on this branch; fix follows in next commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Stage 2 of the bundle-schema fix. Stage 1 (#434) made `arguments` required at the items level but left it as a bare {type: "object"}; a schema-greedy model could still satisfy the required check by emitting `arguments: {}`.

Stage 2 binds the *contents* of arguments to each sub-tool's actual inputSchema via a discriminated oneOf on `tool` (const) → `arguments` (sub-tool inputSchema). Once a model picks `tool: "codedb_outline"`, the only matching branch requires arguments.path:string, so there is no schema-minimal escape.

`buildAugmentedToolsList` parses the existing tools_list literal once at server startup, mutates the bundle items to add the oneOf, and serializes back. No hand-maintained duplication: branches are generated from the per-sub-tool schemas already advertised. Falls back to the raw tools_list if augmentation fails (parse error / OOM) so clients still get a valid schema.

codedb_bundle (recursive) and codedb_edit (write op) are excluded from the oneOf since handleBundle rejects them at runtime anyway.

Schema payload roughly doubles (~12KB → ~24KB after augmentation, 19 branches across the dispatchable codedb_* tools).

Test: tests.zig now asserts the augmented schema contains oneOf with branches that pin tool to a const and preserve each sub-tool's required args (codedb_outline branch must require `path`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
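The parse, mutate, serialize flow with a raw fallback might look like the following Python sketch. The real builder is Zig. The layout assumed here (a JSON array of `{name, inputSchema}` entries, with the bundle's items under `properties.ops.items`) is an illustration, and `RAW_TOOLS` is a made-up miniature tools list rather than the served schema.

```python
import json

# Sub-tools left out of the oneOf, per the commit message.
EXCLUDED = {"codedb_bundle", "codedb_edit"}


def build_augmented_tools_list(raw_tools_json):
    """Add a discriminated oneOf to the bundle items; return raw on failure."""
    try:
        tools = json.loads(raw_tools_json)
        by_name = {t["name"]: t for t in tools}
        branches = [
            {
                "type": "object",
                "required": ["tool", "arguments"],
                "properties": {
                    "tool": {"const": name},           # pins the discriminator
                    "arguments": tool["inputSchema"],  # reuses the advertised schema
                },
            }
            for name, tool in by_name.items()
            if name.startswith("codedb_") and name not in EXCLUDED
        ]
        bundle = by_name["codedb_bundle"]["inputSchema"]
        bundle["properties"]["ops"]["items"]["oneOf"] = branches
        return json.dumps(tools)
    except (ValueError, KeyError, TypeError, MemoryError):
        # Parse error or unexpected shape: serve the unaugmented schema.
        return raw_tools_json


def obj(required):
    """Helper for the sample: an object schema with string-typed required keys."""
    return {"type": "object", "required": required,
            "properties": {k: {"type": "string"} for k in required}}


RAW_TOOLS = json.dumps([
    {"name": "codedb_outline", "inputSchema": obj(["path"])},
    {"name": "codedb_word", "inputSchema": obj(["word"])},
    {"name": "codedb_edit", "inputSchema": obj(["path"])},
    {"name": "codedb_bundle", "inputSchema": {
        "type": "object", "required": ["ops"],
        "properties": {"ops": {"type": "array", "items": {
            "type": "object", "required": ["tool", "arguments"]}}}}},
])

augmented = json.loads(build_augmented_tools_list(RAW_TOOLS))
```

Generating the branches from the advertised schemas means a sub-tool's `required` list only has to be maintained in one place, which is the "no hand-maintained duplication" point above.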

… 2 (#434, #437)

…434, #436, #437)

Bundles three PRs:
- #435 (Stage 1, #434): require `arguments` on bundle ops items
- #438 (Stage 2, #437): discriminated oneOf per sub-tool
- #436: opt-in rerank-trace logger for offline tuning + score-on-len-1 fix

End-to-end Sonnet 4.6 verifies the schema constraint flows through to model output: bundle calls now arrive with populated, correctly-named `arguments` for each sub-op.

513/513 tests pass on the merged branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector · 2026-05-06T15:36:51Z

💡 Codex Review

codedb/src/explore.zig

Lines 1876 to 1878 in d6ad9ca

    
    if (locked) {
        current_size = file.length(io_inst) catch current_size;
        if (current_size >= size_limit) current_size = 0;

Truncate after the locked size check

When rerank_trace is enabled in more than one codedb process, the trace can cross 10 MB between this process's pre-lock length check and this locked recheck. In that case this line only resets the write offset to 0, so the next write overwrites the beginning while leaving the old tail in place; the file remains over the documented cap and the JSONL stream contains stale/partial records. The locked branch needs to truncate/recreate the file before writing at offset 0.
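The stale-tail problem the review describes can be reproduced in a few lines of Python. This sketch omits the locking entirely and only contrasts resetting the write offset with truncating first. `write_at_rotation` and its `truncate` flag are hypothetical names for illustration.

```python
import os
import tempfile


def write_at_rotation(path, record, size_limit, truncate):
    """Write `record`, restarting at offset 0 once the file reaches the cap."""
    with open(path, "r+b") as f:
        current_size = os.fstat(f.fileno()).st_size
        if current_size >= size_limit:
            if truncate:
                f.truncate(0)  # discard the old contents before rewriting
            current_size = 0   # without truncate, this is only an offset reset
        f.seek(current_size)
        f.write(record)


def demo(truncate):
    """Start from an over-cap file with two old records, then write one new one."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b'{"old":1}\n{"old":2}\n')  # 20 bytes, over the 10-byte cap below
    write_at_rotation(path, b'{"n":1}\n', size_limit=10, truncate=truncate)
    with open(path, "rb") as f:
        data = f.read()
    os.remove(path)
    return data


offset_only = demo(truncate=False)
truncated = demo(truncate=True)
```

With the offset reset alone, the file keeps its old length and the new record is followed by a fragment of the first old record plus the second one. With the truncate, the file holds only the new record.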


justrach and others added 9 commits May 6, 2026 21:04


justrach merged commit 907ac96 into main May 6, 2026
1 check failed

This was referenced May 6, 2026

fix(mcp): require arguments in codedb_bundle ops schema (#434) #435

Closed

fix(mcp): discriminated oneOf on codedb_bundle ops items (#437) #438

Closed

feat(explore): opt-in rerank-trace logger for offline tuning #436

Closed

justrach merged 9 commits into main from release/0.2.5808
