hotfix: doc-language penalty in rerankSignalScore (v0.2.5807) by justrach · Pull Request #433 · justrach/codedb

justrach · 2026-05-06T05:31:52Z

Summary

Hotfix on top of v0.2.5807. Live-binary testing showed CHANGELOG.md and benchmark *.md files with 4-6 mentions of an identifier on one line outranking actual code call sites for queries like searchContent, handleCallers, pathHasSegment. The Tier 0 code-first ordering correctly retrieves source files, but the rerank's per-line frequency then re-promotes any markdown line with high mention density.

Same release tag v0.2.5807 — version not bumped per request. Assets will be re-uploaded with --clobber after merge.

Fix

In rerankSignalScore (src/explore.zig), cap doc-language scores at 1.0 then halve:

```zig
if (isDocLanguage(detectLanguage(r.path))) {
    score = @min(score, 1.0) * 0.5;
}
```

For doc files, more mentions don't reflect more code relevance; they reflect prose density. After the cap and halve, a doc-language score is at most 0.5, so any code hit (score >= 1) outranks any markdown / json / yaml / unknown-language hit. This is symmetric with the existing path-prior penalty for tests/, examples/, vendor/, node_modules/, third_party/.
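A minimal sketch of why the cap has to come before the halving. `applyDocPenalty` and its `is_doc` flag are hypothetical stand-ins for the doc-language branch of rerankSignalScore, the `f32` score type is an assumption, and the sketch assumes Zig 0.11 or later:

```zig
const std = @import("std");

/// Hypothetical stand-in for the doc-language branch of rerankSignalScore.
/// `is_doc` replaces isDocLanguage(detectLanguage(r.path)); f32 is an assumed score type.
fn applyDocPenalty(score: f32, is_doc: bool) f32 {
    if (!is_doc) return score;
    // Cap first so mention density cannot lift a doc line, then halve.
    return @min(score, 1.0) * 0.5;
}

test "cap-and-halve bounds doc scores at 0.5" {
    // Assumed raw score of 4.0 for a dense doc line: min(4.0, 1.0) * 0.5 = 0.5
    try std.testing.expectEqual(@as(f32, 0.5), applyDocPenalty(4.0, true));
    // A code hit at the floor of 1.0 is left untouched.
    try std.testing.expectEqual(@as(f32, 1.0), applyDocPenalty(1.0, false));
    // Halving without the cap would not be enough: 4.0 * 0.5 = 2.0 still beats 1.0.
    const halved_only: f32 = 4.0 * 0.5;
    try std.testing.expect(halved_only > applyDocPenalty(1.0, false));
}
```

Under these assumptions the penalty is bounded regardless of input, which is what lets a single code hit win without tuning the multiplier.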

Test

test "issue-429-e: searchContent rerank penalises doc-language files so code beats markdown noise" — a 4-mention markdown line vs a 1-mention code call site. Pre-fix the markdown line ranks #1; post-fix the code call site ranks #1.
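The scenario in that regression test can be approximated as follows. This is not the test from src/explore.zig: `Hit`, `isDoc`, `scoreHit`, and `topHit` are hypothetical names, the raw score is assumed to equal the per-line mention count, and doc detection is reduced to a `.md` suffix check (Zig 0.11 or later assumed):

```zig
const std = @import("std");

// Hypothetical search hit: a path plus the identifier mention count on one line.
const Hit = struct { path: []const u8, mentions: u32 };

// Simplified assumption: only ".md" counts as a doc language here.
fn isDoc(path: []const u8) bool {
    return std.mem.endsWith(u8, path, ".md");
}

// Assumed scoring: raw score is the mention count; docs get cap-and-halve.
fn scoreHit(hit: Hit, penalise_docs: bool) f32 {
    const raw: f32 = @floatFromInt(hit.mentions);
    if (penalise_docs and isDoc(hit.path)) return @min(raw, 1.0) * 0.5;
    return raw;
}

// Returns the highest-scoring hit; `hits` must be non-empty.
fn topHit(hits: []const Hit, penalise_docs: bool) Hit {
    var best = hits[0];
    for (hits[1..]) |h| {
        if (scoreHit(h, penalise_docs) > scoreHit(best, penalise_docs)) best = h;
    }
    return best;
}

test "4-mention markdown line vs 1-mention code call site" {
    const hits = [_]Hit{
        .{ .path = "CHANGELOG.md", .mentions = 4 },
        .{ .path = "src/explore.zig", .mentions = 1 },
    };
    // Pre-fix behaviour: the markdown line ranks first.
    try std.testing.expectEqualStrings("CHANGELOG.md", topHit(&hits, false).path);
    // Post-fix behaviour: the code call site ranks first.
    try std.testing.expectEqualStrings("src/explore.zig", topHit(&hits, true).path);
}
```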

zig build test exits 0 (461 tests).

Test plan

zig build test
Reviewer spot-check: in src/explore.zig rerankSignalScore, confirm the cap uses @min before the 0.5 multiplier rather than multiplication alone
Re-run live binary on this repo: searchContent, handleCallers, pathHasSegment, BenchContext — markdown files should rank below code

🤖 Generated with Claude Code

…tfix)

Live-binary testing of v0.2.5807 showed CHANGELOG.md and benchmark *.md files with 4-6 mentions of an identifier on one line outranking actual code call sites under per-line frequency scoring. The Tier 0 code-first ordering retrieves source files, but the rerank's per-line frequency then re-promotes the high-density doc lines.

Cap doc-language scores at 1.0 then halve, so any code hit (score >= 1) outranks any markdown / json / yaml / unknown-language hit. Symmetric with the existing path-prior penalty for tests/examples/vendor.

Adds issue-429-e regression test demonstrating a 4-mention markdown line no longer outranks a 1-mention code call site.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-06T05:34:19Z

Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta

NOISE means the percentage threshold was exceeded, but the absolute delta was too small to fail CI.

| Tool | Base (ns) | Head (ns) | Delta | Abs Delta (ns) | Status |
| --- | ---: | ---: | ---: | ---: | --- |
| `codedb_bundle` | 535675 | 548449 | +2.38% | +12774 | OK |
| `codedb_changes` | 53178 | 56284 | +5.84% | +3106 | OK |
| `codedb_deps` | 10094 | 9742 | -3.49% | -352 | OK |
| `codedb_edit` | 6141 | 6133 | -0.13% | -8 | OK |
| `codedb_find` | 62896 | 61432 | -2.33% | -1464 | OK |
| `codedb_hot` | 99796 | 98929 | -0.87% | -867 | OK |
| `codedb_outline` | 302227 | 304358 | +0.71% | +2131 | OK |
| `codedb_read` | 94098 | 97226 | +3.32% | +3128 | OK |
| `codedb_search` | 198876 | 209369 | +5.28% | +10493 | OK |
| `codedb_snapshot` | 281552 | 283168 | +0.57% | +1616 | OK |
| `codedb_status` | 213253 | 215769 | +1.18% | +2516 | OK |
| `codedb_symbol` | 61312 | 60642 | -1.09% | -670 | OK |
| `codedb_tree` | 81939 | 80492 | -1.77% | -1447 | OK |
| `codedb_word` | 70815 | 71297 | +0.68% | +482 | OK |
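The two-threshold rule described above the table could be sketched as below. The CI script itself is not part of this PR, so `classify`, the `Status` names, the strict comparisons, and the choice to flag only slowdowns are all assumptions; only the 10.00% and 50,000 ns values come from the report (Zig 0.11 or later assumed):

```zig
const std = @import("std");

// Assumed status names; the report only shows OK and describes NOISE.
const Status = enum { ok, noise, fail };

// Thresholds taken from the report header.
const pct_threshold: f64 = 10.0;
const abs_threshold_ns: i64 = 50_000;

/// Hypothetical classifier for one benchmark row.
fn classify(base_ns: i64, head_ns: i64) Status {
    const delta = head_ns - base_ns;
    // Assumption: speedups and unchanged timings are never flagged.
    if (delta <= 0 or base_ns <= 0) return .ok;
    const pct = @as(f64, @floatFromInt(delta)) * 100.0 / @as(f64, @floatFromInt(base_ns));
    if (pct <= pct_threshold) return .ok;
    // Percentage exceeded: fail only if the absolute slowdown is also large.
    return if (delta > abs_threshold_ns) .fail else .noise;
}

test "codedb_search row stays OK" {
    // +10493 ns is about +5.28%, under the 10% threshold.
    try std.testing.expectEqual(Status.ok, classify(198876, 209369));
}
```

Requiring both thresholds keeps very fast tools, where a few hundred nanoseconds is a large percentage, from failing CI on measurement jitter.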

justrach merged commit f39d144 into main May 6, 2026
1 check passed

justrach merged 1 commit into main from hotfix-0.2.5807-doc-penalty
