
explore: rerank misses canonical files for basename-stem and case variants #448

@justrach

Problem

searchContent is case-insensitive, but two rerank signals are narrower than the matches they are scoring:

  1. Basename matching is one-way: asciiContainsIgnoreCase(stem, query) only boosts when the file stem contains the full query. The query "Explorer" does not boost src/explore.zig because "explore" does not contain "Explorer", even though the query contains the stem.
  2. Symbol-definition matching is case-sensitive: std.mem.eql(u8, sym.name, query). The query "store" matches pub const Store = struct {} in content search, but the definition line gets no +5 symbol boost.
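
To make the two gaps concrete, here is a minimal standalone sketch that does not touch the real Explorer code. containsIgnoreCase is a stand-in for the repo's asciiContainsIgnoreCase, assumed to be an ASCII case-insensitive substring test. Taking the stem via std.fs.path.stem is also an assumption about how the reranker derives it.

```zig
const std = @import("std");

// Stand-in for the repo's asciiContainsIgnoreCase helper (assumed semantics:
// ASCII case-insensitive substring test).
fn containsIgnoreCase(haystack: []const u8, needle: []const u8) bool {
    return std.ascii.indexOfIgnoreCase(haystack, needle) != null;
}

test "one-way stem check misses Explorer vs explore" {
    const stem = std.fs.path.stem("src/explore.zig");
    const query = "Explorer";
    try std.testing.expectEqualStrings("explore", stem);
    // Direction used today: stem contains query. Fails, so no boost.
    try std.testing.expect(!containsIgnoreCase(stem, query));
    // Reverse direction: query contains stem. Holds, but is never checked.
    try std.testing.expect(containsIgnoreCase(query, stem));
}

test "case-sensitive symbol equality misses store vs Store" {
    // Comparison used today for the symbol-definition boost.
    try std.testing.expect(!std.mem.eql(u8, "Store", "store"));
    // Case-insensitive equality, which is how content search already matches.
    try std.testing.expect(std.ascii.eqlIgnoreCase("Store", "store"));
}
```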

This is adjacent to #447: once large files like src/explore.zig reach the candidate pool, these two gaps can still leave canonical files ranked below files with incidental mentions.

Repro 1: basename stem relation is backwards

test "rerank boosts basename when query contains stem" {
    var arena = std.heap.ArenaAllocator.init(testing.allocator);
    defer arena.deinit();
    var explorer = Explorer.init(arena.allocator());

    try explorer.indexFile("src/aaa.zig", "// Explorer is mentioned here\n");
    try explorer.indexFile("src/explore.zig", "// Explorer is mentioned here\n");

    const results = try explorer.searchContent("Explorer", testing.allocator, 10);
    try testing.expectEqualStrings("src/explore.zig", results[0].path);
}

Current result: src/aaa.zig ranks first by path tie-break.

Repro 2: symbol definition boost is case-sensitive

test "rerank symbol definition boost is case-insensitive" {
    var arena = std.heap.ArenaAllocator.init(testing.allocator);
    defer arena.deinit();
    var explorer = Explorer.init(arena.allocator());

    try explorer.indexFile("aaa.zig", "// store is mentioned here\n");
    try explorer.indexFile("zzz.zig", "pub const Store = struct {};\n");

    const results = try explorer.searchContent("store", testing.allocator, 10);
    try testing.expectEqualStrings("zzz.zig", results[0].path);
}

Current result: aaa.zig ranks first by path tie-break.

Expected

  • Treat basename stem relation symmetrically for substring-style intent: stem contains query OR query contains stem (with exact stem match still strongest).
  • Use case-insensitive symbol-name equality for the symbol-definition boost, consistent with searchContent matching.
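
One possible shape for the expected behavior, as a sketch rather than a patch against the real reranker. StemMatch, classifyStem and isSymbolDefinitionMatch are hypothetical names; how each tier maps to a score is left open, since the issue only specifies the +5 symbol boost.

```zig
const std = @import("std");

// Hypothetical tiers: exact stem match stays strongest, and either
// containment direction counts as a partial match.
const StemMatch = enum { none, partial, exact };

fn classifyStem(path: []const u8, query: []const u8) StemMatch {
    const stem = std.fs.path.stem(path);
    // Guard against empty inputs so an empty needle never counts as a match.
    if (stem.len == 0 or query.len == 0) return .none;
    if (std.ascii.eqlIgnoreCase(stem, query)) return .exact;
    // Existing direction: stem contains query.
    if (std.ascii.indexOfIgnoreCase(stem, query) != null) return .partial;
    // New direction: query contains stem.
    if (std.ascii.indexOfIgnoreCase(query, stem) != null) return .partial;
    return .none;
}

// Case-insensitive replacement for std.mem.eql(u8, sym.name, query).
fn isSymbolDefinitionMatch(symbol_name: []const u8, query: []const u8) bool {
    return std.ascii.eqlIgnoreCase(symbol_name, query);
}
```

With this sketch, classifyStem("src/explore.zig", "Explorer") yields .partial and isSymbolDefinitionMatch("Store", "store") yields true, which is what the two repros need. One consideration the issue does not address: the query-contains-stem direction can be noisy for very short stems, so a minimum stem length may be worth adding.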

Labels: bug (Something isn't working), priority:p2 (Medium priority)
