LIFT-2099: Rework wildcard relevance scoring using rescore API #218
Merged
The previous approach (MatchQuery in should + automatic _score desc sort) forced every shard to score every filter-matched document, defeating early-termination on the primary sort and causing timeouts on large indices. It also silently demoted sort=id because _score was injected ahead of the caller's sort.

Revert w/sw/ew to plain wildcards in should. Add a new rank(field, "query text" [, windowSize]) RQL function that attaches a top-level Elasticsearch rescore block. Rescore runs BM25 scoring only against the top window_size docs each shard returns from the primary sort, so cost is bounded and callers that don't opt in pay nothing.

No behavior change for any existing caller unless they add rank(...).
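For reviewers unfamiliar with the rescore request shape, here is a minimal sketch of the kind of map a `rank(field, "query text", windowSize)` call could serialize to. The class name `RescoreSketch` and the method signature are hypothetical and are not the code in Rescore.java. The keys (`window_size`, `query`, `rescore_query`, `query_weight`, `rescore_query_weight`) follow the Elasticsearch rescore request body, and the weights are the pure-rerank values named in this PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not the actual Rescore.java from this PR.
class RescoreSketch {

    // Builds the body of a top-level "rescore" block for a single match clause.
    static Map<String, Object> toMap(String field, String queryText, int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }

        // { "<field>": { "query": "<queryText>" } }
        Map<String, Object> fieldBody = new LinkedHashMap<>();
        fieldBody.put("query", queryText);
        Map<String, Object> match = new LinkedHashMap<>();
        match.put(field, fieldBody);

        // { "match": { ... } }
        Map<String, Object> rescoreQuery = new LinkedHashMap<>();
        rescoreQuery.put("match", match);

        // Pure rerank: ignore the original score, use only the rescore query's score.
        Map<String, Object> query = new LinkedHashMap<>();
        query.put("rescore_query", rescoreQuery);
        query.put("query_weight", 0);
        query.put("rescore_query_weight", 1);

        // window_size bounds how many top docs per shard get rescored.
        Map<String, Object> rescore = new LinkedHashMap<>();
        rescore.put("window_size", windowSize);
        rescore.put("query", query);
        return rescore;
    }

    public static void main(String[] args) {
        // Example values are made up for illustration.
        System.out.println(toMap("title", "quick fox", 50));
    }
}
```

With `query_weight` at 0 the rescored window is ordered by the match score alone, which is the "pure rerank" behavior the PR documents; documents outside the window are not scored by this clause.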
- Restore the `_score` guard in Order.java. A caller that opts into score-based ordering via sort=_score,desc must produce valid ES sort JSON; ES rejects `missing` on _score.
- Reject nested rank() with a clear error message. rank() attaches a top-level rescore block and cannot be nested inside and()/or().
- Document the intentional pure-rerank weighting in Rescore.java (query_weight=0, rescore_query_weight=1).
- Fix stale javadoc on withWildCardPopulater to include WITHOUT.
- Add tests for the two correctness fixes.
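The `_score` guard in the first item can be pictured roughly as follows. This is a sketch under assumptions, not the code in Order.java: the class `SortGuardSketch`, the method `sortEntry`, and the `"_last"` default for `missing` are all made up for illustration. The only behavior taken from this PR is that `missing` is omitted when the sort field is `_score`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not the actual Order.java from this PR.
class SortGuardSketch {

    static final String SCORE_FIELD = "_score";

    // Builds one entry of an Elasticsearch "sort" array as a nested map.
    static Map<String, Object> sortEntry(String field, boolean descending) {
        Map<String, Object> options = new LinkedHashMap<>();
        options.put("order", descending ? "desc" : "asc");

        // Guard: per this PR, ES rejects `missing` on _score, so only add it
        // for ordinary fields. The "_last" value is an assumed default.
        if (!SCORE_FIELD.equals(field)) {
            options.put("missing", "_last");
        }

        Map<String, Object> entry = new LinkedHashMap<>();
        entry.put(field, options);
        return entry;
    }

    public static void main(String[] args) {
        System.out.println(sortEntry("_score", true));
        System.out.println(sortEntry("id", false));
    }
}
```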
0.3.6.19 was already published to the package registry from PR #214's earlier version-bump commit.
- Remove MatchQuery class and its serialization branch in BoolQuery. Nothing constructs a MatchQuery anymore; the rescore block builds its match clause directly in Rescore.toMap(). (connor-brown-rp)
- Update changelog header to match build.gradle (0.3.6.19) and set the release date to today.
connor-brown-rp
approved these changes
May 4, 2026
mason-chester-rp
approved these changes
May 4, 2026