LIFT-2099: Add relevance scoring to wildcard search queries #214
Implement hybrid search approach for w(), sw(), ew() operators that adds match queries alongside wildcards. Wildcards in filter ensure substring matching (preserving existing behavior), while match queries in should add TF-IDF relevance scoring so exact token matches rank higher.
Update test assertions to expect the new query structure where:
- filter: contains Wildcard queries for substring matching
- should: contains MatchQuery for relevance scoring

This enables proper result ranking for multi-word searches.
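The query shape these commits describe can be sketched with plain Java maps. This is only an illustration: the class and method names (HybridQuerySketch, leaf, hybrid) are hypothetical and do not come from the project, whose BoolQuery and MatchQuery APIs are not shown here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not the project's actual API.
final class HybridQuerySketch {

    // Wraps one field/value pair under a query type,
    // e.g. {"wildcard": {"keywords": "*pb*"}}.
    static Map<String, Object> leaf(String type, String field, String value) {
        Map<String, Object> inner = new LinkedHashMap<>();
        inner.put(field, value);
        Map<String, Object> outer = new LinkedHashMap<>();
        outer.put(type, inner);
        return outer;
    }

    // Builds the hybrid bool query for a single search term:
    // the wildcard goes in filter (substring matching, no scoring),
    // the match query goes in should (relevance scoring).
    static Map<String, Object> hybrid(String field, String term, String pattern) {
        Map<String, Object> bool = new LinkedHashMap<>();
        bool.put("filter", List.of(leaf("wildcard", field, pattern)));
        bool.put("should", List.of(leaf("match", field, term)));
        Map<String, Object> query = new LinkedHashMap<>();
        query.put("bool", bool);
        return query;
    }

    public static void main(String[] args) {
        // Prints the map form of the query for a w()-style substring search.
        System.out.println(hybrid("keywords", "pb", "*pb*"));
    }
}
```

A test assertion against this shape would check that filter holds only wildcard clauses and should holds only match clauses.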
…score sorting
- BoolQuery.divvyElasticList: Hoist should/must/must_not/filter clauses from nested BoolQuery objects instead of placing the entire BoolQuery in filter context (filter context disables scoring, breaking relevance ranking)
- ElasticRql: Add _score desc as a secondary sort when search terms are present, so that among items with the same primary sort value, more relevant results rank higher
- Bump version to 0.3.6.18
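The secondary sort can be illustrated with a small sketch. The names here (ScoreSortSketch, withScoreTiebreak, hasSearchTerms) are hypothetical and the sort entries are plain maps; how ElasticRql actually represents its sort list is not shown in this PR.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not the project's actual API.
final class ScoreSortSketch {

    // Returns a copy of the primary sort with {"_score": "desc"} appended
    // when the request contains search terms. Without search terms the
    // primary sort is returned unchanged.
    static List<Map<String, String>> withScoreTiebreak(
            List<Map<String, String>> primarySort, boolean hasSearchTerms) {
        List<Map<String, String>> sort = new ArrayList<>(primarySort);
        if (hasSearchTerms) {
            Map<String, String> byScore = new LinkedHashMap<>();
            byScore.put("_score", "desc");
            sort.add(byScore);
        }
        return sort;
    }

    public static void main(String[] args) {
        Map<String, String> byName = new LinkedHashMap<>();
        byName.put("name", "asc");
        System.out.println(withScoreTiebreak(List.of(byName), true));
    }
}
```

Because _score comes after the primary key, it only breaks ties; the primary ordering itself is unchanged.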
…ing should
Fix overly aggressive hoisting that was flattening all nested BoolQuery structures. Now only hoists clauses from BoolQuery objects that have a MatchQuery in their should list (the relevance scoring queries from w/sw/ew). Regular nested BoolQuery structures are preserved in filter context.
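A rough sketch of the selective hoisting rule follows. The types Match, Wildcard and Bool, and the method divvy, are hypothetical stand-ins written for this example; the real BoolQuery, MatchQuery and divvyElasticList signatures are not shown in this PR text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the project's query classes.
final class HoistSketch {

    static final class Match {
        final String field;
        final String text;
        Match(String field, String text) { this.field = field; this.text = text; }
    }

    static final class Wildcard {
        final String field;
        final String pattern;
        Wildcard(String field, String pattern) { this.field = field; this.pattern = pattern; }
    }

    static final class Bool {
        final List<Object> must = new ArrayList<>();
        final List<Object> mustNot = new ArrayList<>();
        final List<Object> should = new ArrayList<>();
        final List<Object> filter = new ArrayList<>();
    }

    // True only for a Bool whose should list contains a Match,
    // i.e. a relevance scoring query in the sense of this commit.
    static boolean isScoringBool(Object clause) {
        if (!(clause instanceof Bool)) {
            return false;
        }
        for (Object c : ((Bool) clause).should) {
            if (c instanceof Match) {
                return true;
            }
        }
        return false;
    }

    // Hoists the four clause lists of scoring Bools into the outer Bool;
    // every other clause, including a regular nested Bool, stays in filter.
    static Bool divvy(List<?> clauses) {
        Bool outer = new Bool();
        for (Object clause : clauses) {
            if (isScoringBool(clause)) {
                Bool inner = (Bool) clause;
                outer.must.addAll(inner.must);
                outer.mustNot.addAll(inner.mustNot);
                outer.should.addAll(inner.should);
                outer.filter.addAll(inner.filter);
            } else {
                outer.filter.add(clause);
            }
        }
        return outer;
    }

    public static void main(String[] args) {
        Bool scoring = new Bool();
        scoring.filter.add(new Wildcard("keywords", "*pb*"));
        scoring.should.add(new Match("keywords", "pb"));
        Bool plain = new Bool();
        plain.must.add(new Wildcard("city", "Atl*"));
        Bool outer = divvy(List.of(scoring, plain));
        // prints "2 filter, 1 should"
        System.out.println(outer.filter.size() + " filter, " + outer.should.size() + " should");
    }
}
```

The match clause ends up in the outer should, where it can contribute to scoring, while the regular nested Bool is kept whole in filter.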
Multi-value wildcard queries: OR → AND semantic regression

For multi-value queries, the generated query is:

{
  "bool": {
    "filter": [{"wildcard": {"city": "Chand*"}}, {"wildcard": {"city": "Atl*"}}],
    "should": [{"match": {"city": "Chand"}}, {"match": {"city": "Atl"}}]
  }
}

Clauses in filter are ANDed, so a document must match every wildcard. The old behavior put wildcards in should, which gave OR semantics.

Fix: each wildcard+match pair needs its own nested bool:

{
  "bool": {
    "should": [
      {"bool": {"filter": [{"wildcard": {"city": "Chand*"}}], "should": [{"match": {"city": "Chand"}}]}},
      {"bool": {"filter": [{"wildcard": {"city": "Atl*"}}], "should": [{"match": {"city": "Atl"}}]}}
    ]
  }
}

This preserves OR semantics while still adding relevance scoring per term.
Multi-value queries like sw(city,Chand,Atl) were ANDing wildcards in filter, returning zero results since a city can't start with both values simultaneously. Fix: for multi-value, wrap wildcards in an inner should BoolQuery so minimum_should_match=1 gives OR semantics. Single-value behavior unchanged. For compound queries like sw(city,Chand,Atl)&deleted=false the inner wildcard BoolQuery is hoisted into the outer filter alongside other clauses, preserving correct OR semantics even when combined with filter clauses.
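The multi-value fix described in this commit might look roughly like the sketch below, again using plain maps. MultiValueSketch and startsWith are hypothetical names, and setting minimum_should_match explicitly on the inner bool is an assumption made for clarity; the actual code may rely on Elasticsearch's default instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not the project's actual API.
final class MultiValueSketch {

    static Map<String, Object> leaf(String type, String field, String value) {
        Map<String, Object> inner = new LinkedHashMap<>();
        inner.put(field, value);
        Map<String, Object> outer = new LinkedHashMap<>();
        outer.put(type, inner);
        return outer;
    }

    // Builds an sw()-style query. A single value keeps its wildcard directly
    // in filter; multiple values are wrapped in an inner bool/should so that
    // matching any one prefix is enough (OR semantics).
    static Map<String, Object> startsWith(String field, List<String> values) {
        List<Object> wildcards = new ArrayList<>();
        List<Object> matches = new ArrayList<>();
        for (String value : values) {
            wildcards.add(leaf("wildcard", field, value + "*"));
            matches.add(leaf("match", field, value));
        }

        List<Object> filter = new ArrayList<>();
        if (wildcards.size() == 1) {
            filter.addAll(wildcards);
        } else {
            Map<String, Object> innerBool = new LinkedHashMap<>();
            innerBool.put("should", wildcards);
            // Assumption: set explicitly here to make the OR intent visible.
            innerBool.put("minimum_should_match", 1);
            Map<String, Object> inner = new LinkedHashMap<>();
            inner.put("bool", innerBool);
            filter.add(inner);
        }

        Map<String, Object> bool = new LinkedHashMap<>();
        bool.put("filter", filter);
        bool.put("should", matches);
        Map<String, Object> query = new LinkedHashMap<>();
        query.put("bool", bool);
        return query;
    }

    public static void main(String[] args) {
        System.out.println(startsWith("city", List.of("Chand", "Atl")));
    }
}
```

Because the inner bool is a single filter clause, additional filters such as deleted=false can sit next to it in the outer filter without turning the prefixes back into an AND.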
Version mismatch:
0.3.6.19 was already published to the package registry from PR #214's earlier version-bump commit.
Summary
- MatchQuery.java - new Elasticsearch match query DSL class
- withWildCardPopulater() to generate hybrid queries with both wildcard (filter) and match (should)

Problem
Multi-word searches like "pb promo" returned results in arbitrary order because wildcard queries don't contribute to Elasticsearch scoring - they're binary (match/no-match) and produce constant scores.
Solution: Hybrid Query Approach
Before (wildcard only):

{
  "bool": {
    "should": [
      {"wildcard": {"keywords": "*pb*"}},
      {"wildcard": {"keywords": "*promo*"}}
    ]
  }
}

After (hybrid filter + should):

{
  "bool": {
    "filter": [
      {"wildcard": {"keywords": "*pb*"}},
      {"wildcard": {"keywords": "*promo*"}}
    ],
    "should": [
      {"match": {"keywords": "pb"}},
      {"match": {"keywords": "promo"}}
    ]
  }
}

How It Works
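- filter: wildcard queries ensure substring matching, preserving existing behavior
- should: match queries add relevance scoring so exact token matches rank higher

Why This Approach
keywords field

Test plan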