LIFT-2099: Add relevance scoring to wildcard search queries #214

Merged
wes-maszk-rp merged 11 commits into release-0.3.6.x from LIFT-2099/search-relevance-0.3.6.x
Apr 6, 2026

@wes-maszk-rp

Summary

  • Adds MatchQuery.java - new Elasticsearch match query DSL class
  • Modifies withWildCardPopulater() to generate hybrid queries with both wildcard (filter) and match (should)
  • Updates BoolQuery to handle MatchQuery serialization

Problem

Multi-word searches like "pb promo" returned results in arbitrary order because wildcard queries don't contribute to Elasticsearch scoring - they're binary (match/no-match) and produce constant scores.

Solution: Hybrid Query Approach

Before (wildcard only):

{
  "bool": {
    "should": [
      {"wildcard": {"keywords": "*pb*"}},
      {"wildcard": {"keywords": "*promo*"}}
    ]
  }
}

After (hybrid filter + should):

{
  "bool": {
    "filter": [
      {"wildcard": {"keywords": "*pb*"}},
      {"wildcard": {"keywords": "*promo*"}}
    ],
    "should": [
      {"match": {"keywords": "pb"}},
      {"match": {"keywords": "promo"}}
    ]
  }
}

How It Works

| Clause | Query Type | Purpose | Scoring |
|--------|------------|---------|---------|
| filter | Wildcard | Ensures substring exists | None (binary) |
| should | Match | Boosts exact token matches | TF-IDF/BM25 |
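
As a rough illustration of the hybrid shape described above, the sketch below builds the same filter + should structure from plain Java maps. The class `HybridQuerySketch` and its `hybrid` method are hypothetical names chosen for this example; they are not the PR's `withWildCardPopulater()` or its query DSL classes, and the sketch assumes the contains-style `*term*` pattern shown in the JSON.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration only; not the PR's withWildCardPopulater(). */
final class HybridQuerySketch {

    /** Builds {"bool": {"filter": [wildcards...], "should": [matches...]}} for one field. */
    static Map<String, Object> hybrid(String field, List<String> terms) {
        List<Object> filter = new ArrayList<>();
        List<Object> should = new ArrayList<>();
        for (String term : terms) {
            // Filter context: decides whether a document matches, contributes no score.
            filter.add(Map.of("wildcard", Map.of(field, "*" + term + "*")));
            // Should clause: optional next to a filter, but scores exact token matches.
            should.add(Map.of("match", Map.of(field, term)));
        }
        Map<String, Object> bool = new LinkedHashMap<>();
        bool.put("filter", filter);
        bool.put("should", should);
        return Map.of("bool", bool);
    }

    public static void main(String[] args) {
        // Prints the nested map for the "pb promo" example from the description.
        System.out.println(hybrid("keywords", List.of("pb", "promo")));
    }
}
```

Note that every wildcard lands in the same `filter` list, so all terms must be present for a document to match.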

Why This Approach

  1. Backwards Compatible - same documents match
  2. Standard Elasticsearch Pattern - documented approach
  3. No Reindexing - works with existing keywords field

Test plan

  • CI passes
  • Deploy to dev and test Portal search with multi-word queries
  • Confirm "pb promo" returns exact matches ranked higher

Implement hybrid search approach for w(), sw(), ew() operators that adds
match queries alongside wildcards. Wildcards in filter ensure substring
matching (preserving existing behavior), while match queries in should
add TF-IDF relevance scoring so exact token matches rank higher.
Update test assertions to expect the new query structure where:
- filter: contains Wildcard queries for substring matching
- should: contains MatchQuery for relevance scoring

This enables proper result ranking for multi-word searches.
…score sorting

- BoolQuery.divvyElasticList: Hoist should/must/must_not/filter clauses from
  nested BoolQuery objects instead of placing entire BoolQuery in filter context
  (filter context disables scoring, breaking relevance ranking)

- ElasticRql: Add _score desc as secondary sort when search terms are present,
  so within items with same primary sort value, more relevant results rank higher

- Bump version to 0.3.6.18
…ing should

Fix overly aggressive hoisting that was flattening all nested BoolQuery
structures. Now only hoists clauses from BoolQuery objects that have
MatchQuery in their should list (relevance scoring queries from w/sw/ew).
Regular nested BoolQuery structures are preserved in filter context.
@mason-chester-rp

Multi-value wildcard queries: OR → AND semantic regression

For queries like sw(city,Chand,Atl), the loop in withWildCardPopulater adds each value's wildcard to filter and each match query to should:

{
  "bool": {
    "filter": [{"wildcard": {"city": "Chand*"}}, {"wildcard": {"city": "Atl*"}}],
    "should": [{"match": {"city": "Chand"}}, {"match": {"city": "Atl"}}]
  }
}

When filter clauses are present, minimum_should_match defaults to 0 — should is scoring-only. Matching is determined entirely by filter, and all filter clauses are ANDed. A city can't start with both "Chand" and "Atl", so this returns zero results.

The old behavior put wildcards in should with no filter, giving minimum_should_match=1 (OR semantics).
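
The defaulting rule behind this can be written down as a small function. This is only a sketch of the rule as Elasticsearch documents it for `bool` queries with no explicit `minimum_should_match`; the class and method names are hypothetical and nothing here is code from the PR.

```java
/** Hypothetical illustration of the documented bool-query default; not PR code. */
final class MinimumShouldMatchSketch {

    /**
     * Default minimum_should_match when none is set explicitly:
     * 1 if there is at least one should clause and no must or filter clauses, else 0.
     */
    static int defaultMinimumShouldMatch(int shouldCount, int mustCount, int filterCount) {
        boolean onlyShould = shouldCount > 0 && mustCount == 0 && filterCount == 0;
        return onlyShould ? 1 : 0;
    }

    public static void main(String[] args) {
        // Old query: two wildcards in should, nothing else.
        System.out.println(defaultMinimumShouldMatch(2, 0, 0)); // prints 1
        // New query: two wildcards in filter, two matches in should.
        System.out.println(defaultMinimumShouldMatch(2, 0, 2)); // prints 0
    }
}
```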

Fix: each wildcard+match pair needs its own nested BoolQuery, with those placed in an outer should:

{
  "bool": {
    "should": [
      {"bool": {"filter": [{"wildcard": {"city": "Chand*"}}], "should": [{"match": {"city": "Chand"}}]}},
      {"bool": {"filter": [{"wildcard": {"city": "Atl*"}}], "should": [{"match": {"city": "Atl"}}]}}
    ]
  }
}

This preserves OR semantics while still adding relevance scoring per term.
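
A minimal sketch of how the per-term structure could be assembled, using plain Java maps in place of the project's query classes. `PerTermBoolSketch` and `startsWithAny` are hypothetical names for this example, and the `prefix*` pattern assumes the `sw()` operator as in the JSON above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration only; not the project's BoolQuery API. */
final class PerTermBoolSketch {

    /** One inner bool per prefix (filter + should), all collected in an outer should. */
    static Map<String, Object> startsWithAny(String field, List<String> prefixes) {
        List<Object> outerShould = new ArrayList<>();
        for (String prefix : prefixes) {
            Map<String, Object> inner = new LinkedHashMap<>();
            // The wildcard gates matching for this one value only.
            inner.put("filter", List.of(Map.of("wildcard", Map.of(field, prefix + "*"))));
            // The match adds a relevance score when the token matches exactly.
            inner.put("should", List.of(Map.of("match", Map.of(field, prefix))));
            outerShould.add(Map.of("bool", inner));
        }
        // Outer bool has only should clauses, so at least one inner bool must match (OR).
        return Map.of("bool", Map.of("should", outerShould));
    }

    public static void main(String[] args) {
        System.out.println(startsWithAny("city", List.of("Chand", "Atl")));
    }
}
```

Because the outer `bool` carries no `filter` or `must`, its `should` list keeps the default of at least one matching clause, which is what restores the OR behavior.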

Comment thread build.gradle
Multi-value queries like sw(city,Chand,Atl) were ANDing wildcards in filter,
returning zero results since a city can't start with both values simultaneously.

Fix: for multi-value, wrap wildcards in an inner should BoolQuery so
minimum_should_match=1 gives OR semantics. Single-value behavior unchanged.

For compound queries like sw(city,Chand,Atl)&deleted=false the inner
wildcard BoolQuery is hoisted into the outer filter alongside other clauses,
preserving correct OR semantics even when combined with filter clauses.
@mason-chester-rp

Version mismatch: build.gradle was bumped to 0.3.6.17 but the changelog entry says 0.3.6.19.

@wes-maszk-rp wes-maszk-rp merged commit 959a272 into release-0.3.6.x Apr 6, 2026
1 of 2 checks passed
wes-maszk-rp added a commit that referenced this pull request Apr 29, 2026
0.3.6.19 was already published to the package registry from PR #214's
earlier version-bump commit.