
test: add MLT coverage tests for edge cases #114

Merged
poyrazK merged 2 commits into main from feature/autocomplete-suggest on Jun 10, 2026

poyrazK (Owner) commented Jun 10, 2026

Summary

  • Add 5 new MLT coverage tests to crates/cloudsearch-index/tests/coverage.rs:
    • mlt_with_doc_id_not_found_returns_error: nonexistent doc_id returns empty results
    • mlt_with_like_and_empty_fields_auto_infers_from_like_json: empty fields list auto-inferred from like JSON keys
    • mlt_with_min_word_length_filters_short_terms: min_word_length filters short tokens
    • mlt_with_max_word_length_filters_long_terms: max_word_length filters long tokens
    • mlt_all_terms_filtered_returns_empty_or_error: when min_term_freq filters out every term, the query returns empty results
  • All 9 MLT tests in coverage.rs now pass
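The field auto-inference case can be modeled in a few lines. This is a hypothetical sketch, not the cloudsearch-index implementation: `effective_fields` is an invented name, and the `like` document is simplified to string key/value pairs instead of arbitrary JSON.

```rust
use std::collections::BTreeMap;

/// Hypothetical model of MLT field selection: an empty `fields` list falls
/// back to the keys of the `like` document; otherwise the caller's list wins.
fn effective_fields(fields: &[&str], like: &BTreeMap<&str, &str>) -> Vec<String> {
    if fields.is_empty() {
        // BTreeMap iterates keys in sorted order, so the result is deterministic.
        like.keys().map(|k| k.to_string()).collect()
    } else {
        fields.iter().map(|f| f.to_string()).collect()
    }
}
```

With `{"title": "rustacean", "body": "systems programming"}` and no explicit fields, this sketch searches both `body` and `title`.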

Summary by CodeRabbit

Tests

  • Added five new integration tests for More-Like-This (MLT) functionality, covering non-existent doc IDs, field auto-inference, term filtering by word-length constraints, and scenarios where all reference terms are filtered out.

- mlt_with_doc_id_not_found_returns_error: nonexistent doc_id returns empty
  results
- mlt_with_like_and_empty_fields_auto_infers_from_like_json: empty fields
  list auto-inferred from like JSON keys, terms extracted from like content
- mlt_with_min_word_length_filters_short_terms: min_word_length=4 filters
  out single-char tokens
- mlt_with_max_word_length_filters_long_terms: max_word_length=4 filters
  out tokens longer than the threshold
- mlt_all_terms_filtered_returns_empty_or_error: when min_term_freq filters
  out every term, the query returns empty results

All 9 MLT tests in coverage.rs now pass (previously 4, now 9).
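The two word-length tests can be pictured as a simple pre-filter on reference tokens. The sketch below is hypothetical: it assumes cloudsearch-index follows the same convention as Elasticsearch's More Like This query (tokens shorter than min_word_length or longer than max_word_length are ignored), which the PR does not state outright, and `filter_by_word_length` is an invented helper.

```rust
/// Hypothetical sketch of MLT word-length filtering. `None` means "no bound".
/// Length is measured in chars, not bytes, so non-ASCII tokens count correctly.
fn filter_by_word_length<'a>(
    tokens: &[&'a str],
    min_word_length: Option<usize>,
    max_word_length: Option<usize>,
) -> Vec<&'a str> {
    tokens
        .iter()
        .copied()
        .filter(|t| {
            let len = t.chars().count();
            // Drop tokens below the minimum or above the maximum length.
            min_word_length.map_or(true, |min| len >= min)
                && max_word_length.map_or(true, |max| len <= max)
        })
        .collect()
}
```

With min_word_length=4, single-char tokens such as "a" are dropped; with max_word_length=4, a token like "systems" is dropped while "rust" survives.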
coderabbitai (Bot) commented Jun 10, 2026

Warning

Review limit reached

@poyrazK, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 13 minutes and 23 seconds.

Your organization has run out of usage credits. Purchase more in the billing tab.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7da69239-331a-4c25-8e9a-4eeaa16849ec

📥 Commits

Reviewing files that changed from the base of the PR and between 0856717 and 29dd40a.

📒 Files selected for processing (1)
  • rust/crates/cloudsearch-index/tests/coverage.rs
📝 Walkthrough

Five new integration tests extend MLT (More-Like-This) query coverage in the cloudsearch-index test suite. Tests cover non-existent document handling, field auto-inference, word-length term filtering, and edge cases where filtering eliminates all reference terms.

Changes

MLT Query Test Coverage

| Layer | File(s) | Summary |
| --- | --- | --- |
| MLT Query Result Handling and Field Inference | rust/crates/cloudsearch-index/tests/coverage.rs | Tests verify that MLT queries handle non-existent document IDs gracefully by returning empty results, and auto-infer search fields from the provided like JSON when the fields parameter is empty. |
| MLT Term Filtering and Boundary Conditions | rust/crates/cloudsearch-index/tests/coverage.rs | Tests validate that MLT queries correctly filter reference terms using min_word_length and max_word_length constraints, still matching when longer or shorter tokens remain; and handle the edge case where all reference terms are filtered out via min_term_freq, returning zero hits. |
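The "all terms filtered" edge case follows from a simple rule: if no reference term survives min_term_freq, there is nothing to query, so the result is zero hits. The sketch below models that under stated assumptions; `select_terms` and `mlt_total_hits` are hypothetical helpers, and "matching" is reduced to exact whitespace-token equality rather than real scoring.

```rust
use std::collections::HashMap;

/// Hypothetical: keep only reference terms that occur at least `min_term_freq`
/// times in the reference text.
fn select_terms(tokens: &[&str], min_term_freq: usize) -> Vec<String> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for t in tokens {
        *counts.entry(*t).or_insert(0) += 1;
    }
    let mut terms: Vec<String> = counts
        .into_iter()
        .filter(|&(_, n)| n >= min_term_freq)
        .map(|(t, _)| t.to_string())
        .collect();
    terms.sort(); // HashMap order is unspecified; sort for determinism
    terms
}

/// Hypothetical: an empty term set short-circuits to zero hits; otherwise a
/// doc counts as a hit if any of its whitespace tokens equals a selected term.
fn mlt_total_hits(terms: &[String], docs: &[&str]) -> usize {
    if terms.is_empty() {
        return 0;
    }
    docs.iter()
        .filter(|d| d.split_whitespace().any(|w| terms.iter().any(|t| t.as_str() == w)))
        .count()
}
```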

Estimated Code Review Effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

A rabbit searched for "docs that match,"
When fields went missing—watch the catch!
With min and max word lengths in place,
And filtering so full of grace,
Five tests now guard the MLT race! 🐰✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main change: adding MLT (More-Like-This) coverage tests for edge cases, which aligns with the changeset that adds five new integration tests. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai (Bot) left a comment

🧹 Nitpick comments (2)
rust/crates/cloudsearch-index/tests/coverage.rs (2)

Line 973: ⚡ Quick win

Comment inaccuracy: reference term should be "rustacean", not "rust".

The comment states `Reference terms: "rust" (from title)`, but the `like` JSON on line 979 specifies `"title": "rustacean"`, not "rust". The tokenizer will extract "rustacean" as the reference term from the title field. The test still passes because "systems" matches doc1's body, but the comment is misleading.
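The point of this finding is that term matching works on whole tokens, so "rustacean" never yields "rust". A minimal sketch, assuming a simple analyzer that lowercases and splits on non-alphanumeric characters; `tokenize` is hypothetical and the crate's real analyzer may also stem or drop stopwords.

```rust
/// Hypothetical stand-in for the index's analyzer: split on any
/// non-alphanumeric character, drop empty pieces, and lowercase the rest.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|s| !s.is_empty())
        .map(|s| s.to_lowercase())
        .collect()
}
```

Under this sketch, `tokenize("Rustacean")` produces the single token "rustacean", which is why the reference-term comment should name "rustacean".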

📝 Suggested fix
-    // Reference terms: "rust" (from title) and "systems" (from body)
+    // Reference terms: "rustacean" (from title) and "systems" (from body)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rust/crates/cloudsearch-index/tests/coverage.rs` at line 973, update the
misleading comment that currently says 'Reference terms: "rust" (from title) and
"systems" (from body)' to reflect the actual title value "rustacean" (i.e.,
change "rust" to "rustacean"); locate the comment near the test that contains
the JSON `title`/`body` ("like" JSON) in the coverage test and modify the
reference-term text so it correctly lists "rustacean" and "systems".

Line 888: ⚡ Quick win

Test name is misleading.

The test name says returns_error but the test checks for empty results (total == 0), not an error. Based on context snippet 1 (lib.rs:887-907), when build_mlt_bool_query returns an error for a missing doc_id, Index::search returns a SearchResponse with total=0 and empty hits rather than propagating the error. Consider renaming to mlt_with_doc_id_not_found_returns_empty_results for clarity.
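The behavior described here, where a query-build error is swallowed and turned into an empty response, can be sketched as follows. `SearchResponse`, `build_query`, and `into_response` are hypothetical simplifications of the types named in the review, not the actual lib.rs code.

```rust
/// Hypothetical response type with only the fields the test inspects.
#[derive(Debug, PartialEq)]
struct SearchResponse {
    total: usize,
    hits: Vec<String>,
}

/// Hypothetical stand-in for `build_mlt_bool_query`: fails on an unknown doc_id.
fn build_query(doc_id: &str, known_ids: &[&str]) -> Result<String, String> {
    if known_ids.iter().any(|k| *k == doc_id) {
        Ok(format!("mlt query seeded from {}", doc_id))
    } else {
        Err(format!("document not found: {}", doc_id))
    }
}

/// Runs the query if it was built; otherwise returns an empty response
/// instead of propagating the error.
fn into_response<Q, E>(built: Result<Q, E>, run: impl FnOnce(Q) -> SearchResponse) -> SearchResponse {
    match built {
        Ok(query) => run(query),
        // The error is dropped here, so callers only ever see total == 0.
        Err(_) => SearchResponse { total: 0, hits: Vec::new() },
    }
}
```

Because the error never reaches the caller, a test can only observe `total == 0`, which is why a name ending in `returns_empty_results` describes it more accurately.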

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rust/crates/cloudsearch-index/tests/coverage.rs` at line 888, rename the test
function mlt_with_doc_id_not_found_returns_error to reflect its actual assertion
(it expects empty results), e.g.,
mlt_with_doc_id_not_found_returns_empty_results; update the async test function
name and any references to it so it matches the new descriptive name and makes
clear it asserts Index::search returns a SearchResponse with total == 0 and
empty hits when build_mlt_bool_query can't find the doc_id.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 02c52d8d-d733-4b50-930f-fb382c984bd1

📥 Commits

Reviewing files that changed from the base of the PR and between 2be8ac8 and 0856717.

📒 Files selected for processing (1)
  • rust/crates/cloudsearch-index/tests/coverage.rs

- mlt_with_like_and_empty_fields: replace the weak >=1 check with one that
  verifies both docs match (total==2) with correct ranking (doc1 > doc2)
- mlt_with_max_word_length: replace the weak >=1 check with one that
  verifies total==1 and that doc2 is the only match
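The stricter assertions from this commit can be expressed as small helpers. This is a hypothetical sketch: `Hit`, `rank_of`, `assert_exact_ranking`, and `assert_only` are invented for illustration and assume hits arrive sorted by descending score; the real tests in coverage.rs may structure this differently.

```rust
/// Hypothetical hit type: a document id plus its relevance score.
struct Hit {
    id: &'static str,
    score: f32,
}

/// Position of `id` in the ranked hit list, if present.
fn rank_of(hits: &[Hit], id: &str) -> Option<usize> {
    hits.iter().position(|h| h.id == id)
}

/// Exact hit count plus an ordering constraint, instead of `total >= 1`.
fn assert_exact_ranking(hits: &[Hit], expected_total: usize, higher: &str, lower: &str) {
    assert_eq!(hits.len(), expected_total, "unexpected hit count");
    let hi = rank_of(hits, higher).expect("higher-ranked doc missing");
    let lo = rank_of(hits, lower).expect("lower-ranked doc missing");
    assert!(hi < lo, "{} should rank above {}", higher, lower);
    assert!(hits[hi].score >= hits[lo].score, "scores out of order");
}

/// Exactly one hit, and it is `id`.
fn assert_only(hits: &[Hit], id: &str) {
    assert_eq!(hits.len(), 1, "expected exactly one hit");
    assert_eq!(hits[0].id, id);
}
```

Pinning the exact count and order catches regressions, such as an extra spurious match, that a `>= 1` check would let through.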

poyrazK (Owner, Author) left a comment

It's okay to merge

poyrazK merged commit a3fc0e4 into main on Jun 10, 2026
4 checks passed