spath: parquet-backed test indices for analytics-engine route#5441
spath: parquet-backed test indices for analytics-engine route#5441ahkcs wants to merge 1 commit into
Conversation
PR Reviewer Guide 🔍(Review updated until commit 5eb2dd4)Here are some key observations to aid the review process:
|
PR Code Suggestions ✨Latest suggestions up to 5eb2dd4 Explore these optional code suggestions:
Previous suggestionsSuggestions up to commit c4d4a8c
Suggestions up to commit 9f6aef8
Suggestions up to commit f5ea743
Suggestions up to commit 1a07691
|
1a07691 to
f5ea743
Compare
|
Persistent review updated to latest commit f5ea743 |
f5ea743 to
9f6aef8
Compare
| private static final String ARROW_TEXT_CLASS_NAME = "org.apache.arrow.vector.util.Text"; | ||
|
|
||
| /** | ||
| * Whether {@code o} is an Arrow {@code Text} (the UTF-8 byte-buffer wrapper that arrow's Map / | ||
| * Struct / List vectors emit for string values). FQN match keeps {@code core/} free of an Arrow | ||
| * dependency. | ||
| */ | ||
| private static boolean isArrowText(Object o) { | ||
| return o != null && ARROW_TEXT_CLASS_NAME.equals(o.getClass().getName()); | ||
| } |
There was a problem hiding this comment.
ExprValueUtils should do not know ARROW data type. Why ExprValueUtils been used on execution code path?
There was a problem hiding this comment.
Good catch, updated to remove the change
| private static final String AUTO_DOC_MAPPING = | ||
| "{\"mappings\":{\"properties\":{" | ||
| + "\"nested_doc\":{\"type\":\"keyword\"}," | ||
| + "\"array_doc\":{\"type\":\"keyword\"}," | ||
| + "\"merge_doc\":{\"type\":\"keyword\"}," | ||
| + "\"stringify_doc\":{\"type\":\"keyword\"}," | ||
| + "\"empty_doc\":{\"type\":\"keyword\"}," | ||
| + "\"malformed_doc\":{\"type\":\"keyword\"}}}}"; |
There was a problem hiding this comment.
Why add auto_doc_mapping? Becuae Analytics Eengine does not support dynamic mapping?
There was a problem hiding this comment.
Updated to remove explicit mapping, currently our IT creates the index using the lazy way, which makes it a default lucene-backed index, the change is to create the empty index up-front through the helper so the parquet toggle gets a chance to inject its settings
9f6aef8 to
c4d4a8c
Compare
`CalcitePPLSpathCommandIT.init()` was creating its four test indices by raw `PUT /<idx>/_doc/N` requests, which auto-creates the index via the default Lucene path. The analytics-engine compatibility run (`-Dtests.analytics.parquet_indices=true`) injects the parquet/composite settings *inside* `TestUtils.createIndexByRestClient`, so the raw-PUT indices were Lucene-only and DataFusion failed with `UnsupportedOperationException: acquireReader is not supported in EngineBackedIndexer` for every test on the analytics-engine route. Fix: create the empty index up-front via `createIndexByRestClient(..., null)` so the toggle has a chance to inject parquet settings, then let the subsequent doc PUTs populate it via dynamic mapping. No mapping is declared — DataFusion is fine with dynamic mapping on a parquet-backed composite index. Same pattern as `CalciteEvalCommandIT` and `CalciteFieldFormatCommandIT`. No change for the v2 / Calcite path (the helper is a no-op when the parquet toggle isn't set). ## Pass rate Pairs with opensearch-project/OpenSearch#21664. Both PRs are required to move the analytics-engine route off 0 / 16. | IT | Route | Before | After | |---|---|---|---| | `CalcitePPLSpathCommandIT` | analytics-engine (`-Dtests.analytics.force_routing=true -Dtests.analytics.parquet_indices=true`) | 0 / 16 | **16 / 16** | | `CalcitePPLSpathCommandIT` | default v2 / Calcite (no flags) | 16 / 16 | 16 / 16 (no regression) | Signed-off-by: Kai Huang <ahkcs@amazon.com>
c4d4a8c to
5eb2dd4
Compare
|
Persistent review updated to latest commit 5eb2dd4 |
Pairs with opensearch-project/OpenSearch#21664. Both PRs are required to move
CalcitePPLSpathCommandIToff 0 / 16 on the analytics-engine route.What the change does
CalcitePPLSpathCommandIT.init()was creating its four test indices by rawPUT /<idx>/_doc/Nrequests, which auto-creates the index via the default Lucene path. The analytics-engine compatibility run (-Dtests.analytics.parquet_indices=true) only injects parquet/composite settings insideTestUtils.createIndexByRestClient, so the raw-PUT indices were Lucene-only — and DataFusion fails withUnsupportedOperationException: acquireReader is not supported in EngineBackedIndexeron any Lucene-only index.The fix is one line per index: create the empty index up-front through the helper so the parquet toggle gets a chance to inject its settings, then let the existing doc PUTs populate it via dynamic mapping.
No mapping is declared (
nullmapping argument) — DataFusion handles dynamic mapping on parquet-backed composite indices just fine. Same pattern asCalciteEvalCommandITandCalciteFieldFormatCommandIT. No change for the v2 / Calcite path; the helper is a no-op when the parquet toggle isn't set.Pass rate
CalcitePPLSpathCommandIT-Dtests.analytics.force_routing=true -Dtests.analytics.parquet_indices=true)CalcitePPLSpathCommandIT