
parquet: optimize CachedArrayReader byte-array coalescing#9743

Open
ClSlaid wants to merge 1 commit into apache:main from ClSlaid:issue-9060-cached-array-reader-byte-coalescer


@ClSlaid ClSlaid commented Apr 16, 2026

When CachedArrayReader builds output from multiple cached batches, the old path materialized filtered byte arrays and then concatenated them. Replace that path for Utf8/Binary arrays with a direct coalescer that builds offsets, values, and validity in one output array, while keeping the existing generic MutableArrayData path for other types.
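To illustrate the idea of coalescing directly into one output array, here is a minimal std-only sketch. It is not the code in this PR and does not use the arrow-rs types: `ByteBatch`, `ByteCoalescer`, `from_opts`, and `push_selected` are hypothetical names, validity is a plain `Vec<bool>` rather than a packed bitmap, and offsets are assumed to fit in `i32`.

```rust
/// Hypothetical stand-in for one cached Utf8/Binary batch (not an arrow-rs type).
struct ByteBatch {
    offsets: Vec<i32>,   // rows + 1 entries; row i spans offsets[i]..offsets[i + 1]
    values: Vec<u8>,     // concatenated bytes of all rows
    validity: Vec<bool>, // true = non-null
}

impl ByteBatch {
    /// Build a batch from optional strings (illustration helper only).
    fn from_opts(rows: &[Option<&str>]) -> Self {
        let mut offsets = vec![0i32];
        let mut values = Vec::new();
        let mut validity = Vec::with_capacity(rows.len());
        for row in rows {
            if let Some(s) = row {
                values.extend_from_slice(s.as_bytes());
            }
            validity.push(row.is_some());
            offsets.push(values.len() as i32);
        }
        Self { offsets, values, validity }
    }
}

/// Hypothetical coalescer: selected rows from many batches are appended
/// straight into one set of offsets/values/validity buffers, so no
/// per-batch filtered array is materialized and no final concat is needed.
struct ByteCoalescer {
    offsets: Vec<i32>,
    values: Vec<u8>,
    validity: Vec<bool>,
}

impl ByteCoalescer {
    fn new() -> Self {
        Self { offsets: vec![0], values: Vec::new(), validity: Vec::new() }
    }

    /// Append the rows of `batch` whose `mask` entry is true.
    fn push_selected(&mut self, batch: &ByteBatch, mask: &[bool]) {
        assert_eq!(mask.len(), batch.validity.len());
        for (i, &keep) in mask.iter().enumerate() {
            if !keep {
                continue;
            }
            let start = batch.offsets[i] as usize;
            let end = batch.offsets[i + 1] as usize;
            // Null rows have an empty range, so this copies nothing for them.
            self.values.extend_from_slice(&batch.values[start..end]);
            self.offsets.push(self.values.len() as i32);
            self.validity.push(batch.validity[i]);
        }
    }

    /// Number of rows appended so far.
    fn len(&self) -> usize {
        self.validity.len()
    }

    /// Bytes of row `i`, or None if the row is null.
    fn get(&self, i: usize) -> Option<&[u8]> {
        if !self.validity[i] {
            return None;
        }
        let (start, end) = (self.offsets[i] as usize, self.offsets[i + 1] as usize);
        Some(&self.values[start..end])
    }
}
```

In this sketch each selected row is copied exactly once, whereas a filter-then-concatenate approach copies it once per step. A real implementation would presumably copy contiguous runs of selected rows in bulk and pre-size the buffers, which this row-at-a-time loop does not attempt.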

Add a dedicated CachedArrayReader benchmark and a sparse string regression test so this path is measured directly and covered independently of broader parquet reader benchmarks.

Benchmark vs main:

  • cached_array_reader/utf8_sparse_cross_batch_4m_rows/consume_batch: 11.949 ms -> 4.153 ms (-65.2%)
  • arrow_reader_clickbench/sync/Q24 (same filter/projection as ClickBench Q26): 28.377 ms -> 28.443 ms (+0.2%, no measurable change)

Signed-off-by: cl <cailue@apache.org>
github-actions bot added the parquet label (Changes to the parquet crate) on Apr 16, 2026

ClSlaid commented Apr 17, 2026

@alamb I tried optimizing this with GPT 5.4, but the improvement is not obvious in the original test case you gave, so I had it write a new benchmark and optimized against that.

However, I'm still not really confident about the current implementation, so please have a look.

Development

Successfully merging this pull request may close these issues.

[parquet] reduce the time spent in CachedArrayReader