
feat(parquet): batch consecutive null/empty rows in write_list #9752

Open: HippoBaro wants to merge 1 commit into apache:main from HippoBaro:batch_consecute_null_rows


@HippoBaro
Contributor

Which issue does this PR close?

Rationale for this change

See #9731

What changes are included in this PR?

Restructure `write_list()` to accumulate runs of consecutive null or empty rows and flush each run with a single `visit_leaves()` call using `extend(repeat_n(...))`, instead of calling `visit_leaves()` once per row.
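
A minimal sketch of the accumulate-and-flush idea, in a deliberately simplified model: a single list level, leaves that all receive the same levels, and a schema of an optional list of required elements, so a null row gets definition level 0, an empty list gets 1, and each element gets 2. `ListWriter`, `Leaf`, `RunKind`, `flush`, and the `traversals` counter are hypothetical names for illustration, not the parquet crate's actual internals; only the `visit_leaves()` and `extend(repeat_n(...))` shape mirrors the description above. Null and empty rows need different definition levels, so in this sketch a run only ever holds one kind.

```rust
use std::iter::repeat_n; // stable since Rust 1.82

/// Level buffers for one leaf column (simplified, hypothetical).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Leaf {
    pub def_levels: Vec<i16>,
    pub rep_levels: Vec<i16>,
}

/// Kind of a pending run: null and empty rows need different def levels.
#[derive(Debug, Clone, Copy, PartialEq)]
enum RunKind {
    Null,
    Empty,
}

/// Toy list writer (hypothetical): every leaf receives the same levels.
pub struct ListWriter {
    pub leaves: Vec<Leaf>,
    /// Number of visit_leaves() calls, kept only to show the savings.
    pub traversals: usize,
}

impl ListWriter {
    pub fn new(n_leaves: usize) -> Self {
        Self { leaves: vec![Leaf::default(); n_leaves], traversals: 0 }
    }

    /// One "tree traversal": apply `f` to every leaf.
    fn visit_leaves(&mut self, mut f: impl FnMut(&mut Leaf)) {
        self.traversals += 1;
        for leaf in &mut self.leaves {
            f(leaf);
        }
    }

    /// Emit a whole run of `len` null or empty rows in a single traversal.
    fn flush(&mut self, run: Option<(RunKind, usize)>) {
        if let Some((kind, len)) = run {
            let def: i16 = match kind {
                RunKind::Null => 0,
                RunKind::Empty => 1,
            };
            self.visit_leaves(|leaf| {
                leaf.def_levels.extend(repeat_n(def, len));
                // Each row starts a new record, so its rep level is 0.
                leaf.rep_levels.extend(repeat_n(0i16, len));
            });
        }
    }

    /// `offsets` has one more entry than there are rows, as in Arrow lists.
    pub fn write_list(&mut self, offsets: &[i32], validity: Option<&[bool]>) {
        let mut run: Option<(RunKind, usize)> = None;
        for (i, w) in offsets.windows(2).enumerate() {
            let valid = validity.map_or(true, |v| v[i]);
            let kind = if !valid {
                Some(RunKind::Null)
            } else if w[0] == w[1] {
                Some(RunKind::Empty)
            } else {
                None
            };
            match (kind, run) {
                // Same kind as the pending run: just extend it.
                (Some(k), Some((rk, n))) if k == rk => run = Some((rk, n + 1)),
                // Different kind, or no run yet: flush, then start a new run.
                (Some(k), _) => {
                    self.flush(run.take());
                    run = Some((k, 1));
                }
                // Non-null row: flush the pending run, then write this row on
                // its own, since its length (and so its levels) varies per row.
                (None, _) => {
                    self.flush(run.take());
                    let len = (w[1] - w[0]) as usize;
                    self.visit_leaves(|leaf| {
                        leaf.def_levels.extend(repeat_n(2i16, len));
                        leaf.rep_levels.push(0);
                        leaf.rep_levels.extend(repeat_n(1i16, len - 1));
                    });
                }
            }
        }
        self.flush(run); // trailing run, if any
    }
}
```

For example, offsets `[0, 0, 0, 2, 2]` with validity `[false, false, true, false]` (null, null, a two-element list, null) take three traversals in this model instead of four: one for the leading null run, one for the non-null row, and one for the trailing null. Each leaf ends up with definition levels `[0, 0, 2, 2, 0]` and repetition levels `[0, 0, 0, 1, 0]`.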

With sparse data (99% nulls), a 4096-row batch previously triggered ~4000 individual tree traversals, each pushing a single value per leaf. Now each run of consecutive null or empty rows is collapsed into one traversal that extends every leaf's level buffers in bulk.
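
To make the savings concrete, here is a hypothetical counting function (`RowKind` and `traversal_counts` are illustrative names, not part of the parquet crate) that compares one `visit_leaves()` call per row against one call per run of identical null/empty rows plus one call per non-null row. It simplifies by charging each non-null row exactly one traversal.

```rust
/// Hypothetical classification of a row in a list column.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RowKind {
    Null,
    Empty,
    NonNull,
}

/// Returns (per_row, batched) visit_leaves() call counts for `rows`.
pub fn traversal_counts(rows: &[RowKind]) -> (usize, usize) {
    // Before: one call per row, whatever its kind.
    let per_row = rows.len();

    // After: a new call starts at every non-null row and at the first
    // row of each maximal run of identical null or empty rows.
    let mut batched = 0;
    let mut prev: Option<RowKind> = None;
    for &kind in rows {
        let starts_new_call = match kind {
            RowKind::NonNull => true,
            _ => prev != Some(kind),
        };
        if starts_new_call {
            batched += 1;
        }
        prev = Some(kind);
    }
    (per_row, batched)
}
```

For a 4096-row batch in which every 100th row is non-null (41 non-null rows, 4055 nulls), this model gives 4096 calls per row versus 82 batched. The exact gain depends on how the nulls are distributed: scattered non-null rows split the null runs, while clustered ones leave longer runs.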

This follows the same pattern already used by `write_struct()`. The `write_non_null_slice` path is unchanged since each non-null row has different offsets and cannot be batched.

Are these changes tested?

All tests pass; the existing tests already give 100% coverage of the changed code.

Are there any user-facing changes?

N/A


Signed-off-by: Hippolyte Barraud <hippolyte.barraud@datadoghq.com>

Labels: parquet (Changes to the parquet crate)
