feat(parquet): fuse level encoding passes and compact level representation #9653
HippoBaro wants to merge 9 commits into apache:main
Conversation
Force-pushed 335fb81 to 44dae05
This is a continuation of the work done in #9447 to improve runtime performance around sparse and/or highly uniform columns. As such, this may be of interest to @alamb and @etseidl. 5a1d3d7 adds three benchmarks that exercise the code path this series optimizes. I created a PR (#9654) to merge those separately if needed, so the benchmark bot can have a baseline to compare against. Thanks!
Force-pushed 44dae05 to 7902e69
Thanks @HippoBaro, this looks impressive. I'm still looking, but haven't found any obvious problems yet. Gads, every time I delve this deep into parquet I go a little mad 😵💫. I think the RLE encoder could use a little refactoring/comment improvement to make the flow more obvious. Not as part of this PR, though.
etseidl left a comment
Flushing a few comments. More tomorrow.
```rust
let mut values_to_write = 0usize;
let max_def = self.descr.max_def_level();
self.def_levels_encoder
    .put_with_observer(levels, |level, count| {
```
❤️ When I added the histograms I wasn't happy with the redundancy here. Nice fix!
Force-pushed 7902e69 to c891c35
Thanks for the reviews! I've reworked the branch to address all feedback. Sorry for the delay, it took me a while to experiment. The main structural change is a `LevelData` enum:

```rust
enum LevelData {
    Absent,
    Materialized(Vec<i16>),
    Uniform { value: i16, count: usize },
}
```

The resulting refactor has a larger LoC footprint, but the API is arguably much cleaner and more robust. Also rebased as per #9656 (review).
Thanks @HippoBaro. I'll try to make some time to review the changes. Probably not today, but hopefully tomorrow... 🤞
run benchmark arrow_writer
🤖 Arrow criterion benchmark running (GKE): comparing faster_sparse_columns_encoding (c891c35) to aac969d (merge-base)
# Which issue does this PR close?

- None, but relates to #9653

# Rationale for this change

#9653 introduces optimizations related to non-null uniform workloads. This adds benchmarks so we can quantify them.

# What changes are included in this PR?

Add three new benchmark cases to the arrow_writer benchmark suite for evaluating write performance on struct columns at varying null densities:

* `struct_non_null`: a nullable struct with 0% null rows and non-nullable primitive children;
* `struct_sparse_99pct_null`: a nullable struct with 99% null rows, exercising null batching through one level of struct nesting;
* `struct_all_null`: a nullable struct with 100% null rows, exercising the uniform-null path through struct nesting.

Baseline results (Apple M1 Max):

```
struct_non_null/default             29.9 ms
struct_non_null/parquet_2           38.2 ms
struct_non_null/zstd_parquet_2      50.9 ms
struct_sparse_99pct_null/default     7.2 ms
struct_sparse_99pct_null/parquet_2   7.3 ms
struct_sparse_99pct_null/zstd_p2     8.1 ms
struct_all_null/default             83.3 µs
struct_all_null/parquet_2           82.5 µs
struct_all_null/zstd_parquet_2     106.6 µs
```

# Are these changes tested?

N/A

# Are there any user-facing changes?

None

Signed-off-by: Hippolyte Barraud <hippolyte.barraud@datadoghq.com>
run benchmark arrow_writer
🤖 Arrow criterion benchmark running (GKE): comparing faster_sparse_columns_encoding (6c73ac7) to adf9308 (merge-base)
🤖 Arrow criterion benchmark completed (GKE)
I am surprised by the few regressions above; I can't reproduce them locally. Are these benchmarks known to be noisy?
Yes. They are extremely twitchy. I always take them with a grain of salt or ten. 😅
I've now run multiple passes of the arrow_writer bench on my workstation, and there appear to be no regressions due to this PR. And the speed-ups are quite impressive 😄
@kszucs do you have time to look at this PR? It touches on your CDC code.
I am hoping to review this tomorrow.
alamb left a comment
Thank you @HippoBaro -- this is really exciting. I am sorry it is taking so long to review, but this is in some of the most performance-critical and tricky code in the parquet writer.
I went through most of it pretty carefully and I really like where it is heading, but as you can probably tell by the number of comments, it is a pretty large change.
What I would like to request is that we break this PR into smaller chunks to make it easier to review and verify before getting this one in.
Some suggested parts to break out:

- `BIT_PACK_GROUP_SIZE`
- The fast path changes to `parquet/src/arrow/arrow_writer/levels.rs`
- The introduction of `LevelData`
- Change to use `put_with_observer` rather than `put`
Test coverage for fast paths
One thing that came up during my review is that many of the newly added fast paths are not covered by tests.
To see this, you can run `cargo llvm-cov --html test -p parquet`
Here is a copy of the result: llvm.zip
Here is an example showing that the fast paths aren't covered
```diff
-fn write(&mut self, _values: &Self::Values, _offset: usize, _len: usize) -> Result<()> {
-    unreachable!("should call write_gather instead")
+fn write(&mut self, values: &Self::Values, offset: usize, len: usize) -> Result<()> {
+    downcast_op!(
```
is this code actually callable now?
Yes! The code is now able to distinguish between `Dense { offset, len }` and `Sparse(Vec<usize>)`. When the column has no nulls, `write_leaf` produces `Dense` directly without materializing a vec like previously, and `write_mini_batch` then calls `encoder.write(values, offset, len)` based on that. Neat!
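For illustration, the dense/sparse split described above can be sketched like this. The `ValueSelection` name, its shape, and the `select` helper are assumptions modeled on the comment, not the actual parquet-rs types:

```rust
/// Hypothetical sketch of the Dense/Sparse dispatch discussed above.
#[derive(Debug)]
enum ValueSelection {
    /// Slice has no nulls: values can be written as one contiguous range.
    Dense { offset: usize, len: usize },
    /// Nulls present: only the listed indices hold real values.
    Sparse(Vec<usize>),
}

fn select(values_len: usize, validity: Option<&[bool]>) -> ValueSelection {
    match validity {
        // No null buffer at all: take the whole range in one shot.
        None => ValueSelection::Dense { offset: 0, len: values_len },
        Some(mask) => {
            let indices: Vec<usize> = mask
                .iter()
                .enumerate()
                .filter_map(|(i, &valid)| valid.then_some(i))
                .collect();
            if indices.len() == values_len {
                // All valid: collapse back to the dense fast path.
                ValueSelection::Dense { offset: 0, len: values_len }
            } else {
                ValueSelection::Sparse(indices)
            }
        }
    }
}
```

The payoff is that the no-null case never allocates an index vector, which matters for the uniform workloads this series targets.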
```diff
-/// Maximum groups of 8 values per bit-packed run. Current value is 64.
+/// Number of values in one bit-packed group. The Parquet RLE/bit-packing hybrid
+/// format always bit-packs values in multiples of this count (see the format spec:
```
Can you please provide a link in the comments for this statement?
```rust
/// needed. Callers may use [`extend_run`](Self::extend_run) to add further
/// repetitions in O(1) once this returns `true`.
#[inline]
pub fn is_accumulating(&self, value: u64) -> bool {
```
Perhaps calling it `is_accumulating_rle` would help readers understand that this is specific to the RLE mode.
```diff
 if self.current_value == value {
     self.repeat_count += 1;
-    if self.repeat_count > 8 {
+    if self.repeat_count > BIT_PACK_GROUP_SIZE {
```
The change to use a named constant is better than hard-coded literals -- thank you.
```rust
/// Increments the count for a level value by `count`.
#[inline]
pub fn update_n(&mut self, level: i16, count: i64) {
```
What does the `n` stand for here? As in, why not call this `update_count` to match the inner?
Agreed, `_n` isn’t great (I’ll probably rename the `put_n_*` symbols as well). For now, I’ll go with `increment_by`. `update_n`/`update_count` can be interpreted as replacing the count with a new value. The function increments (as the doc already states), so this seems like the clearest option.
```rust
match nulls {
    Some(nulls) => {
        let null_offset = range.start;
        let mut pending_nulls: usize = 0;
```
I think it might also help future readers to define what "empties" means in this context (and how it is different from null).
```rust
    }
}

match info.logical_nulls.clone() {
```
To appease the borrow-checker gods: the match arms need `&mut info` (for `extend_def_levels` etc.). Matching on `&info.logical_nulls` would hold a shared borrow on `info` for the entire arm, blocking the `&mut self` calls.
No clean way around it AFAICT.
Could `std::mem::take` and then restoring the field afterwards work?
Or inlining `extend_value_indices`/`extend_def_levels` again could work.
Maybe those are the alternatives that you considered not clean :)
Yes, I considered the `mem::take` approach, but it feels brittle if the control flow changes in the future or if there is a panic. Cloning the `Arc` is a bit sad because of the atomic, but it is foolproof 🤷
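A minimal standalone reproduction of this borrow conflict and the Arc-clone workaround may help. The `Info` struct and its methods here are invented for illustration; only the `logical_nulls.clone()` pattern mirrors the actual change:

```rust
use std::sync::Arc;

// Invented stand-in for the writer state discussed above.
struct Info {
    logical_nulls: Option<Arc<Vec<bool>>>,
    def_levels: Vec<i16>,
}

impl Info {
    fn extend_def_levels(&mut self, n: usize, level: i16) {
        self.def_levels.extend(std::iter::repeat(level).take(n));
    }

    fn process(&mut self) {
        // Matching on `&self.logical_nulls` directly would hold a shared
        // borrow of `self` across the whole arm, so the `&mut self` call
        // below would not compile. Cloning the Arc ends that borrow at the
        // cost of one atomic increment.
        match self.logical_nulls.clone() {
            Some(nulls) => {
                let null_count = nulls.iter().filter(|v| !**v).count();
                self.extend_def_levels(null_count, 0); // `&mut self`: OK now
            }
            None => self.extend_def_levels(1, 1),
        }
    }
}
```

Unlike `mem::take`, the clone leaves the field intact even if a later refactor adds an early return or a panic path inside the arm.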
```rust
/// incrementally across multiple batches without forcing run boundaries.
/// The encoder is flushed automatically when [`consume`](Self::consume) is called.
#[inline]
pub fn put(&mut self, buffer: &[i16]) -> usize {
```
In theory this is a breaking API change.
However, `LevelEncoder` is part of the "experimental" API, which is documented as not being stable.
```rust
#[derive(Debug, Clone, Copy)]
pub(crate) enum LevelDataRef<'a> {
```
This seems pretty similar to `LevelData` -- why can't we just use `&LevelData`?
If there is a good reason, I think we need an explanation in comments.
`LevelDataRef` is to `LevelData` what `&str` is to `String`. We need both because `ArrayLevels` owns its level data (`LevelData::Materialized(Vec<i16>)`), but `write_batch_internal` must also accept borrowed `&[i16]` slices from the public `write_batch` API without allocating.
Even if we could change the public interface, that seems fundamental.
I will write a proper doc on both of the types so the above context is easily accessible.
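A sketch of the owned/borrowed pair and the cheap view conversion between them, following the `String`/`&str` analogy. The variant shapes follow the enum posted earlier in the thread, but the `as_ref` and `len` helpers are illustrative, not the crate's actual API:

```rust
/// Owned level data, as held by the array-levels structure.
#[derive(Debug)]
enum LevelData {
    Absent,
    Materialized(Vec<i16>),
    Uniform { value: i16, count: usize },
}

/// Borrowed view: lets callers pass `&[i16]` slices without allocating.
#[derive(Debug, Clone, Copy)]
enum LevelDataRef<'a> {
    Absent,
    Materialized(&'a [i16]),
    Uniform { value: i16, count: usize },
}

impl LevelData {
    /// Cheap view conversion, analogous to `String::as_str`.
    fn as_ref(&self) -> LevelDataRef<'_> {
        match self {
            LevelData::Absent => LevelDataRef::Absent,
            LevelData::Materialized(v) => LevelDataRef::Materialized(v),
            LevelData::Uniform { value, count } => LevelDataRef::Uniform {
                value: *value,
                count: *count,
            },
        }
    }
}

impl<'a> LevelDataRef<'a> {
    /// Number of logical level values, without materializing `Uniform`.
    fn len(&self) -> usize {
        match self {
            LevelDataRef::Absent => 0,
            LevelDataRef::Materialized(v) => v.len(),
            LevelDataRef::Uniform { count, .. } => *count,
        }
    }
}
```

The `Uniform` arm is where the compact-representation win shows up: length (and, by extension, encoding) is answerable in O(1) without a `Vec<i16>`.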
```rust
#[derive(Debug, Clone, Copy)]
pub(crate) enum ValueSelectionRef<'a> {
```
Likewise here -- an explanation of this and how it relates to `ValueSelection` would be really helpful.
I can't wait to get this in -- so good
Thanks again for the thorough reviews! I’ll keep working on this branch/PR to address the feedback (hopefully tomorrow) and for discussion purposes, but we can otherwise close it. I’ll make individual PRs as requested. Bear with me as I work through the many comments and break the commits into more digestible pieces 🙇
100% -- thank you for being willing to do so |
I had a quick look at the levels and CDC changes, and it seems like a strict improvement without any noticeable issues; I will try to take a closer look tomorrow.
Marking as draft as I think this PR is no longer waiting on feedback and I am trying to make it easier to find PRs in need of review. Please mark it as ready for review when it is ready for another look |
The literal `8` appeared in two distinct roles throughout `RleEncoder`, `RleDecoder`, and their tests. Replacing each with a named constant makes the intent explicit and prevents the two meanings from being confused.

* `BIT_PACK_GROUP_SIZE = 8`: The Parquet RLE/bit-packing hybrid format always bit-packs values in multiples of this count (spec: "we always bit-pack a multiple of 8 values at a time"). Every occurrence related to the staging buffer size, the repeat-count threshold that triggers the RLE decision, and the group-count arithmetic in bit-packed headers now uses this name.
* `u8::BITS` (= 8, from std): Used wherever a bit-count is divided by 8 to obtain a byte-count (e.g. `ceil(bit_width, u8::BITS as usize)`). This is a bits-per-byte conversion, a fundamentally different concept from the packing-group size.

No behaviour change.

Signed-off-by: Hippolyte Barraud <hippolyte.barraud@datadoghq.com>
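The two roles of `8` can be seen side by side in a small sketch. The constant matches the commit message; the size helper itself is illustrative, not the crate's actual function:

```rust
/// Number of values per bit-packed group: the Parquet RLE/bit-packing
/// hybrid always packs a multiple of 8 values at a time.
const BIT_PACK_GROUP_SIZE: usize = 8;

/// Illustrative helper: (group count, byte count) needed to bit-pack
/// `num_values` values at `bit_width` bits each.
fn bit_packed_size(num_values: usize, bit_width: usize) -> (usize, usize) {
    // Role 1: BIT_PACK_GROUP_SIZE -- round up to whole 8-value groups.
    let groups = (num_values + BIT_PACK_GROUP_SIZE - 1) / BIT_PACK_GROUP_SIZE;
    // Role 2: u8::BITS -- bits-to-bytes conversion, a different concept
    // that merely happens to share the value 8.
    let bits = groups * BIT_PACK_GROUP_SIZE * bit_width;
    let bytes = (bits + u8::BITS as usize - 1) / u8::BITS as usize;
    (groups, bytes)
}
```

Naming the two constants separately means that if either concept ever needed to change independently, the call sites already distinguish them.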
Add `put_with_observer()` to `LevelEncoder` that calls an `FnMut(i16, usize)` observer for each value during encoding. This allows callers to piggyback counting and histogram updates onto the encoding pass without extra iterations over the level buffer.

Previously, `write_mini_batch()` made three separate passes over each level array: one to count non-null values or row boundaries, one to update the level histogram, and one to RLE-encode. Now all three operations happen in a single pass via the observer closure.

Replace `LevelHistogram::update_from_levels()` with a new `LevelHistogram::increment_by()` that accepts a count, and remove the now-unnecessary `update_definition_level_histogram()` and `update_repetition_level_histogram()` methods from `PageMetrics`.

Signed-off-by: Hippolyte Barraud <hippolyte.barraud@datadoghq.com>
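A rough sketch of the fused pass: the closure signature follows the commit message, but the run-grouping loop below is a simplified stand-in for the real RLE encoder:

```rust
/// Simplified "encoder" demonstrating the fused pass: the observer is
/// invoked once per run of equal levels, so counting and histogram
/// updates ride along with encoding instead of needing separate passes.
fn put_with_observer(levels: &[i16], mut observer: impl FnMut(i16, usize)) {
    let mut i = 0;
    while i < levels.len() {
        let value = levels[i];
        // Find where this run of identical levels ends.
        let run_end = levels[i..]
            .iter()
            .position(|&v| v != value)
            .map_or(levels.len(), |p| i + p);
        // One observer call covers the whole run (count + histogram fused).
        observer(value, run_end - i);
        i = run_end;
    }
}
```

A caller can then fold the old three passes into one closure, e.g. counting non-nulls and bumping a histogram bucket in the same invocation.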
Add `is_accumulating_rle()` and `extend_run()` methods to `RleEncoder` that allow callers to detect when the encoder is in RLE accumulation mode and bulk-extend runs without per-element overhead.

Upgrade `put_with_observer()` in `LevelEncoder` to exploit this: after each `put()`, it checks whether the encoder entered accumulation mode. If so, it scans ahead for the rest of the run, calls `extend_run()` to batch it in O(1), and fires the observer once with the full run length.

This turns the previous O(n) per-value encoding + observation into O(1) amortized per RLE run, which is a significant improvement for sparse columns where long runs of identical levels are common.

Signed-off-by: Hippolyte Barraud <hippolyte.barraud@datadoghq.com>
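A toy encoder illustrating the `is_accumulating_rle`/`extend_run` protocol and the scan-ahead loop. All names and signatures here are simplified stand-ins for the real `RleEncoder`, which also handles bit-packed runs and byte output:

```rust
/// Toy run-length encoder: a list of (value, count) runs.
struct ToyRle {
    runs: Vec<(u64, usize)>,
}

impl ToyRle {
    fn new() -> Self { Self { runs: Vec::new() } }

    fn put(&mut self, value: u64) {
        match self.runs.last_mut() {
            Some((v, n)) if *v == value => *n += 1,
            _ => self.runs.push((value, 1)),
        }
    }

    /// Would another `put(value)` just extend the current run?
    fn is_accumulating_rle(&self, value: u64) -> bool {
        matches!(self.runs.last(), Some((v, _)) if *v == value)
    }

    /// Bulk-extend the current run in O(1) instead of per-value puts.
    fn extend_run(&mut self, count: usize) {
        self.runs.last_mut().expect("active run").1 += count;
    }
}

/// Encode a buffer using the scan-ahead pattern from the commit message.
fn encode(enc: &mut ToyRle, buffer: &[u64]) {
    let mut i = 0;
    while i < buffer.len() {
        enc.put(buffer[i]);
        i += 1;
        if enc.is_accumulating_rle(buffer[i - 1]) {
            // Scan ahead for the rest of the run and batch it in one call.
            let value = buffer[i - 1];
            let run = buffer[i..].iter().take_while(|&&v| v == value).count();
            enc.extend_run(run);
            i += run;
        }
    }
}
```

The per-value cost disappears for long runs: a 4096-value all-zero buffer becomes one `put` plus one `extend_run`.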
Restructure `write_list()` to accumulate consecutive null and empty rows and flush them in a single `visit_leaves()` call using `extend(repeat_n(...))`, instead of calling `visit_leaves()` per row.

With sparse data (99% nulls), a 4096-row batch previously triggered ~4000 individual tree traversals, each pushing a single value per leaf. Now consecutive null/empty runs are collapsed into one traversal that extends all leaf level buffers in bulk. This follows the same pattern already used by `write_struct()`.

The `write_non_null_slice` path is unchanged since each non-null row has different offsets and cannot be batched.

Signed-off-by: Hippolyte Barraud <hippolyte.barraud@datadoghq.com>
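The batching pattern can be sketched as follows. This is a simplified stand-in for the real `visit_leaves` tree walk: the row model and the def-level values 0 and 3 are arbitrary placeholders, and `repeat(..).take(n)` stands in for `repeat_n`:

```rust
/// Sketch: collapse consecutive null rows into one bulk extension of the
/// leaf def-level buffer, instead of one "traversal" per null row.
fn write_levels(rows: &[Option<&[i32]>], def_levels: &mut Vec<i16>) {
    let mut i = 0;
    while i < rows.len() {
        match rows[i] {
            None => {
                // Accumulate the whole run of null rows, then flush once.
                let run = rows[i..].iter().take_while(|r| r.is_none()).count();
                def_levels.extend(std::iter::repeat(0i16).take(run));
                i += run;
            }
            Some(values) => {
                // Non-null rows have distinct offsets: handled per row,
                // mirroring the unchanged `write_non_null_slice` path.
                def_levels.extend(std::iter::repeat(3i16).take(values.len()));
                i += 1;
            }
        }
    }
}
```

At 99% nulls this turns ~4000 single-value pushes per batch into a handful of bulk `extend` calls.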
Adds a bulk encoding method for repeated level values. After a small warmup to enter RLE accumulation mode, remaining values are extended in O(1) via the existing `extend_run` path. Signed-off-by: Hippolyte Barraud <hippolyte.barraud@datadoghq.com>
Force-pushed 6c73ac7 to d07829b
Introduces a `LevelData` enum (`Absent`, `Materialized`, `Uniform`) to replace `Option<Vec<i16>>` for definition and repetition levels, and a borrowed `LevelDataRef` counterpart for the writer path. Uniform columns (e.g. required fields, all-null pages) are now encoded in O(1) without materializing a dense list. The CDC chunker, column writer, and arrow writer are migrated to the new types. Signed-off-by: Hippolyte Barraud <hippolyte.barraud@datadoghq.com>
When an entire list, struct, fixed-size list, or leaf array is null, skip per-row iteration and emit bulk uniform def/rep levels via `extend_uniform_levels` in O(1). Signed-off-by: Hippolyte Barraud <hippolyte.barraud@datadoghq.com>
Changes `byte_array` encoder methods (`FallbackEncoder::encode`, `DictEncoder::encode`, etc) and all `get_*_array_slice` functions from `&[usize]` to `impl ExactSizeIterator<Item = usize>`. Signed-off-by: Hippolyte Barraud <hippolyte.barraud@datadoghq.com>
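The shape of this signature change can be sketched with a hypothetical `encode` helper (the real methods live in the `byte_array` encoders; this one just concatenates bytes to show the dispatch):

```rust
/// Accepting any exact-size iterator of indices lets dense ranges
/// (`offset..offset + len`) and sparse index vectors share one encode
/// path without materializing a `&[usize]` slice for the dense case.
fn encode(values: &[&str], indices: impl ExactSizeIterator<Item = usize>) -> Vec<u8> {
    let mut out = Vec::new();
    // ExactSizeIterator (vs. plain Iterator) keeps pre-sizing possible.
    out.reserve(indices.len());
    for i in indices {
        out.extend_from_slice(values[i].as_bytes());
    }
    out
}
```

Dense callers pass a range, sparse callers pass their index vector's iterator; neither side pays an extra allocation.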
Force-pushed d07829b to 0b1fd2f
@alamb @etseidl Sorry for the delay; it took me longer than expected to reorganize the series into more individually reviewable commits. Test coverage is now substantial as well (hence the ballooning LoC diff). PRs #9751 and #9752 are up and can be reviewed/merged independently. The remaining commits build on that base, and on each other, so they can’t really be reviewed in parallel. I will publish them once those two are merged. I expect a total of 5 to 6 PRs 😅. Thank you again for all the reviews!
# Which issue does this PR close?

# Rationale for this change

See issue for details. The Parquet column writer currently does per-value work during level encoding regardless of data sparsity, even though the output encoding (RLE) is proportional to the number of runs.
# What changes are included in this PR?

Three incremental commits, each building on the previous:

1. **Fuse level encoding with counting and histogram updates.** `write_mini_batch()` previously made three separate passes over each level array: count non-nulls, update the level histogram, and RLE-encode. Now all three happen in a single pass via an observer callback on `LevelEncoder`. When the RLE encoder enters accumulation mode, the loop scans ahead for the full run length and batches the observer call. This makes counting and histogram updates O(1) per run.
2. **Batch consecutive null/empty rows in `write_list`.** Consecutive null or empty list entries are now collapsed into a single `visit_leaves()` call that bulk-extends all leaf level buffers, instead of one tree traversal per null row. This mirrors the approach already used by `write_struct()`.
3. **Short-circuit entirely-null columns.** When every element in an array is null, skip `Vec<i16>` level-buffer materialization entirely and store a compact `(def_value, rep_value, count)` tuple. The writer encodes this via `RleEncoder::put_n()` in O(1) amortized time, bypassing the normal mini-batch loop.

# Are these changes tested?
All tests passing. I added some benchmarks to exercise the heavy and all-null code paths, alongside the existing 25% sparseness benchmarks:
Non-nullable column benchmarks are within noise, as expected since they have no definition levels to optimize.
# Are there any user-facing changes?
None.