Summary
H5CPP's chunked-filter read pipeline (basic_pipeline_t and pool_pipeline_t) returns corrupt data or fails outright on HDF5 ≥ 2.0 (reproduced on 2.1.1). HDF5 1.12.3 / 1.14.6 are unaffected.
Root cause
On HDF5 ≥ 2.0 the read path uses H5Dread_chunk2(..., &buf_size) (the 2.0 replacement for H5Dread_chunk, gated by H5_VERSION_GE(2,0,0)). Two bugs:
filter_mask ignored. H5Dread_chunk2 returns the per-chunk filter mask; a set bit means HDF5 stored the chunk without that filter (e.g. deflate skips chunks where compression doesn't pay — HDF5 2.x does this far more readily than 1.x). The reverse-filter loop applied every filter unconditionally, so an uncompressed chunk was "decompressed" → garbage.
buf_size mis-sized. The code passed buf_size = nbytes (the uncompressed chunk size) as both:
- the input capacity, which is too small for an expanding filter (fletcher32 appends a 4-byte checksum, so the stored chunk is nbytes+4); strict 2.x H5Dread_chunk2 fails when the declared capacity is smaller than the stored chunk; and
- the reverse-filter input length, which must be the stored/compressed byte count that H5Dread_chunk2 returns, not nbytes. (Pre-2.0 H5Dread_chunk gave no size and the deflate stream self-terminated, so nbytes happened to work, hiding the bug.)
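The in/out size contract described above can be illustrated with a small stand-in. This is only a sketch: read_chunk_stub is a hypothetical function that models the behaviour stated in this description (capacity in, stored byte count out, failure when the capacity is too small). It is not the HDF5 API and does not call into HDF5.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a chunk read with an in/out size argument.
// On entry  *buf_size is the capacity of buf in bytes.
// On success *buf_size is overwritten with the stored (on-disk) byte count.
// Returns 0 on success, -1 when the declared capacity is too small.
int read_chunk_stub(const std::vector<unsigned char>& stored,
                    unsigned char* buf, std::size_t* buf_size) {
    if (*buf_size < stored.size()) {
        return -1;  // declared capacity smaller than the stored chunk
    }
    std::memcpy(buf, stored.data(), stored.size());
    *buf_size = stored.size();  // report how many bytes were actually read
    return 0;
}
```

With a 16-byte uncompressed chunk stored as 20 bytes (payload plus a 4-byte trailer), declaring a capacity of 16 fails, while declaring the real buffer capacity succeeds and reports 20 as the length to feed into the reverse filters.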
Fix
In both H5Zpipeline_basic.hpp and H5Zpipeline_pool.hpp read paths:
- honour filter_mask: pass masked filters through unchanged (still swapping buffers to keep the ping-pong parity that lands the result in chunk0);
- set the input buf_size = filter::filter_scratch_bound(nbytes) (the real buffer capacity);
- use the returned buf_size as the reverse-filter input length.
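The mask handling and ping-pong parity in the list above can be sketched as follows. This is a minimal, self-contained illustration and not the code in H5Zpipeline_basic.hpp or H5Zpipeline_pool.hpp: reverse_filter_t and reverse_pipeline are hypothetical names, and the convention that the caller reads the stored chunk into chunk1 when the filter count is odd (chunk0 when even) is an assumption made so that the result ends in chunk0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical reverse-filter signature: consumes in_len bytes from src,
// writes the decoded bytes to dst, returns the number of bytes produced.
using reverse_filter_t = std::function<std::size_t(
    unsigned char* dst, const unsigned char* src, std::size_t in_len)>;

// Undo the filter pipeline for one chunk.
// filter_mask: bit j set means filter j was NOT applied when the chunk was stored.
// stored_len : byte count reported by the chunk read, not the uncompressed size.
// Assumption: the stored bytes were read into chunk1 if filters.size() is odd,
// into chunk0 otherwise. Sketch only: supports at most 32 filters.
std::size_t reverse_pipeline(const std::vector<reverse_filter_t>& filters,
                             std::uint32_t filter_mask,
                             unsigned char* chunk0, unsigned char* chunk1,
                             std::size_t stored_len) {
    unsigned char* src = (filters.size() % 2 == 0) ? chunk0 : chunk1;
    unsigned char* dst = (src == chunk0) ? chunk1 : chunk0;
    std::size_t len = stored_len;
    // Filters are undone in reverse order of application.
    for (std::size_t j = filters.size(); j-- > 0;) {
        if (filter_mask & (1u << j)) {
            std::memcpy(dst, src, len);       // skipped on write: pass through
        } else {
            len = filters[j](dst, src, len);  // applied on write: reverse it
        }
        std::swap(src, dst);                  // swap either way to keep parity
    }
    assert(src == chunk0);                    // result always lands in chunk0
    return len;
}
```

Swapping on masked filters costs one copy per skipped stage, but it keeps the buffer bookkeeping identical for every chunk regardless of its mask.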
Verification
HDF5 2.1.1: test-h5dranges (gzip via h5::view) and test-h5coverage_edges (fletcher32) now pass. HDF5 1.12.3: still 62/62 (the mask change is a correctness improvement on all versions).
Out of scope (NOT h5cpp)
The remaining 2.1.1 failures (gzip round-trips — h5pall, h5dappend, h5zpipeline_parallel_read, csv_io, packet_table_io) are an HDF5 build issue: that 2.1.1 install is not linked against zlib (I/O filters (external): is empty; ldd libhdf5.so shows no libz), so the deflate filter is compiled but non-functional. Rebuild HDF5 with zlib (-DHDF5_ENABLE_Z_LIB_SUPPORT=ON + libz-dev).