
fix: HDF5 2.x chunked-filter read ignores filter_mask and mis-sizes the H5Dread_chunk2 buffer #298

@steven-varga


Summary

H5CPP's chunked-filter read pipeline (basic_pipeline_t and pool_pipeline_t) returns corrupt data or fails outright on HDF5 ≥ 2.0 (reproduced on 2.1.1). HDF5 1.12.3 / 1.14.6 are unaffected.

Root cause

On HDF5 ≥ 2.0 the read path calls H5Dread_chunk2(..., &buf_size), the 2.0 replacement for H5Dread_chunk, gated by H5_VERSION_GE(2,0,0). That path has two bugs:

  1. filter_mask ignored. H5Dread_chunk2 returns the per-chunk filter mask; a set bit means HDF5 stored the chunk without that filter (e.g. deflate skips chunks where compression doesn't reduce the size, and HDF5 2.x does this far more readily than 1.x). The reverse-filter loop applied every filter unconditionally, so an uncompressed chunk was "decompressed" into garbage.
  2. buf_size mis-sized. The code passed buf_size = nbytes (the uncompressed chunk size) as both:
    • the input capacity, which is too small for an expanding filter (fletcher32 appends a 4-byte checksum, so the stored chunk is nbytes+4); the stricter 2.x H5Dread_chunk2 fails when the declared capacity is smaller than the stored chunk; and
    • the reverse-filter input length, which must be the stored (compressed) byte count that H5Dread_chunk2 returns, not nbytes. (Pre-2.0 H5Dread_chunk reported no size and the deflate stream self-terminated, so nbytes happened to work and hid the bug.)
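To make the two root causes concrete, here is a minimal sketch. `filter_skipped` decodes the mask semantics from item 1 (bit k set means filter k was not applied on write). `mock_read_chunk` is a hypothetical stand-in, not the HDF5 API, that models the in/out `buf_size` contract described in item 2: on entry it is the buffer capacity, on success it becomes the stored byte count, and the call fails if the capacity is smaller than the stored chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Bit k of the per-chunk filter mask is set when filter k (0-based,
// pipeline order) was NOT applied when the chunk was written.
inline bool filter_skipped(std::uint32_t filter_mask, unsigned k) {
    assert(k < 32);  // a 32-bit mask covers at most 32 filters
    return ((filter_mask >> k) & 1u) != 0;
}

// Hypothetical mock of the size contract described in this issue (NOT the
// real H5Dread_chunk2): *buf_size is the capacity of `buf` on entry and is
// overwritten with the number of stored bytes on success. Returns -1 if the
// declared capacity is smaller than the stored chunk.
int mock_read_chunk(const std::vector<std::uint8_t>& stored,
                    void* buf, std::size_t* buf_size) {
    assert(buf != nullptr && buf_size != nullptr);
    if (*buf_size < stored.size()) return -1;     // capacity too small
    std::memcpy(buf, stored.data(), stored.size());
    *buf_size = stored.size();                    // report stored length
    return 0;
}
```

With fletcher32 the stored chunk is nbytes+4, so passing a capacity of nbytes makes the mock fail, while passing the real buffer capacity succeeds and yields nbytes+4 as the length the reverse filters must consume.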

Fix

In both H5Zpipeline_basic.hpp and H5Zpipeline_pool.hpp read paths:

  • honour filter_mask: pass masked filters through unchanged, still swapping buffers to keep the ping-pong parity that lands the result in chunk0;
  • set the input buf_size = filter::filter_scratch_bound(nbytes) (the real buffer capacity);
  • use the returned buf_size as the reverse-filter input length.
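The corrected read loop can be sketched as follows. This is an illustration under stated assumptions, not the actual h5cpp code: `reverse_filter`, `read_target` and `reverse_pipeline` are hypothetical names, and "pass through unchanged" is assumed to mean copying the bytes into the other buffer so that every filter, masked or not, costs exactly one swap. The raw chunk is read into `read_target(...)`, whose choice depends only on the filter count, so after one swap per filter the result always ends in chunk0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical reverse-filter signature: decode `in_len` bytes from `in`
// into `out` (capacity `out_cap`); return bytes written, or 0 on failure.
using reverse_filter = std::function<std::size_t(
    const std::uint8_t* in, std::size_t in_len,
    std::uint8_t* out, std::size_t out_cap)>;

// Buffer the raw stored chunk should be read into so that, after one
// swap per filter, the decoded result ends up in chunk0.
inline std::uint8_t* read_target(std::size_t nfilters,
                                 std::uint8_t* chunk0, std::uint8_t* chunk1) {
    return nfilters % 2 == 0 ? chunk0 : chunk1;
}

// Undo the pipeline for one chunk. `filters` are in write order, so they are
// undone back to front. `stored_len` is the byte count reported by the chunk
// read (not nbytes); both buffers hold `capacity` bytes.
// Returns the decoded length (result in chunk0), or 0 on failure.
std::size_t reverse_pipeline(const std::vector<reverse_filter>& filters,
                             std::uint32_t filter_mask,
                             std::uint8_t* chunk0, std::uint8_t* chunk1,
                             std::size_t capacity, std::size_t stored_len) {
    assert(stored_len <= capacity);
    std::uint8_t* src = read_target(filters.size(), chunk0, chunk1);
    std::uint8_t* dst = (src == chunk0) ? chunk1 : chunk0;
    std::size_t len = stored_len;
    for (std::size_t k = filters.size(); k-- > 0;) {
        if ((filter_mask >> k) & 1u) {
            // Filter k was skipped on write: pass the bytes through unchanged.
            std::memcpy(dst, src, len);
        } else {
            len = filters[k](src, len, dst, capacity);
            if (len == 0) return 0;
        }
        std::swap(src, dst);  // swap in both branches to preserve parity
    }
    return len;  // src == chunk0 here
}
```

Swapping on masked filters costs one extra copy, but it is needed because the target buffer must be chosen before the read, while the mask only becomes known as an output of that same read.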

Verification

HDF5 2.1.1: test-h5dranges (gzip via h5::view) and test-h5coverage_edges (fletcher32) now pass. HDF5 1.12.3: still passes 62/62 tests (the mask change is a correctness improvement on all versions).

Out of scope (NOT h5cpp)

The remaining 2.1.1 failures (gzip round-trips: h5pall, h5dappend, h5zpipeline_parallel_read, csv_io, packet_table_io) are an HDF5 build issue. That 2.1.1 install is not linked against zlib (its "I/O filters (external):" entry is empty and ldd libhdf5.so shows no libz), so the deflate filter is compiled in but non-functional. Rebuild HDF5 with zlib (-DHDF5_ENABLE_Z_LIB_SUPPORT=ON plus libz-dev).
