feat(ipc): configurable zstd compression level #9748
Open
andraztori wants to merge 1 commit into apache:main from
Add IpcCompression::Zstd(ZstdLevel) and thread the level into CompressionContext, mirroring parquet::basic::Compression::ZSTD. IpcWriteOptions::try_with_compression now takes Option<IpcCompression> (breaking source change; on-wire format unchanged). Co-generated-by: Cursor/Opus
Which issue does this PR close?
Rationale for this change
`arrow-ipc` currently hardcodes zstd to `zstd::DEFAULT_COMPRESSION_LEVEL` (level 3). Users who want tighter compression (for cold storage / WAN transfer) or faster compression (for hot paths) have no way to tune this without forking the crate. `parquet::basic::Compression::ZSTD(ZstdLevel)` already exposes the exact same knob, so users writing both Parquet and IPC get an inconsistent experience today.
This PR adds configurable zstd compression levels to `arrow-ipc`, mirroring the parquet API as closely as possible so the two stay familiar side-by-side.
What changes are included in this PR?
- `arrow_ipc::compression::ZstdLevel(i32)`: validated newtype matching the shape of `parquet::basic::ZstdLevel` (same range `1..=22`, same `try_new` / `compression_level()` / `Default`).
- `arrow_ipc::compression::IpcCompression` enum: writer-side codec + parameter selector, analogous to `parquet::basic::Compression`.
- `IpcWriteOptions::try_with_compression` now takes `Option<IpcCompression>` instead of `Option<CompressionType>` (source-breaking change, see below).
- `CompressionContext::with_zstd_level(ZstdLevel)` constructor; `FileWriter` / `StreamWriter` build their context via the configured level instead of the hardcoded default.
- `ZstdLevel` and `IpcCompression` are re-exported from `arrow_ipc::writer` so the public surface stays in one place.
On-wire format is unchanged: the IPC flatbuffer `BodyCompression.codec` enum is 1:1 with the wire codec; the zstd level is a purely writer-side parameter (decoders do not need to know it, same as in parquet).
Are these changes tested?
Yes:
- `test_write_file_with_zstd_non_default_level`: writes a record batch at a non-default zstd level through the public `FileWriter` API and reads it back with the stock `FileReader`, verifying identity.
- (`test_write_file_with_zstd_compression`, etc.).
- (`arrow-ipc` tests/benches, `arrow-integration-testing`) updated to the new `IpcCompression` type.
Verified locally with `cargo fmt`, `cargo build -p arrow-ipc --all-features`, `cargo test -p arrow-ipc --all-features` (107 unit tests + doctests pass), and builds of `arrow-flight` / `arrow-integration-testing`.
Are there any user-facing changes?
Yes — one source-breaking change to a public API:
Call-site migration:
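
A minimal sketch of what the migration might look like, written against stand-in types so it compiles on its own. Nothing here is the real `arrow-ipc` code: the `Lz4Frame` variant name, the `String` error type, the `codec()` helper, and a `Default` level of 3 (chosen to preserve the previously hardcoded level) are all assumptions made for illustration.

```rust
// Stand-in types only: they mimic the shapes described in this PR and are
// NOT the real arrow-ipc definitions.

/// Wire-level codec, standing in for the flatbuffer `CompressionType`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CompressionType {
    Lz4Frame,
    Zstd,
}

/// Validated zstd level (range 1..=22, per the PR description).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ZstdLevel(i32);

impl ZstdLevel {
    fn try_new(level: i32) -> Result<Self, String> {
        if (1..=22).contains(&level) {
            Ok(Self(level))
        } else {
            Err(format!("zstd level {level} is outside 1..=22"))
        }
    }

    fn compression_level(&self) -> i32 {
        self.0
    }
}

impl Default for ZstdLevel {
    // Assumption: the default keeps the previously hardcoded level 3.
    fn default() -> Self {
        Self(3)
    }
}

/// Writer-side codec plus parameters. The `Lz4Frame` variant name is assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IpcCompression {
    Lz4Frame,
    Zstd(ZstdLevel),
}

impl IpcCompression {
    /// Hypothetical helper: only the codec reaches the wire, the level does not.
    fn codec(&self) -> CompressionType {
        match self {
            IpcCompression::Lz4Frame => CompressionType::Lz4Frame,
            IpcCompression::Zstd(_) => CompressionType::Zstd,
        }
    }
}

#[derive(Debug, Default)]
struct IpcWriteOptions {
    compression: Option<IpcCompression>,
}

impl IpcWriteOptions {
    /// New-style signature: takes the writer-side selector, not the wire codec.
    fn try_with_compression(mut self, compression: Option<IpcCompression>) -> Result<Self, String> {
        self.compression = compression;
        Ok(self)
    }
}

fn main() -> Result<(), String> {
    // Before: options.try_with_compression(Some(CompressionType::ZSTD))?
    // After, keeping the previous behaviour (default level):
    let same_as_before = IpcWriteOptions::default()
        .try_with_compression(Some(IpcCompression::Zstd(ZstdLevel::default())))?;

    // After, opting into a higher level:
    let level = ZstdLevel::try_new(19)?;
    let tighter = IpcWriteOptions::default()
        .try_with_compression(Some(IpcCompression::Zstd(level)))?;

    println!("{:?}", same_as_before);
    println!("{:?} at level {}", tighter, level.compression_level());
    if let Some(c) = tighter.compression {
        // Both configurations map to the same wire codec.
        println!("wire codec: {:?}", c.codec());
    }
    Ok(())
}
```

Under these assumptions, existing callers that passed the zstd codec would wrap a `ZstdLevel` in `IpcCompression::Zstd`, and readers are unaffected because only the codec is recorded in the message.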
Because this is a breaking change, it should land in the next major release (59.0.0). Happy to gate or defer if maintainers prefer.
Disclosure
This PR was drafted with AI assistance (Cursor / Anthropic Claude). All code has been reviewed, built, tested, and formatted locally by me. The design was chosen to mirror existing `parquet` crate conventions; no LLM-authored code was committed without review.
Made with Cursor