feat(ipc): configurable zstd compression level #9748

Open
andraztori wants to merge 1 commit into apache:main from andraztori:zstd-ipc-compression-level

@andraztori

Which issue does this PR close?

  • Closes #NNN.

Rationale for this change

arrow-ipc currently hardcodes zstd to zstd::DEFAULT_COMPRESSION_LEVEL (level 3). Users who want tighter compression (for cold storage / WAN transfer) or faster compression (for hot paths) have no way to tune this without forking the crate.

parquet::basic::Compression::ZSTD(ZstdLevel) already exposes the exact same knob, so users writing both Parquet and IPC get an inconsistent experience today.

This PR adds configurable zstd compression levels to arrow-ipc, mirroring the parquet API as closely as possible so the two stay familiar side-by-side.

What changes are included in this PR?

  • New arrow_ipc::compression::ZstdLevel(i32) — validated newtype matching the shape of parquet::basic::ZstdLevel (same range 1..=22, same try_new / compression_level() / Default).
  • New arrow_ipc::compression::IpcCompression enum — writer-side codec + parameter selector, analogous to parquet::basic::Compression:
    pub enum IpcCompression {
        Lz4Frame,
        Zstd(ZstdLevel),
    }
  • IpcWriteOptions::try_with_compression now takes Option<IpcCompression> instead of Option<CompressionType> (source-breaking change, see below).
  • CompressionContext::with_zstd_level(ZstdLevel) constructor; FileWriter / StreamWriter build their context via the configured level instead of the hardcoded default.
  • ZstdLevel and IpcCompression are re-exported from arrow_ipc::writer so the public surface stays in one place.
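
As an illustration of the shape described in the list above, here is a minimal std-only sketch of a validated level newtype and the selector enum. It is not the code from this PR: the `InvalidLevel` error type and the `MIN` / `MAX` constants are placeholders invented for the example, and `Default` is left out because the description does not state which level it yields.

```rust
use std::fmt;

/// Stand-in for the validated zstd level newtype (range 1..=22 as stated above).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ZstdLevel(i32);

/// Placeholder error type, invented for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidLevel(pub i32);

impl fmt::Display for InvalidLevel {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "zstd level {} is outside 1..=22", self.0)
    }
}

impl ZstdLevel {
    const MIN: i32 = 1;
    const MAX: i32 = 22;

    /// Accepts only levels inside MIN..=MAX; anything else is rejected.
    pub fn try_new(level: i32) -> Result<Self, InvalidLevel> {
        if (Self::MIN..=Self::MAX).contains(&level) {
            Ok(Self(level))
        } else {
            Err(InvalidLevel(level))
        }
    }

    /// Returns the validated level as a plain integer.
    pub fn compression_level(&self) -> i32 {
        self.0
    }
}

/// Writer-side codec selector; only the zstd variant carries a parameter.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IpcCompression {
    Lz4Frame,
    Zstd(ZstdLevel),
}
```

Because the inner field is private, the only way to obtain a `ZstdLevel` in this sketch is through `try_new`, so an `IpcCompression::Zstd` value can never hold an out-of-range level.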

On-wire format is unchanged: the IPC flatbuffer BodyCompression.codec field records only which codec was used, and the zstd level is purely a writer-side parameter (decoders do not need to know it, same as in parquet).
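
To make the writer-side point concrete, the sketch below maps a selector to the codec tag that would be recorded in the message and drops the level along the way. All names here are stand-ins for illustration: `WireCodec` plays the role of the flatbuffer codec enum, the level is a bare `i32` rather than `ZstdLevel`, and `wire_codec` is a hypothetical helper, not a method confirmed to exist in this PR.

```rust
/// Stand-in for the codec tag stored in the IPC message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WireCodec {
    Lz4Frame,
    Zstd,
}

/// Stand-in for the writer-side selector; the level is a bare i32 here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IpcCompression {
    Lz4Frame,
    Zstd(i32),
}

impl IpcCompression {
    /// Hypothetical helper: the tag written to the wire. The level is discarded,
    /// so two writers using different levels emit the same codec tag.
    pub fn wire_codec(self) -> WireCodec {
        match self {
            IpcCompression::Lz4Frame => WireCodec::Lz4Frame,
            IpcCompression::Zstd(_) => WireCodec::Zstd,
        }
    }
}
```

In this model `IpcCompression::Zstd(3)` and `IpcCompression::Zstd(19)` are indistinguishable to a reader, which is why existing readers keep working.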

Are these changes tested?

Yes:

  • test_write_file_with_zstd_non_default_level — writes a record batch at a non-default zstd level through the public FileWriter API and reads it back with the stock FileReader, verifying identity.
  • Existing zstd round-trip / compression tests continue to pass (test_write_file_with_zstd_compression, etc.).
  • All in-crate callers (arrow-ipc tests/benches, arrow-integration-testing) updated to the new IpcCompression type.

Verified locally with cargo fmt, cargo build -p arrow-ipc --all-features, cargo test -p arrow-ipc --all-features (107 unit tests + doctests pass), and builds of arrow-flight / arrow-integration-testing.

Are there any user-facing changes?

Yes — one source-breaking change to a public API:

// Before:
pub fn try_with_compression(self, batch_compression: Option<CompressionType>) -> Result<Self, ArrowError>

// After:
pub fn try_with_compression(self, batch_compression: Option<IpcCompression>) -> Result<Self, ArrowError>

Call-site migration:

// Before
.try_with_compression(Some(CompressionType::ZSTD))?
.try_with_compression(Some(CompressionType::LZ4_FRAME))?

// After
.try_with_compression(Some(IpcCompression::zstd_default()))?           // same behavior as before
.try_with_compression(Some(IpcCompression::Zstd(ZstdLevel::try_new(9)?)))? // new: non-default level
.try_with_compression(Some(IpcCompression::Lz4Frame))?
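
For downstream code that stores the old codec-only value (for example in a config struct), a small conversion shim may ease the migration. The sketch below is hypothetical and std-only: `CompressionType` and `IpcCompression` are simplified stand-ins for the real types, `migrate` and `PREVIOUS_ZSTD_LEVEL` are names invented for this example, and level 3 is taken from the rationale above as the previously hardcoded default.

```rust
/// Simplified stand-in for the old codec-only argument type.
#[allow(non_camel_case_types, dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CompressionType {
    LZ4_FRAME,
    ZSTD,
}

/// Simplified stand-in for the new selector; the level is a bare i32 here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IpcCompression {
    Lz4Frame,
    Zstd(i32),
}

/// Level 3, the previously hardcoded default according to the rationale above.
pub const PREVIOUS_ZSTD_LEVEL: i32 = 3;

/// Hypothetical shim: converts an old-style option into the new selector
/// while keeping the pre-change compression behavior.
pub fn migrate(old: Option<CompressionType>) -> Option<IpcCompression> {
    old.map(|codec| match codec {
        CompressionType::LZ4_FRAME => IpcCompression::Lz4Frame,
        CompressionType::ZSTD => IpcCompression::Zstd(PREVIOUS_ZSTD_LEVEL),
    })
}
```

Against the real API, the zstd arm would presumably use `IpcCompression::zstd_default()` from the migration lines above instead of a literal level.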

Because this is a breaking change, it should land in the next major release (59.0.0). Happy to gate or defer if maintainers prefer.


Disclosure

This PR was drafted with AI assistance (Cursor / Anthropic Claude). All code has been reviewed, built, tested, and formatted locally by me. The design was chosen to mirror existing parquet crate conventions; no LLM-authored code was committed without review.

Made with Cursor

Add IpcCompression::Zstd(ZstdLevel) and thread the level into
CompressionContext, mirroring parquet::basic::Compression::ZSTD.
IpcWriteOptions::try_with_compression now takes Option<IpcCompression>
(breaking source change; on-wire format unchanged).

Co-generated-by: Cursor/Opus
github-actions bot added the arrow (Changes to the arrow crate) label on Apr 16, 2026