feat: ColumnPageStream trait — single contract for streaming + legacy inputs (PR-5a)#6406
Draft
g-talbot wants to merge 1 commit into gtt/parquet-streaming-base
… inputs

Extracts a `pub trait ColumnPageStream` from PR-4's concrete reader so PR-5 (legacy adapter) and PR-6 (streaming merge engine) can land in parallel against a stable contract instead of one rewriting the other.

Trait shape:

    fn metadata(&self) -> &Arc<ParquetMetaData>
    async fn next_page(&mut self) -> Result<Option<Page>, ParquetReadError>

Same invariants PR-4 already guarantees: row-group-major / column-major-within-RG / page-major-within-column yield order, idempotent EOF, and I/O failures surface as ParquetReadError::Io rather than being masked as decode errors.

Implements the trait for `StreamingParquetReader`. Promotes visibility of `Page`, `ParquetReadError`, `RemoteByteSource`, `StreamingParquetReader`, and `StreamingReaderConfig` from `pub(crate)` to `pub` so downstream crates and PR-5/PR-6 can consume them.

Adds `test_streaming_reader_satisfies_column_page_stream_trait`: drains the same fixture through both the concrete-typed and trait-object surfaces, asserts identical (rg_idx, col_idx, page_idx_in_col, compressed_page_size) sequences and idempotent EOF through the trait.

Pure refactor, no behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- Extracts `pub trait ColumnPageStream` from PR-4's concrete `StreamingParquetReader` so PR-5 (legacy adapter) and PR-6 (streaming merge engine) can land in parallel against a stable contract instead of one rewriting the other.
- Trait shape: `metadata() -> &Arc<ParquetMetaData>` + `async next_page() -> Result<Option<Page>, _>`. Same yield-order / idempotent-EOF / I/O-error invariants PR-4 already guarantees.
- Promotes `Page`, `ParquetReadError`, `RemoteByteSource`, `StreamingParquetReader`, `StreamingReaderConfig` from `pub(crate)` to `pub`.
Stack topology
This PR is part of the Streaming Parquet stack:
`gtt/parquet-streaming-base` exists so PR-5/PR-6 can stack against the union of #6384 + #6386 without duplicating either's diff. When those land, the base branch and this PR retarget cleanly.
Why now
Without this trait, PR-5 and PR-6 would both reach into the same concrete type and the second-to-land would have to refactor the first. Extracting the surface up-front is the smallest change that lets the two land independently.
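For reviewers who want the contract in concrete form, here is a minimal, std-only sketch of the surface described in the Summary. It is not the code in this PR. `ParquetMetaData`, `Page`, and `ParquetReadError` are simplified stand-ins for the real types, and `InMemoryStream`, `PageFuture`, `block_on`, `drain`, and `fixture` are hypothetical helpers introduced only for illustration. The sketch also assumes the async method is expressed as a boxed future so the trait can be used as `dyn ColumnPageStream`; the PR may spell that differently.

```rust
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Simplified stand-ins (assumptions): the real types carry far more detail.
#[derive(Debug)]
pub struct ParquetMetaData {
    pub num_row_groups: usize,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Page {
    pub rg_idx: usize,
    pub col_idx: usize,
    pub page_idx_in_col: usize,
    pub compressed_page_size: usize,
}

#[derive(Debug)]
pub enum ParquetReadError {
    Io(String),
}

// Boxed future (assumption) so the trait stays usable as a trait object.
pub type PageFuture<'a> =
    Pin<Box<dyn Future<Output = Result<Option<Page>, ParquetReadError>> + 'a>>;

pub trait ColumnPageStream {
    fn metadata(&self) -> &Arc<ParquetMetaData>;
    /// Yields pages in (row group, column, page) order; `Ok(None)` at EOF,
    /// and again on every later call.
    fn next_page(&mut self) -> PageFuture<'_>;
}

// Hypothetical in-memory implementor standing in for a real reader.
pub struct InMemoryStream {
    metadata: Arc<ParquetMetaData>,
    pages: VecDeque<Page>,
}

impl ColumnPageStream for InMemoryStream {
    fn metadata(&self) -> &Arc<ParquetMetaData> {
        &self.metadata
    }

    fn next_page(&mut self) -> PageFuture<'_> {
        // `pop_front` on an empty queue keeps returning `None`: idempotent EOF.
        Box::pin(async move { Ok::<_, ParquetReadError>(self.pages.pop_front()) })
    }
}

// Tiny busy-polling executor so the sketch needs no async runtime.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

pub fn block_on<F: Future>(future: F) -> F::Output {
    let mut future = Box::pin(future);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(output) = future.as_mut().poll(&mut cx) {
            return output;
        }
    }
}

// Consumers depend only on the trait, never on the concrete reader type.
pub fn drain(stream: &mut dyn ColumnPageStream) -> Result<Vec<Page>, ParquetReadError> {
    let mut pages = Vec::new();
    while let Some(page) = block_on(stream.next_page())? {
        pages.push(page);
    }
    Ok(pages)
}

// Two row groups, two columns each, two pages per column, in yield order.
pub fn fixture() -> InMemoryStream {
    let mut pages = VecDeque::new();
    for rg_idx in 0..2 {
        for col_idx in 0..2 {
            for page_idx_in_col in 0..2 {
                pages.push_back(Page { rg_idx, col_idx, page_idx_in_col, compressed_page_size: 64 });
            }
        }
    }
    InMemoryStream { metadata: Arc::new(ParquetMetaData { num_row_groups: 2 }), pages }
}
```

Calling `drain(&mut fixture())` goes through the trait-object surface only. This is the property that should let a legacy adapter and a merge engine be written independently of the concrete reader.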
Test plan
- `cargo nextest run -p quickwit-parquet-engine streaming_reader::tests`: 14/14 pass, including new `test_streaming_reader_satisfies_column_page_stream_trait` (drains the same fixture concrete-typed and through `&mut dyn ColumnPageStream`, asserts identical (rg_idx, col_idx, page_idx_in_col, compressed_page_size) sequence + idempotent EOF through the trait surface)
- `cargo clippy --workspace --all-features --tests` with `-Dwarnings`
- `cargo +nightly fmt --all -- --check`
- `cargo doc --no-deps -p quickwit-parquet-engine`
- `cargo machete`
- `bash quickwit/scripts/check_license_headers.sh`

🤖 Generated with Claude Code