feat: ColumnPageStream trait — single contract for streaming + legacy inputs (PR-5a) #6406

Draft
g-talbot wants to merge 1 commit into gtt/parquet-streaming-base from gtt/column-page-stream-trait

g-talbot commented May 8, 2026

Summary

  • Extracts pub trait ColumnPageStream from PR-4's concrete StreamingParquetReader so PR-5 (legacy adapter) and PR-6 (streaming merge engine) can land in parallel against a stable contract instead of one rewriting the other.
  • Trait shape: metadata() -> &Arc<ParquetMetaData> and async next_page() -> Result<Option<Page>, ParquetReadError>. The yield-order, idempotent-EOF, and I/O-error invariants are the same ones PR-4 already guarantees.
  • Promotes visibility of Page, ParquetReadError, RemoteByteSource, StreamingParquetReader, StreamingReaderConfig from pub(crate) to pub.
  • Pure refactor, zero behavior change.
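
To make the trait shape above concrete, here is a minimal, self-contained sketch. It is not the code in this PR: `ParquetMetaData`, `Page`, and `ParquetReadError` are simplified stand-ins, `VecPageStream` and `block_on` are hypothetical helpers, and the boxed-future return type is only one assumed way to keep an async method usable through `dyn ColumnPageStream` (the actual change may use `async_trait` or something else).

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Stand-in for the parquet crate's file metadata (assumption).
#[derive(Debug)]
pub struct ParquetMetaData {
    pub num_row_groups: usize,
}

/// Stand-in for the crate's `Page`; the real type presumably carries more.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Page {
    pub rg_idx: usize,
    pub col_idx: usize,
    pub page_idx_in_col: usize,
}

/// Only the `Io` variant is named in the PR; other variants are omitted.
#[derive(Debug)]
pub enum ParquetReadError {
    Io(std::io::Error),
}

/// Boxed future so the trait can be used as a trait object (assumption).
pub type PageFuture<'a> =
    Pin<Box<dyn Future<Output = Result<Option<Page>, ParquetReadError>> + Send + 'a>>;

pub trait ColumnPageStream {
    fn metadata(&self) -> &Arc<ParquetMetaData>;
    fn next_page(&mut self) -> PageFuture<'_>;
}

/// Hypothetical in-memory stream used only to exercise the contract.
pub struct VecPageStream {
    metadata: Arc<ParquetMetaData>,
    pages: std::vec::IntoIter<Page>,
}

impl VecPageStream {
    pub fn new(metadata: ParquetMetaData, pages: Vec<Page>) -> Self {
        Self { metadata: Arc::new(metadata), pages: pages.into_iter() }
    }
}

impl ColumnPageStream for VecPageStream {
    fn metadata(&self) -> &Arc<ParquetMetaData> {
        &self.metadata
    }

    fn next_page(&mut self) -> PageFuture<'_> {
        // An exhausted `vec::IntoIter` keeps returning `None`, so EOF is idempotent.
        Box::pin(async move { Ok::<_, ParquetReadError>(self.pages.next()) })
    }
}

/// Tiny busy-polling executor so the sketch needs no async runtime.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        std::thread::yield_now();
    }
}
```

A caller would loop on `next_page()` until it returns `Ok(None)`; under the idempotent-EOF invariant, further calls after that point keep returning `Ok(None)`.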

Stack topology

This PR is part of the Streaming Parquet stack:

main
└── gtt/parquet-streaming-base (= main ∪ PR-2 #6384 ∪ PR-4 #6386)
    └── gtt/column-page-stream-trait  ← PR-5a (this PR)
        ├── gtt/legacy-input-adapter   ← PR-5 (legacy multi-RG adapter)
        └── gtt/streaming-merge-engine ← PR-6 (streaming column-major merger)

gtt/parquet-streaming-base exists so PR-5/PR-6 can stack against the union of #6384 + #6386 without duplicating either's diff. When those land, the base branch and this PR retarget cleanly.

Why now

Without this trait, PR-5 and PR-6 would both reach into the same concrete type and the second-to-land would have to refactor the first. Extracting the surface up-front is the smallest change that lets the two land independently.

Test plan

  • cargo nextest run -p quickwit-parquet-engine streaming_reader::tests — 14/14 pass, including new test_streaming_reader_satisfies_column_page_stream_trait (drains the same fixture concrete-typed and through &mut dyn ColumnPageStream, asserts identical (rg_idx, col_idx, page_idx_in_col, compressed_page_size) sequence + idempotent EOF through the trait surface)
  • cargo clippy --workspace --all-features --tests with -Dwarnings
  • cargo +nightly fmt --all -- --check
  • cargo doc --no-deps -p quickwit-parquet-engine
  • cargo machete
  • bash quickwit/scripts/check_license_headers.sh
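
The parity check described for `test_streaming_reader_satisfies_column_page_stream_trait` could look roughly like the sketch below. Everything in it is an assumption for illustration: `FixtureReader` stands in for `StreamingParquetReader`, the trait is trimmed to `next_page` (the real one also exposes `metadata()`), the error type is simplified to `std::io::Error`, and `block_on` is a hypothetical helper that replaces an async runtime.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// (rg_idx, col_idx, page_idx_in_col, compressed_page_size), as in the test plan.
pub type Fingerprint = (usize, usize, usize, usize);

/// Stand-in for `Page`, reduced to the compared fields.
#[derive(Debug, Clone)]
pub struct Page {
    pub rg_idx: usize,
    pub col_idx: usize,
    pub page_idx_in_col: usize,
    pub compressed_page_size: usize,
}

type PageFuture<'a> = Pin<Box<dyn Future<Output = std::io::Result<Option<Page>>> + 'a>>;

/// Trimmed for brevity; the real trait also has `metadata()`.
pub trait ColumnPageStream {
    fn next_page(&mut self) -> PageFuture<'_>;
}

/// Hypothetical reader standing in for `StreamingParquetReader`.
pub struct FixtureReader {
    pages: std::vec::IntoIter<Page>,
}

impl FixtureReader {
    pub fn new(pages: Vec<Page>) -> Self {
        Self { pages: pages.into_iter() }
    }

    /// Inherent method: the concrete-typed surface.
    pub async fn next_page(&mut self) -> std::io::Result<Option<Page>> {
        Ok(self.pages.next())
    }
}

impl ColumnPageStream for FixtureReader {
    fn next_page(&mut self) -> PageFuture<'_> {
        // Delegates to the inherent method, so both surfaces share one code path.
        Box::pin(FixtureReader::next_page(self))
    }
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Busy-polling executor; enough for futures that complete immediately.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        std::thread::yield_now();
    }
}

fn fingerprint(p: &Page) -> Fingerprint {
    (p.rg_idx, p.col_idx, p.page_idx_in_col, p.compressed_page_size)
}

/// Drains through the concrete type (inherent method resolution).
pub fn drain_concrete(reader: &mut FixtureReader) -> std::io::Result<Vec<Fingerprint>> {
    let mut out = Vec::new();
    while let Some(page) = block_on(reader.next_page())? {
        out.push(fingerprint(&page));
    }
    Ok(out)
}

/// Drains through the trait object (dynamic dispatch).
pub fn drain_dyn(stream: &mut dyn ColumnPageStream) -> std::io::Result<Vec<Fingerprint>> {
    let mut out = Vec::new();
    while let Some(page) = block_on(stream.next_page())? {
        out.push(fingerprint(&page));
    }
    Ok(out)
}
```

The test would then build two readers over the same fixture, assert that `drain_concrete` and `drain_dyn` return equal sequences, and call `next_page()` twice more through the trait object to confirm both calls yield `None`.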

🤖 Generated with Claude Code

g-talbot force-pushed the gtt/column-page-stream-trait branch from af78c8f to 2714921 on May 8, 2026 20:49
g-talbot force-pushed the gtt/parquet-streaming-base branch from 051efcd to 499e1f1 on May 8, 2026 21:26
g-talbot force-pushed the gtt/column-page-stream-trait branch 2 times, most recently from 61b6310 to 4ae07e7 on May 8, 2026 21:46
g-talbot force-pushed the gtt/parquet-streaming-base branch from 7c24d04 to 43c386d on May 9, 2026 00:07
… inputs

Extracts a `pub trait ColumnPageStream` from PR-4's concrete reader so PR-5
(legacy adapter) and PR-6 (streaming merge engine) can land in parallel
against a stable contract instead of one rewriting the other.

Trait shape:
  fn metadata(&self) -> &Arc<ParquetMetaData>
  async fn next_page(&mut self) -> Result<Option<Page>, ParquetReadError>

Same invariants PR-4 already guarantees: row-group-major /
column-major-within-RG / page-major-within-column yield order, idempotent
EOF, and I/O failures surface as ParquetReadError::Io rather than being
masked as decode errors.
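
The yield-order invariant amounts to a lexicographic ordering on (rg_idx, col_idx, page_idx_in_col). A small sketch of that ordering follows; both function names and the `pages_per_column` layout are hypothetical and not taken from the PR.

```rust
/// Enumerates the order in which pages should be yielded.
/// `pages_per_column[rg][col]` is the page count of that column chunk
/// (a made-up input layout for this sketch).
pub fn expected_yield_order(pages_per_column: &[Vec<usize>]) -> Vec<(usize, usize, usize)> {
    let mut order = Vec::new();
    // Row groups are the outermost loop: row-group-major.
    for (rg_idx, columns) in pages_per_column.iter().enumerate() {
        // Columns are visited in order within each row group.
        for (col_idx, &num_pages) in columns.iter().enumerate() {
            // Pages are visited in order within each column.
            for page_idx_in_col in 0..num_pages {
                order.push((rg_idx, col_idx, page_idx_in_col));
            }
        }
    }
    order
}

/// Checks that an observed sequence is strictly increasing under
/// tuple (lexicographic) comparison, which matches the order above.
pub fn is_in_yield_order(seq: &[(usize, usize, usize)]) -> bool {
    seq.windows(2).all(|pair| pair[0] < pair[1])
}
```

For two row groups with page counts `[2, 1]` and `[1, 2]`, this gives (0,0,0), (0,0,1), (0,1,0), (1,0,0), (1,1,0), (1,1,1).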

Implements the trait for `StreamingParquetReader`. Promotes visibility
of `Page`, `ParquetReadError`, `RemoteByteSource`, `StreamingParquetReader`,
and `StreamingReaderConfig` from `pub(crate)` to `pub` so downstream
crates and PR-5/PR-6 can consume them.

Adds `test_streaming_reader_satisfies_column_page_stream_trait`: drains
the same fixture through both the concrete-typed and trait-object surfaces,
asserts identical (rg_idx, col_idx, page_idx_in_col, compressed_page_size)
sequences and idempotent EOF through the trait.

Pure refactor — no behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
g-talbot force-pushed the gtt/column-page-stream-trait branch from 4ae07e7 to d43186d on May 9, 2026 00:07