1.5.1 SIGSEGV in iceberg_scan DESCRIBE on a specific manifest pair (works on 1.4.4) #950

@tomlarkworthy

Description

Summary

DESCRIBE SELECT * FROM iceberg_scan(<metadata.json>) aborts the DuckDB process with SIGSEGV on DuckDB 1.5.1 with the iceberg extension, when the current snapshot references a specific pair of data manifests. The same query returns the 6-column schema cleanly on DuckDB 1.4.4, so this looks like a 1.5.x regression in the iceberg extension.

Exit code: -11 on macOS arm64, 139 (128 + SIGSEGV) on Linux x86_64 (AWS Lambda amzn2023).

At single-DESCRIBE granularity the crash is flaky (~30%). Looping 200× inside a single connection makes it deterministic (5/5 across five trials on the minimised fixture).

Reproducer

A self-contained uv script plus the 4 metadata files needed to reproduce are attached as opti6367_upstream.zip (see the attachment note at the end of this issue).

uv run repro.py

Expected output

DuckDB 1.5.1 (shipped):

duckdb 1.5.1

subprocess CRASHED (returncode=-11)

DuckDB 1.4.4 (pin duckdb==1.4.4 in the script header to verify):

duckdb 1.4.4
ROWS: 6

subprocess exited cleanly (returncode=0)
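
For reference, the shape of repro.py is roughly the sketch below. Treat it as an outline, not the attached script verbatim: the bucket name, port, and key prefix are placeholders, and the fixtures must be uploaded under whatever s3:// keys v1.metadata.json actually references.

```python
# /// script
# requires-python = ">=3.10"
# dependencies = ["duckdb==1.5.1", "boto3", "moto[server]==5.*"]
# ///
# Sketch of repro.py; placeholders: bucket "repro", port 5000, key prefix.
import pathlib
import subprocess
import sys
import textwrap

import boto3
import duckdb
from moto.server import ThreadedMotoServer

print("duckdb", duckdb.__version__)

server = ThreadedMotoServer(port=5000)  # local in-process fake S3
server.start()
s3 = boto3.client(
    "s3",
    endpoint_url="http://127.0.0.1:5000",
    region_name="us-east-1",
    aws_access_key_id="test",
    aws_secret_access_key="test",
)
s3.create_bucket(Bucket="repro")
for p in pathlib.Path("fixtures").iterdir():
    s3.upload_file(str(p), "repro", f"warehouse/metadata/{p.name}")

# DESCRIBE runs in a child process so the parent survives the SIGSEGV
# and can report the return code.
CHILD = textwrap.dedent("""
    import duckdb
    con = duckdb.connect()
    for stmt in [
        "INSTALL httpfs", "LOAD httpfs", "INSTALL iceberg", "LOAD iceberg",
        "SET s3_endpoint='127.0.0.1:5000'", "SET s3_use_ssl=false",
        "SET s3_url_style='path'", "SET s3_region='us-east-1'",
        "SET s3_access_key_id='test'", "SET s3_secret_access_key='test'",
    ]:
        con.execute(stmt)
    # ~30% crash rate per DESCRIBE; 200 iterations in one connection
    # makes the SIGSEGV deterministic on 1.5.1.
    for _ in range(200):
        rows = con.execute(
            "DESCRIBE SELECT * FROM iceberg_scan("
            "'s3://repro/warehouse/metadata/v1.metadata.json')"
        ).fetchall()
    print("ROWS:", len(rows))
""")

proc = subprocess.run([sys.executable, "-c", CHILD])
if proc.returncode == 0:
    print("subprocess exited cleanly (returncode=0)")
else:
    print(f"subprocess CRASHED (returncode={proc.returncode})")
server.stop()
```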

About the fixture

The 4 files in fixtures/ are the minimum required to read the current snapshot's schema:

| File | Notes |
| --- | --- |
| v1.metadata.json | table metadata, trimmed to the current snapshot |
| snap-…-b7e0cdbc-….avro | manifest list, trimmed to the two triggering manifests |
| optimized-m-42f28226-….avro | data manifest, 3 of 28 original Avro blocks |
| optimized-m-55add7a1-….avro | data manifest, 21 of 27 original Avro blocks |

The two optimized-m-… files come from a production table. They were written by our Spark rewriteManifests / compaction pass (the optimized-m- prefix is what the Spark job emits; this is not AWS Glue's managed OPTIMIZE action).

They were truncated at Avro object-container block boundaries — no re-encoding — so their deflate-compressed block payloads are byte-identical to the file that originally triggered the crash on the production Lambda. fastavro read→write through the same files loses the trigger even with the deflate codec preserved, which suggests the bug is sensitive to exact payload bytes (maybe block framing or compression output) rather than to the logical Iceberg content.
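
The truncation itself needs nothing Avro-aware beyond the object-container framing. A minimal sketch of the approach (not the exact script used; file names and block counts are placeholders). Cutting at block boundaries keeps the deflate payloads byte-identical, which is what preserves the trigger:

```python
import io
import pathlib

AVRO_MAGIC = b"Obj\x01"
SYNC_LEN = 16  # Avro object-container sync marker length

def read_long(buf):
    """Decode one Avro zigzag-varint long."""
    shift, accum = 0, 0
    while True:
        byte = buf.read(1)[0]
        accum |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (accum >> 1) ^ -(accum & 1)

def truncate_avro(src, dst, n_blocks):
    """Keep the header plus the first n_blocks data blocks, byte for byte
    (no decode/re-encode, so compressed payloads stay identical)."""
    data = pathlib.Path(src).read_bytes()
    buf = io.BytesIO(data)
    assert buf.read(4) == AVRO_MAGIC
    while True:  # skip the header metadata map
        count = read_long(buf)
        if count == 0:
            break
        if count < 0:  # negative count: a byte-size long follows
            read_long(buf)
            count = -count
        for _ in range(count):
            buf.seek(read_long(buf), 1)  # key
            buf.seek(read_long(buf), 1)  # value
    buf.seek(SYNC_LEN, 1)  # header sync marker
    for _ in range(n_blocks):  # walk past n data blocks
        read_long(buf)                # record count in this block
        size = read_long(buf)         # compressed payload size
        buf.seek(size + SYNC_LEN, 1)  # payload + trailing sync
    pathlib.Path(dst).write_bytes(data[: buf.tell()])

# e.g. producing a 3-block variant (placeholder names):
truncate_avro("manifest.avro", "manifest.3blocks.avro", n_blocks=3)
```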

Narrowing either manifest to a single data-file entry (by writing a replacement manifest via PyIceberg) also loses the trigger — the crash requires cross-manifest content.
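
For contrast, the rewrites that lose the trigger decode and re-encode the records. A fastavro sketch of both passes is below (the single-entry narrowing was actually done with PyIceberg, so this is only an approximation; file names are placeholders):

```python
import fastavro

def rewrite_manifest(src, dst, keep=None):
    """Decode and re-encode a manifest Avro file. Preserves the Iceberg
    key/value metadata and the codec, but re-compresses the blocks,
    which is exactly the step that loses the crash trigger."""
    with open(src, "rb") as f:
        r = fastavro.reader(f)
        entries = list(r)
        schema, codec = r.writer_schema, r.codec
        meta = dict(r.metadata)
    meta.pop("avro.schema", None)  # fastavro re-emits these itself
    meta.pop("avro.codec", None)
    if keep is not None:
        entries = entries[:keep]   # e.g. keep=1 for the single-entry test
    with open(dst, "wb") as f:
        fastavro.writer(f, schema, entries, codec=codec, metadata=meta)

rewrite_manifest("manifest.avro", "roundtrip.avro")          # loses trigger
rewrite_manifest("manifest.avro", "one-entry.avro", keep=1)  # also loses it
```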

Observations

  • Crash is flaky at single-DESCRIBE granularity (~30% on 1.5.1); it becomes deterministic when the query is looped inside one connection.
  • Bundling 1 block from the first manifest with 22 blocks from the second (or 28 blocks + 1 block, etc.) does not crash — both manifests need content above some threshold.
  • The moto S3 server receives no traffic beyond the four metadata files; no data parquet paths are dereferenced during DESCRIBE. The s3://… URIs baked into the Avro files are never read as data.

Environment

Reproduced on:

  • macOS 14.x (Darwin 25.4.0, arm64) — exit code -11
  • Linux x86_64 (AWS Lambda, amzn2023) — exit code 139

Python 3.10+, duckdb==1.5.1, moto[server]==5.*, boto3.


Fixture attachment: the 4 metadata files (~360KB uncompressed) are needed to run the reproducer. I'll attach opti6367_upstream.zip to this issue in a follow-up comment.
