1.5.1 SIGSEGV in iceberg_scan DESCRIBE on a specific manifest pair (works on 1.4.4) #950

@tomlarkworthy

Description

Summary

DESCRIBE SELECT * FROM iceberg_scan(<metadata.json>) aborts the DuckDB process with SIGSEGV on DuckDB 1.5.1 with the iceberg extension, when the current snapshot references a specific pair of data manifests. The same query returns the 6-column schema cleanly on DuckDB 1.4.4, so this looks like a 1.5.x regression in the iceberg extension.

Exit code: -11 on macOS arm64, 139 (128 + SIGSEGV) on Linux x86_64 (AWS Lambda amzn2023).

At single-DESCRIBE granularity the crash is flaky (~30%). Looping 200× inside a single connection makes it deterministic (5/5 across five trials on the minimised fixture).

Reproducer

A self-contained uv script plus the 4 metadata files needed to reproduce are attached as opti6367_upstream.zip (see the attachment note at the end of this issue).

uv run repro.py

Expected output

DuckDB 1.5.1 (shipped):

duckdb 1.5.1

subprocess CRASHED (returncode=-11)

DuckDB 1.4.4 (pin duckdb==1.4.4 in the script header to verify):

duckdb 1.4.4
ROWS: 6

subprocess exited cleanly (returncode=0)
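
For reference, the shape of repro.py is roughly the sketch below. Treat it as an outline, not the attached script verbatim: the bucket name, port, and key prefix are placeholders, and the fixtures must be uploaded under whatever s3:// keys v1.metadata.json actually references.

```python
# /// script
# requires-python = ">=3.10"
# dependencies = ["duckdb==1.5.1", "boto3", "moto[server]==5.*"]
# ///
# Sketch of repro.py; placeholders: bucket "repro", port 5000, key prefix.
import pathlib
import subprocess
import sys
import textwrap

import boto3
import duckdb
from moto.server import ThreadedMotoServer

print("duckdb", duckdb.__version__)

server = ThreadedMotoServer(port=5000)  # local in-process fake S3
server.start()
s3 = boto3.client(
    "s3",
    endpoint_url="http://127.0.0.1:5000",
    region_name="us-east-1",
    aws_access_key_id="test",
    aws_secret_access_key="test",
)
s3.create_bucket(Bucket="repro")
for p in pathlib.Path("fixtures").iterdir():
    s3.upload_file(str(p), "repro", f"warehouse/metadata/{p.name}")

# DESCRIBE runs in a child process so the parent survives the SIGSEGV
# and can report the return code.
CHILD = textwrap.dedent("""
    import duckdb
    con = duckdb.connect()
    for stmt in [
        "INSTALL httpfs", "LOAD httpfs", "INSTALL iceberg", "LOAD iceberg",
        "SET s3_endpoint='127.0.0.1:5000'", "SET s3_use_ssl=false",
        "SET s3_url_style='path'", "SET s3_region='us-east-1'",
        "SET s3_access_key_id='test'", "SET s3_secret_access_key='test'",
    ]:
        con.execute(stmt)
    # ~30% crash rate per DESCRIBE; 200 iterations in one connection
    # makes the SIGSEGV deterministic on 1.5.1.
    for _ in range(200):
        rows = con.execute(
            "DESCRIBE SELECT * FROM iceberg_scan("
            "'s3://repro/warehouse/metadata/v1.metadata.json')"
        ).fetchall()
    print("ROWS:", len(rows))
""")

proc = subprocess.run([sys.executable, "-c", CHILD])
if proc.returncode == 0:
    print("subprocess exited cleanly (returncode=0)")
else:
    print(f"subprocess CRASHED (returncode={proc.returncode})")
server.stop()
```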

About the fixture

The 4 files in fixtures/ are the minimum required to read the current snapshot's schema:

| File | Notes |
| --- | --- |
| v1.metadata.json | table metadata, trimmed to the current snapshot |
| snap-…-b7e0cdbc-….avro | manifest list, trimmed to the two triggering manifests |
| optimized-m-42f28226-….avro | data manifest, 3 of 28 original Avro blocks |
| optimized-m-55add7a1-….avro | data manifest, 21 of 27 original Avro blocks |

The two optimized-m-… files come from a production table. They were written by our Spark rewriteManifests / compaction pass (the optimized-m- prefix is what the Spark job emits; this is not AWS Glue's managed OPTIMIZE action).

They were truncated at Avro object-container block boundaries — no re-encoding — so their deflate-compressed block payloads are byte-identical to the file that originally triggered the crash on the production Lambda. fastavro read→write through the same files loses the trigger even with the deflate codec preserved, which suggests the bug is sensitive to exact payload bytes (maybe block framing or compression output) rather than to the logical Iceberg content.
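
The truncation itself needs nothing Avro-aware beyond the object-container framing. A minimal sketch of the approach (not the exact script used; file names and block counts are placeholders). Cutting at block boundaries keeps the deflate payloads byte-identical, which is what preserves the trigger:

```python
import io
import pathlib

AVRO_MAGIC = b"Obj\x01"
SYNC_LEN = 16  # Avro object-container sync marker length

def read_long(buf):
    """Decode one Avro zigzag-varint long."""
    shift, accum = 0, 0
    while True:
        byte = buf.read(1)[0]
        accum |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (accum >> 1) ^ -(accum & 1)

def truncate_avro(src, dst, n_blocks):
    """Keep the header plus the first n_blocks data blocks, byte for byte
    (no decode/re-encode, so compressed payloads stay identical)."""
    data = pathlib.Path(src).read_bytes()
    buf = io.BytesIO(data)
    assert buf.read(4) == AVRO_MAGIC
    while True:  # skip the header metadata map
        count = read_long(buf)
        if count == 0:
            break
        if count < 0:  # negative count: a byte-size long follows
            read_long(buf)
            count = -count
        for _ in range(count):
            buf.seek(read_long(buf), 1)  # key
            buf.seek(read_long(buf), 1)  # value
    buf.seek(SYNC_LEN, 1)  # header sync marker
    for _ in range(n_blocks):  # walk past n data blocks
        read_long(buf)                # record count in this block
        size = read_long(buf)         # compressed payload size
        buf.seek(size + SYNC_LEN, 1)  # payload + trailing sync
    pathlib.Path(dst).write_bytes(data[: buf.tell()])

# e.g. producing a 3-block variant (placeholder names):
truncate_avro("manifest.avro", "manifest.3blocks.avro", n_blocks=3)
```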

Narrowing either manifest to a single data-file entry (by writing a replacement manifest via PyIceberg) also loses the trigger — the crash requires cross-manifest content.
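
For contrast, the rewrites that lose the trigger decode and re-encode the records. A fastavro sketch of both passes is below (the single-entry narrowing was actually done with PyIceberg, so this is only an approximation; file names are placeholders):

```python
import fastavro

def rewrite_manifest(src, dst, keep=None):
    """Decode and re-encode a manifest Avro file. Preserves the Iceberg
    key/value metadata and the codec, but re-compresses the blocks,
    which is exactly the step that loses the crash trigger."""
    with open(src, "rb") as f:
        r = fastavro.reader(f)
        entries = list(r)
        schema, codec = r.writer_schema, r.codec
        meta = dict(r.metadata)
    meta.pop("avro.schema", None)  # fastavro re-emits these itself
    meta.pop("avro.codec", None)
    if keep is not None:
        entries = entries[:keep]   # e.g. keep=1 for the single-entry test
    with open(dst, "wb") as f:
        fastavro.writer(f, schema, entries, codec=codec, metadata=meta)

rewrite_manifest("manifest.avro", "roundtrip.avro")          # loses trigger
rewrite_manifest("manifest.avro", "one-entry.avro", keep=1)  # also loses it
```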

Observations

  • Crash is flaky at single-DESCRIBE granularity (~30% on 1.5.1); it becomes deterministic when the query is looped inside one connection.
  • Bundling 1 block from the first manifest with 22 blocks from the second (or 28 blocks + 1 block, etc.) does not crash — both manifests need content above some threshold.
  • The moto S3 server receives no traffic beyond the four metadata files; no data parquet paths are dereferenced during DESCRIBE. The s3://… URIs baked into the Avro files are never read as data.

Environment

Reproduced on:

  • macOS 14.x (Darwin 25.4.0, arm64) — exit code -11
  • Linux x86_64 (AWS Lambda, amzn2023) — exit code 139

Python 3.10+, duckdb==1.5.1, moto[server]==5.*, boto3.


Fixture attachment: the 4 metadata files (~360KB uncompressed) are needed to run the reproducer. I'll attach opti6367_upstream.zip to this issue in a follow-up comment.
