What happens?
Querying an Iceberg table through DuckDB with a LOWER(varchar_column) = 'literal'
filter triggers an INTERNAL assertion failure inside the iceberg extension's
parquet filter pushdown:
INTERNAL Error: Expected vector of type VARCHAR, but found vector of type INT32
The assertion fires under MultiFileFunction<ParquetMultiFileInfo>::TryInitializeNextBatch,
which calls into the iceberg extension's filter pushdown (frames inside
iceberg.duckdb_extension); those frames end in ConstantVector::VerifyVectorType<string_t>.
After the failure, the in-memory DuckDB connection is permanently invalidated —
every subsequent query (including ones that don't touch iceberg) fails with:
FATAL Error: Failed: database has been invalidated because of a previous fatal
error. The database must be restarted prior to being used again.
Original error: "Expected vector of type VARCHAR, but found vector of type INT32"
This is the more serious half of the bug: a single bad query takes down a
long-lived process. The only recovery is reconnecting.
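As a stopgap for long-lived processes, the reconnect can be automated. The following is a minimal sketch, not part of the original report: `ReconnectingClient` and its `connect` factory are hypothetical names, and the detection relies only on the "database has been invalidated" substring from the FATAL error quoted above. It assumes that a fresh connection yields a usable database.

```python
from typing import Any, Callable

# Substring taken from the FATAL error text quoted in this report.
INVALIDATED_MARKER = "database has been invalidated"


class ReconnectingClient:
    """Hypothetical wrapper that reopens the connection after invalidation."""

    def __init__(self, connect: Callable[[], Any]) -> None:
        # `connect` is an assumed factory returning an object with
        # execute(sql) and close(), e.g. one that wraps duckdb.connect().
        self._connect = connect
        self._conn = connect()
        self.reconnects = 0

    def execute(self, sql: str) -> Any:
        try:
            return self._conn.execute(sql)
        except Exception as exc:
            if INVALIDATED_MARKER not in str(exc):
                raise  # unrelated error: surface it unchanged
            # Database was invalidated by an earlier query: reopen, retry once.
            self._reset()
            return self._conn.execute(sql)

    def _reset(self) -> None:
        try:
            self._conn.close()
        except Exception:
            pass  # the old handle is unusable anyway
        self._conn = self._connect()
        self.reconnects += 1
```

Because the database here is in-memory, the factory would also need to repeat any session setup (LOAD iceberg, ATTACH) on each reconnect. The query that triggers the INTERNAL error still fails; the wrapper only keeps later queries from failing with the FATAL error.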
To Reproduce
Affects any string column, regardless of whether it participates in the
partition spec.
INSTALL iceberg;
LOAD iceberg;
ATTACH '<your iceberg catalog>' AS cat (TYPE iceberg);
-- Schema used in repro: any iceberg table with at least one VARCHAR column.
-- Stored values may be mixed case (e.g. 'Trade', 'ORDERBOOK'), but that isn't
-- required: even a LOWER() filter that matches no rows crashes.
-- Crashes:
SELECT *
FROM cat.some.table
WHERE LOWER(message_type) = 'trade'
LIMIT 1;
-- INTERNAL Error: Expected vector of type VARCHAR, but found vector of type INT32
-- The next query on the same connection then fails with:
-- FATAL Error: ... database has been invalidated ...
The crash reproduces on both:
- Partition columns (e.g. our id column, BUCKET-transform partitioned: file
  paths look like …/window_end_utc_day=YYYY-MM-DD/id_bucket=N/…parquet), AND
- Non-partition columns (e.g. our message_type column, plain VARCHAR with no
  partition transform).
So the trigger is the LOWER(col) = literal predicate itself, not the
partition-projection path.
Workarounds that do work and confirm the issue is in the iceberg
filter-pushdown path for LOWER:
-- OK: equality between the raw column and a literal pushes down fine.
WHERE message_type = 'Trade'
-- OK: scan without the LOWER() filter, then apply LOWER() in pandas / a downstream operator.
SELECT message_type FROM cat.some.table
WHERE window_end_utc = TIMESTAMP '2026-03-20 12:00:00';
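The second workaround (applying LOWER() downstream) could look roughly like the sketch below. It is illustrative only: `filter_lower_equals` is a hypothetical helper, the rows are assumed to be tuples such as those returned by `fetchall()` on a query without the LOWER() predicate, and Python's `str.lower()` is assumed to be an acceptable stand-in for SQL LOWER() on the data in question (the two may not agree on every non-ASCII character).

```python
from typing import Iterable, Iterator, Sequence


def filter_lower_equals(
    rows: Iterable[Sequence], column: int, literal: str
) -> Iterator[Sequence]:
    """Yield rows whose value at index `column`, lowercased, equals `literal`."""
    for row in rows:
        value = row[column]
        # Mirror SQL semantics: a NULL (None) never satisfies the equality.
        if value is not None and value.lower() == literal:
            yield row


# Example with in-memory rows standing in for a fetched result set.
rows = [("Trade", 1), ("ORDERBOOK", 2), (None, 3), ("trade", 4)]
matches = list(filter_lower_equals(rows, 0, "trade"))
# matches == [("Trade", 1), ("trade", 4)]
```

This moves the filtering cost to the client, so it is only practical when another pushed-down predicate (such as the window_end_utc filter above) already keeps the scanned result small.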
Versions
- DuckDB: 1.4.4
- Iceberg extension: 1095c1fa (installed from REPOSITORY,
  ~/.duckdb/extensions/v1.4.4/linux_amd64/iceberg.duckdb_extension)
- Platform: linux_amd64 (Linux 5.14, x86_64)
- Catalog: REST (lakekeeper)
- File format: Parquet (snappy, written via pyarrow write_table)
- Iceberg column types: standard STRING/VARCHAR (no special encoding)
Full stack trace
INTERNAL Error: Expected vector of type VARCHAR, but found vector of type INT32
…/_duckdb.cpython-314-x86_64-linux-gnu.so(duckdb::Exception::ToJSON…) [0x…]
…/_duckdb.…so(duckdb::InternalException::InternalException…) [0x…]
…/_duckdb.…so(duckdb::ConstantVector::VerifyVectorType<duckdb::string_t>…) [0x…]
…/_duckdb.…so(+0xecfd2d) [0x…]
~/.duckdb/extensions/v1.4.4/linux_amd64/iceberg.duckdb_extension(+0x101e1b6)
~/.duckdb/extensions/v1.4.4/linux_amd64/iceberg.duckdb_extension(+0x1023425)
~/.duckdb/extensions/v1.4.4/linux_amd64/iceberg.duckdb_extension(+0x1023669)
~/.duckdb/extensions/v1.4.4/linux_amd64/iceberg.duckdb_extension(+0x5e54d3)
~/.duckdb/extensions/v1.4.4/linux_amd64/iceberg.duckdb_extension(+0x601cff)
~/.duckdb/extensions/v1.4.4/linux_amd64/iceberg.duckdb_extension(+0x60275f)
~/.duckdb/extensions/v1.4.4/linux_amd64/iceberg.duckdb_extension(+0x5f815e)
~/.duckdb/extensions/v1.4.4/linux_amd64/iceberg.duckdb_extension(+0xc5c172)
…/_duckdb.…so(duckdb::MultiFileFunction<duckdb::ParquetMultiFileInfo>::TryInitializeNextBatch…) [0x…]
…/_duckdb.…so(duckdb::MultiFileFunction<duckdb::ParquetMultiFileInfo>::MultiFileInitLocal…) [0x…]
…/_duckdb.…so(duckdb::PhysicalTableScan::GetLocalSourceState…) [0x…]
…/_duckdb.…so(duckdb::PipelineExecutor::PipelineExecutor…) [0x…]
…/_duckdb.…so(duckdb::PipelineTask::ExecuteTask…) [0x…]
…/_duckdb.…so(duckdb::ExecutorTask::Execute…) [0x…]
…/_duckdb.…so(duckdb::TaskScheduler::ExecuteForever…) [0x…]
OS:
Linux 5.14, x86_64
DuckDB Version:
1.4.4
DuckDB Client:
Python
Hardware:
No response
Full Name:
Hon Ming Chan
Affiliation:
Maven Securities
Did you include all relevant data sets for reproducing the issue?
No - Other reason (please specify in the issue body)
Did you include all code required to reproduce the issue?
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?