LOWER(varchar_col) = literal filter pushdown crashes parquet scan with Expected vector of type VARCHAR, but found vector of type INT32 and permanently invalidates the connection #970

@siumingdev

Description

What happens?

Querying an Iceberg table through DuckDB with a LOWER(varchar_column) = 'literal'
filter triggers an INTERNAL assertion failure inside the iceberg extension's
parquet filter pushdown:

INTERNAL Error: Expected vector of type VARCHAR, but found vector of type INT32

The crash originates in MultiFileFunction<ParquetMultiFileInfo>::TryInitializeNextBatch
called from the iceberg extension's filter pushdown (frames inside
iceberg.duckdb_extension, then ConstantVector::VerifyVectorType<string_t>).

After the failure, the in-memory DuckDB connection is permanently invalidated:
every subsequent query (including ones that don't touch iceberg) fails with:

FATAL Error: Failed: database has been invalidated because of a previous fatal
error. The database must be restarted prior to being used again.
Original error: "Expected vector of type VARCHAR, but found vector of type INT32"

This is the more serious half of the bug: a single bad query takes down a
long-lived process. The only recovery is reconnecting.

To Reproduce

Affects any string column, regardless of whether it participates in the
partition spec.

INSTALL iceberg;
LOAD iceberg;

ATTACH '<your iceberg catalog>' AS cat (TYPE iceberg);

-- Schema used in repro: any iceberg table with at least one VARCHAR column.
-- Stored values may be mixed case (e.g. 'Trade', 'ORDERBOOK'), but that isn't
-- required; even a non-matching LOWER() filter crashes.

-- Crashes:
SELECT *
FROM cat.some.table
WHERE LOWER(message_type) = 'trade'
LIMIT 1;
-- INTERNAL Error: Expected vector of type VARCHAR, but found vector of type INT32

-- The next query on the same connection then fails with:
-- FATAL Error: ... database has been invalidated ...

The crash reproduces on both:

  • Partition columns (e.g. our id column, BUCKET-transform partitioned: file
    paths look like …/window_end_utc_day=YYYY-MM-DD/id_bucket=N/…parquet), AND
  • Non-partition columns (e.g. our message_type column, plain VARCHAR with no
    partition transform).

So the trigger is the LOWER(col) = literal predicate itself, not the
partition-projection path.

Workarounds that do work (confirming the issue lies in the iceberg
filter-pushdown path for LOWER):

-- OK: equality on the raw column pushes down fine.
WHERE message_type = 'Trade'

-- OK: scan unfiltered, do LOWER() in pandas / a downstream operator.
SELECT message_type FROM cat.some.table
WHERE window_end_utc = TIMESTAMP '2026-03-20 12:00:00';

Versions

  • DuckDB: 1.4.4
  • Iceberg extension: 1095c1fa (installed from REPOSITORY,
    ~/.duckdb/extensions/v1.4.4/linux_amd64/iceberg.duckdb_extension)
  • Platform: linux_amd64 (Linux 5.14, x86_64)
  • Catalog: REST (lakekeeper)
  • File format: Parquet (snappy, written via pyarrow write_table)
  • Iceberg column types: standard STRING/VARCHAR (no special encoding)

Full stack trace

INTERNAL Error: Expected vector of type VARCHAR, but found vector of type INT32

…/_duckdb.cpython-314-x86_64-linux-gnu.so(duckdb::Exception::ToJSON…) [0x…]
…/_duckdb.…so(duckdb::InternalException::InternalException…) [0x…]
…/_duckdb.…so(duckdb::ConstantVector::VerifyVectorType<duckdb::string_t>…) [0x…]
…/_duckdb.…so(+0xecfd2d) [0x…]
~/.duckdb/extensions/v1.4.4/linux_amd64/iceberg.duckdb_extension(+0x101e1b6)
~/.duckdb/extensions/v1.4.4/linux_amd64/iceberg.duckdb_extension(+0x1023425)
~/.duckdb/extensions/v1.4.4/linux_amd64/iceberg.duckdb_extension(+0x1023669)
~/.duckdb/extensions/v1.4.4/linux_amd64/iceberg.duckdb_extension(+0x5e54d3)
~/.duckdb/extensions/v1.4.4/linux_amd64/iceberg.duckdb_extension(+0x601cff)
~/.duckdb/extensions/v1.4.4/linux_amd64/iceberg.duckdb_extension(+0x60275f)
~/.duckdb/extensions/v1.4.4/linux_amd64/iceberg.duckdb_extension(+0x5f815e)
~/.duckdb/extensions/v1.4.4/linux_amd64/iceberg.duckdb_extension(+0xc5c172)
…/_duckdb.…so(duckdb::MultiFileFunction<duckdb::ParquetMultiFileInfo>::TryInitializeNextBatch…) [0x…]
…/_duckdb.…so(duckdb::MultiFileFunction<duckdb::ParquetMultiFileInfo>::MultiFileInitLocal…) [0x…]
…/_duckdb.…so(duckdb::PhysicalTableScan::GetLocalSourceState…) [0x…]
…/_duckdb.…so(duckdb::PipelineExecutor::PipelineExecutor…) [0x…]
…/_duckdb.…so(duckdb::PipelineTask::ExecuteTask…) [0x…]
…/_duckdb.…so(duckdb::ExecutorTask::Execute…) [0x…]
…/_duckdb.…so(duckdb::TaskScheduler::ExecuteForever…) [0x…]

OS:

Linux 5.14, x86_64

DuckDB Version:

1.4.4

DuckDB Client:

Python

Hardware:

No response

Full Name:

Hon Ming Chan

Affiliation:

Maven Securities

Did you include all relevant data sets for reproducing the issue?

No - Other reason (please specify in the issue body)

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • Yes, I have

Metadata

Assignees: no one assigned
Labels: needs triage (Needs to be triaged by maintainers)
Milestone: no milestone