LOWER(varchar_col) = literal filter pushdown crashes parquet scan with Expected vector of type VARCHAR, but found vector of type INT32 and permanently invalidates the connection #970

@siumingdev

Description

What happens?

Querying an Iceberg table through DuckDB with a LOWER(varchar_column) = 'literal'
filter triggers an INTERNAL assertion failure inside the iceberg extension's
parquet filter pushdown:

INTERNAL Error: Expected vector of type VARCHAR, but found vector of type INT32

The crash originates in MultiFileFunction<ParquetMultiFileInfo>::TryInitializeNextBatch
called from the iceberg extension's filter pushdown (frames inside
iceberg.duckdb_extension, then ConstantVector::VerifyVectorType<string_t>).

After the failure, the in-memory DuckDB connection is permanently invalidated:
every subsequent query (including ones that don't touch iceberg) fails with:

FATAL Error: Failed: database has been invalidated because of a previous fatal
error. The database must be restarted prior to being used again.
Original error: "Expected vector of type VARCHAR, but found vector of type INT32"

This is the more serious half of the bug: a single bad query takes down a
long-lived process. The only recovery is reconnecting.

To Reproduce

Affects any string column, regardless of whether it participates in the
partition spec.

INSTALL iceberg;
LOAD iceberg;

ATTACH '<your iceberg catalog>' AS cat (TYPE iceberg);

-- Schema used in repro: any iceberg table with at least one VARCHAR column.
-- Stored values may be mixed case (e.g. 'Trade', 'ORDERBOOK'), but that isn't
-- required; even a non-matching LOWER() filter crashes.

-- Crashes:
SELECT *
FROM cat.some.table
WHERE LOWER(message_type) = 'trade'
LIMIT 1;
-- INTERNAL Error: Expected vector of type VARCHAR, but found vector of type INT32

-- The next query on the same connection then fails with:
-- FATAL Error: ... database has been invalidated ...

The crash reproduces on both:

  • Partition columns (e.g. our id column, BUCKET-transform partitioned: file
    paths look like …/window_end_utc_day=YYYY-MM-DD/id_bucket=N/…parquet), AND
  • Non-partition columns (e.g. our message_type column, plain VARCHAR with no
    partition transform).

So the trigger is the LOWER(col) = literal predicate itself, not the
partition-projection path.

Workarounds that do work (confirming the issue lies in the iceberg
filter-pushdown path for LOWER):

-- OK: equality on the raw column pushes down fine.
WHERE message_type = 'Trade'

-- OK: scan unfiltered, do LOWER() in pandas / a downstream operator.
SELECT message_type FROM cat.some.table
WHERE window_end_utc = TIMESTAMP '2026-03-20 12:00:00';

Versions

  • DuckDB: 1.4.4
  • Iceberg extension: 1095c1fa (installed from REPOSITORY,
    ~/.duckdb/extensions/v1.4.4/linux_amd64/iceberg.duckdb_extension)
  • Platform: linux_amd64 (Linux 5.14, x86_64)
  • Catalog: REST (lakekeeper)
  • File format: Parquet (snappy, written via pyarrow write_table)
  • Iceberg column types: standard STRING/VARCHAR (no special encoding)

Full stack trace

INTERNAL Error: Expected vector of type VARCHAR, but found vector of type INT32

…/_duckdb.cpython-314-x86_64-linux-gnu.so(duckdb::Exception::ToJSON…) [0x…]
…/_duckdb.…so(duckdb::InternalException::InternalException…) [0x…]
…/_duckdb.…so(duckdb::ConstantVector::VerifyVectorType<duckdb::string_t>…) [0x…]
…/_duckdb.…so(+0xecfd2d) [0x…]
~/.duckdb/extensions/v1.4.4/linux_amd64/iceberg.duckdb_extension(+0x101e1b6)
~/.duckdb/extensions/v1.4.4/linux_amd64/iceberg.duckdb_extension(+0x1023425)
~/.duckdb/extensions/v1.4.4/linux_amd64/iceberg.duckdb_extension(+0x1023669)
~/.duckdb/extensions/v1.4.4/linux_amd64/iceberg.duckdb_extension(+0x5e54d3)
~/.duckdb/extensions/v1.4.4/linux_amd64/iceberg.duckdb_extension(+0x601cff)
~/.duckdb/extensions/v1.4.4/linux_amd64/iceberg.duckdb_extension(+0x60275f)
~/.duckdb/extensions/v1.4.4/linux_amd64/iceberg.duckdb_extension(+0x5f815e)
~/.duckdb/extensions/v1.4.4/linux_amd64/iceberg.duckdb_extension(+0xc5c172)
…/_duckdb.…so(duckdb::MultiFileFunction<duckdb::ParquetMultiFileInfo>::TryInitializeNextBatch…) [0x…]
…/_duckdb.…so(duckdb::MultiFileFunction<duckdb::ParquetMultiFileInfo>::MultiFileInitLocal…) [0x…]
…/_duckdb.…so(duckdb::PhysicalTableScan::GetLocalSourceState…) [0x…]
…/_duckdb.…so(duckdb::PipelineExecutor::PipelineExecutor…) [0x…]
…/_duckdb.…so(duckdb::PipelineTask::ExecuteTask…) [0x…]
…/_duckdb.…so(duckdb::ExecutorTask::Execute…) [0x…]
…/_duckdb.…so(duckdb::TaskScheduler::ExecuteForever…) [0x…]

OS:

Linux 5.14, x86_64

DuckDB Version:

1.4.4

DuckDB Client:

Python

Hardware:

No response

Full Name:

Hon Ming Chan

Affiliation:

Maven Securities

Did you include all relevant data sets for reproducing the issue?

No - Other reason (please specify in the issue body)

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • Yes, I have

Metadata

Assignees: no one assigned
Labels: needs triage (Needs to be triaged by maintainers)
Milestone: no milestone