Skip to content

test: Add test for computing min/max values for expresions#21681

Closed
alamb wants to merge 1 commit intoapache:mainfrom
alamb:test-aggregate-stats-regression
Closed

test: Add test for computing min/max values for expresions#21681
alamb wants to merge 1 commit intoapache:mainfrom
alamb:test-aggregate-stats-regression

Conversation

@alamb
Copy link
Copy Markdown
Contributor

@alamb alamb commented Apr 16, 2026

Which issue does this PR close?

Rationale for this change

This test shows a bug in #21651

The test passes on main but fails on that PR like this

andrewlamb@Andrews-MacBook-Pro-3:~/Software/datafusion3$ cargo test --profile=ci --test sqllogictests
    Finished `ci` profile [unoptimized] target(s) in 0.22s
     Running bin/sqllogictests.rs (target/ci/deps/sqllogictests-d8b84848b1385364)
Running with 16 test threads (available parallelism: 16)
Completed 428 test files in 4 seconds                                                                                                       External error: 1 errors in file /Users/andrewlamb/Software/datafusion3/datafusion/sqllogictest/test_files/aggregate.slt

1. query result mismatch:
[SQL] SELECT MIN(delta), MAX(delta)
FROM (
  SELECT "UserID" - CAST("ClientIP" AS BIGINT) AS delta
  FROM hits_raw
);
[Diff] (-expected|+actual)
-   -2461439044872611287 7418527518698834918
+   -2461439047704734435 7418527521343057109
at /Users/andrewlamb/Software/datafusion3/datafusion/sqllogictest/test_files/aggregate.slt:8951



Error: Execution("1 failures")
error: test failed, to rerun pass `-p datafusion-sqllogictest --test sqllogictests`

Caused by:
  process didn't exit successfully: `/Users/andrewlamb/Software/datafusion3/target/ci/deps/sqllogictests-d8b84848b1385364` (exit status: 1)

What changes are included in this PR?

Add a new test

Are these changes tested?

They are all tests

Are there any user-facing changes?

No

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Apr 16, 2026
STORED AS PARQUET
LOCATION '../core/tests/data/clickbench_hits_10.parquet';

query II
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a test for a a query that is optimized to used statistics in https://github.com/apache/datafusion/pull/21651/changes

On main, DataFusion computes delta from the actual rows and then takes the real min/max,

On #21651, the new logic propagates exact column min/max through the projection using interval arithmetic, then aggregate_statistics treats those derived bounds as exact aggregate answers and replaces the whole aggregate with literals.

For UserID - ClientIP, the interval formed from independent column extrema is wider than the set of values that actually occur in the data, because the min UserID is not paired with the max ClientIP in the same row, and similarly for the max side. The optimizer therefore folds to unattainable values:

  • wrong folded min: -2461439047704734435
  • wrong folded max: 7418527521343057109

Dandandan added a commit to Dandandan/arrow-datafusion that referenced this pull request Apr 17, 2026
…essions

Interval arithmetic treats each column reference as an independent value,
so applying evaluate_bounds to multi-column expressions (e.g. `a - b`) or
to an expression with the same column referenced twice (e.g. `col * col`)
produces intervals that are not valid min/max of the expression over the
actual data. Folding MIN/MAX aggregates from those intervals — as the
aggregate statistics optimizer does for Single-mode aggregates — returns
incorrect results.

Only propagate bounds when the expression contains at most one Column
node in its tree. Also simplify the propagation helper now that exactness
is determined by the single column's stats.

Adds unit and sqllogictest coverage (the multi-column test from apache#21681
plus a same-column-twice variant).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Dandandan
Copy link
Copy Markdown
Contributor

Incorporated this now into #21651

@Dandandan
Copy link
Copy Markdown
Contributor

Thanks @alamb

@Dandandan Dandandan closed this Apr 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants