test: Add test for computing min/max values for expresions#21681
test: Add test for computing min/max values for expresions#21681alamb wants to merge 1 commit intoapache:mainfrom
Conversation
| STORED AS PARQUET | ||
| LOCATION '../core/tests/data/clickbench_hits_10.parquet'; | ||
|
|
||
| query II |
There was a problem hiding this comment.
This is a test for a a query that is optimized to used statistics in https://github.com/apache/datafusion/pull/21651/changes
On main, DataFusion computes delta from the actual rows and then takes the real min/max,
On #21651, the new logic propagates exact column min/max through the projection using interval arithmetic, then aggregate_statistics treats those derived bounds as exact aggregate answers and replaces the whole aggregate with literals.
For UserID - ClientIP, the interval formed from independent column extrema is wider than the set of values that actually occur in the data, because the min UserID is not paired with the max ClientIP in the same row, and similarly for the max side. The optimizer therefore folds to unattainable values:
- wrong folded min: -2461439047704734435
- wrong folded max: 7418527521343057109
…essions Interval arithmetic treats each column reference as an independent value, so applying evaluate_bounds to multi-column expressions (e.g. `a - b`) or to an expression with the same column referenced twice (e.g. `col * col`) produces intervals that are not valid min/max of the expression over the actual data. Folding MIN/MAX aggregates from those intervals — as the aggregate statistics optimizer does for Single-mode aggregates — returns incorrect results. Only propagate bounds when the expression contains at most one Column node in its tree. Also simplify the propagation helper now that exactness is determined by the single column's stats. Adds unit and sqllogictest coverage (the multi-column test from apache#21681 plus a same-column-twice variant). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Incorporated this now into #21651 |
|
Thanks @alamb |
Which issue does this PR close?
Rationale for this change
This test shows a bug in #21651
The test passes on main but fails on that PR like this
What changes are included in this PR?
Add a new test
Are these changes tested?
They are all tests
Are there any user-facing changes?
No