Support arithmetic expressions in PruningPredicate for Parquet row gr…#21647
SubhamSinghal wants to merge 7 commits into apache:main
run benchmarks
🤖 Benchmark running (GKE): comparing arithmetic-expr-pruning (51d5173) to f99ba69 (merge-base) using clickbench_partitioned.
🤖 Benchmark running (GKE): comparing arithmetic-expr-pruning (51d5173) to f99ba69 (merge-base) using tpch.
🤖 Benchmark running (GKE): comparing arithmetic-expr-pruning (51d5173) to f99ba69 (merge-base) using tpcds.
🤖 Benchmark completed (GKE): clickbench_partitioned, base (merge-base) vs. branch.
🤖 Benchmark completed (GKE): tpcds, base (merge-base) vs. branch.
Which issue does this PR close?
Rationale for this change
`PruningPredicate` currently cannot prune Parquet row groups for predicates with arithmetic expressions such as `col + 5 > 10` or `date_col + INTERVAL '30 days' > '2024-01-01'`. The `rewrite_expr_to_prunable` function only handles plain columns, `CAST`, `TRY_CAST`, negation, and `NOT`; an arithmetic `BinaryExpr` falls through to "can't prune", so every row group is scanned.
This is especially impactful for date/timestamp arithmetic in `WHERE` clauses (e.g. `WHERE order_date + INTERVAL '30 days' > CURRENT_DATE`), which is very common in analytics queries on Parquet tables.
What changes are included in this PR?
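To make the gap concrete, here is a rough, hypothetical model of which expression shapes the rewrite accepts. `Expr`, `Op`, and `prunable_column` are illustrative names, not DataFusion's `PhysicalExpr` API. The `allow_arithmetic` flag contrasts the existing behavior described in the rationale (plain columns, `CAST`, `TRY_CAST`, negation, `NOT`) with a new arm for `+`/`-`. The sketch assumes the arithmetic has the form `column ± literal`; the PR description does not spell out which operand shapes are accepted.

```rust
// Hypothetical, heavily simplified expression tree. This is NOT DataFusion's
// PhysicalExpr API; it only models the node kinds named in the PR description.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Cast(Box<Expr>),
    TryCast(Box<Expr>),
    Negative(Box<Expr>),
    Not(Box<Expr>),
    Binary { left: Box<Expr>, op: Op, right: Box<Expr> },
}

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    Plus,
    Minus,
    Multiply,
}

/// Hypothetical stand-in for `rewrite_expr_to_prunable`: returns the column
/// the expression wraps if the shape is one the pruning rewrite can handle,
/// or `None` for "can't prune".
fn prunable_column(expr: &Expr, allow_arithmetic: bool) -> Option<&str> {
    match expr {
        Expr::Column(name) => Some(name.as_str()),
        // Shapes handled before the change: unwrap and keep looking.
        Expr::Cast(inner) | Expr::TryCast(inner) | Expr::Negative(inner) | Expr::Not(inner) => {
            prunable_column(inner, allow_arithmetic)
        }
        // New arm (assumed shape): `column + literal` or `column - literal`.
        Expr::Binary { left, op: Op::Plus | Op::Minus, right } if allow_arithmetic => {
            match (&**left, &**right) {
                (Expr::Column(name), Expr::Literal(_)) => Some(name.as_str()),
                _ => None,
            }
        }
        // Everything else (including `*`) still falls through.
        _ => None,
    }
}
```

With `allow_arithmetic = false`, `col + 5` yields `None` (every row group is scanned); with `true`, it yields `Some("col")`, while `col * 2` is still rejected.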
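Added support for arithmetic expressions (`+`, `-`) in `rewrite_expr_to_prunable`. The approach is "evaluate on min/max": the arithmetic expression is passed through as the `column_expr`, and the existing `rewrite_column_expr` machinery substitutes `col` → `col_max`/`col_min` inside the arithmetic, producing predicates like `(col_max + 5) > 10`. This is valid because, for a constant `c` and barring overflow, `col + c` and `col - c` are increasing in `col`, so applying the arithmetic to a row group's maximum (or minimum) gives the maximum (or minimum) of the whole expression over that row group.
Are these changes tested?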
Yes, with unit tests.
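The "evaluate on min/max" rewrite described above can be modeled outside DataFusion with plain integers. This is a minimal sketch, not the PR's implementation: `ColumnStats`, `ArithOp`, `CmpOp`, and `may_match` are hypothetical names, only `>` and `<` are modeled, and the overflow rule (keep the row group if either bound overflows) is an assumption of the sketch, since the PR description does not say how overflow is treated.

```rust
/// Hypothetical per-row-group min/max statistics for one integer column.
#[derive(Debug, Clone, Copy)]
struct ColumnStats {
    min: i64,
    max: i64,
}

#[derive(Debug, Clone, Copy)]
enum ArithOp {
    Plus,
    Minus,
}

#[derive(Debug, Clone, Copy)]
enum CmpOp {
    Gt,
    Lt,
}

/// Evaluate `x op c` with checked arithmetic; `None` means overflow.
fn apply(x: i64, op: ArithOp, c: i64) -> Option<i64> {
    match op {
        ArithOp::Plus => x.checked_add(c),
        ArithOp::Minus => x.checked_sub(c),
    }
}

/// Returns `true` if the row group MAY contain rows where `(col op c) cmp v`
/// (it must be scanned), or `false` if it can be skipped.
fn may_match(stats: ColumnStats, op: ArithOp, c: i64, cmp: CmpOp, v: i64) -> bool {
    // If neither bound overflows, no value in [min, max] overflows either,
    // so `col op c` ranges exactly over [lo, hi]. Otherwise, be conservative.
    let (Some(lo), Some(hi)) = (apply(stats.min, op, c), apply(stats.max, op, c)) else {
        return true;
    };
    match cmp {
        // (col_max op c) > v
        CmpOp::Gt => hi > v,
        // (col_min op c) < v
        CmpOp::Lt => lo < v,
    }
}
```

For `col + 5 > 10`, a row group with `max = 4` is skipped because `4 + 5 = 9` is not greater than 10. Treating a `Date32` value as days since the Unix epoch (2024-01-01 is day 19723), `order_date + INTERVAL '30 days' > '2024-01-01'` becomes `+ 30` in this model, so a row group whose maximum is 2023-11-30 (day 19691) is skipped since 19691 + 30 = 19721. Month-based intervals have no fixed day length and are outside this simplified model.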
Are there any user-facing changes?
No