Describe the bug
The PushDownFilter logical optimizer rule has ApplyOrder::TopDown and it claims to preserve filter oder:
"use IndexSet to remove dupes while preserving predicate order" but this code is actually reversing the order of multiple filter plan nodes (as opposed to a single filter with a conjunction expression):
let parents_predicates = split_conjunction_owned(filter.predicate);
// remove duplicated filters
let child_predicates = split_conjunction_owned(child_filter.predicate);
let new_predicates = parents_predicates
.into_iter()
.chain(child_predicates)
// use IndexSet to remove dupes while preserving predicate order
.collect::<IndexSet<_>>()
.into_iter()
.collect::<Vec<_>>();
Note that the execution order of the unoptimized plan would be - child filters first (as inner nodes are the input of outer nodes) and then parent filters. But the optimized filter predicates are parent filters first, then child filters.
To Reproduce
If I make a small test change:
diff --git a/datafusion/optimizer/src/push_down_filter.rs b/datafusion/optimizer/src/push_down_filter.rs
index a1a636cfef..1f8dbc91ce 100644
--- a/datafusion/optimizer/src/push_down_filter.rs
+++ b/datafusion/optimizer/src/push_down_filter.rs
@@ -3239,7 +3239,8 @@ mod tests {
vec![col("a").eq(lit(10i64)), col("b").gt(lit(11i64))],
Some(vec![0]),
)?
- .filter(and(col("a").eq(lit(10i64)), col("b").gt(lit(11i64))))?
+ .filter(col("a").eq(lit(10i64)))?
+ .filter(col("b").gt(lit(11i64)))?
.project(vec![col("a"), col("b")])?
.build()?;
the predicates are reversed:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Snapshot Summary ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Snapshot: multi_combined_filter
Source: datafusion/optimizer/src/push_down_filter.rs:3247
────────────────────────────────────────────────────────────────────────────────
Expression: optimized_plan
────────────────────────────────────────────────────────────────────────────────
-old snapshot
+new results
────────────┬───────────────────────────────────────────────────────────────────
1 1 │ Projection: a, b
2 │- Filter: a = Int64(10) AND b > Int64(11)
2 │+ Filter: b > Int64(11) AND a = Int64(10)
3 3 │ TableScan: test projection=[a], partial_filters=[a = Int64(10), b > Int64(11)]
────────────┴───────────────────────────────────────────────────────────────────
Expected behavior
I expect PushDownFilter to preserve the execution order of predicates as in the unoptimized plan.
Additional context
In environments where we don't have information about the filter selectivity, the user might have domain knowledge and put the more selective filters first, expecting a boost in query performance. But counterintuitively, using two filters instead of one filter with a conjunction, reverses the execution order.
Describe the bug
The
PushDownFilterlogical optimizer rule hasApplyOrder::TopDownand it claims to preserve filter oder:"use IndexSet to remove dupes while preserving predicate order" but this code is actually reversing the order of multiple filter plan nodes (as opposed to a single filter with a conjunction expression):
Note that the execution order of the unoptimized plan would be - child filters first (as inner nodes are the input of outer nodes) and then parent filters. But the optimized filter predicates are parent filters first, then child filters.
To Reproduce
If I make a small test change:
the predicates are reversed:
Expected behavior
I expect
PushDownFilterto preserve the execution order of predicates as in the unoptimized plan.Additional context
In environments where we don't have information about the filter selectivity, the user might have domain knowledge and put the more selective filters first, expecting a boost in query performance. But counterintuitively, using two filters instead of one filter with a conjunction, reverses the execution order.