Skip to content

fix(optimizer): preserve filter execution order in PushDownFilter#21646

Open
SAY-5 wants to merge 1 commit intoapache:mainfrom
SAY-5:fix-push-down-filter-preserve-order
Open

fix(optimizer): preserve filter execution order in PushDownFilter#21646
SAY-5 wants to merge 1 commit intoapache:mainfrom
SAY-5:fix-push-down-filter-preserve-order

Conversation

@SAY-5
Copy link
Copy Markdown

@SAY-5 SAY-5 commented Apr 15, 2026

Closes #21642.

When PushDownFilter collapses a parent Filter into a child Filter, the combined predicate list was built as parents_predicates.chain(child_predicates):

LogicalPlan::Filter(child_filter) => {
    let parents_predicates = split_conjunction_owned(filter.predicate);
    let child_predicates = split_conjunction_owned(child_filter.predicate);
    let new_predicates = parents_predicates
        .into_iter()
        .chain(child_predicates)
        // use IndexSet to remove dupes while preserving predicate order
        .collect::<IndexSet<_>>()
        ...

Because the unoptimized plan evaluates inner (child) filters before outer (parent) filters, this reversed the user-authored execution order of selective predicates. The reproducer in the issue:

LogicalPlanBuilder::from(scan)
    .filter(col("a").eq(lit(10)))?    // applied first (inner)
    .filter(col("b").gt(lit(11)))?    // applied second (outer)

…was optimized into Filter: b > 11 AND a = 10 rather than the expected Filter: a = 10 AND b > 11. In environments without filter-selectivity statistics this matters: users who place a cheap or highly-selective filter first expect it to execute first.

Fix

Build the conjunction as child_predicates.chain(parents_predicates) so PushDownFilter actually preserves the execution order it claims to in the existing use IndexSet to remove dupes while preserving predicate order comment.

Snapshot update

two_filters_on_same_depth had its assert_optimized_plan_equal! snapshot frozen at the buggy order. The unoptimized plan in that test applies a <= 1 first (inner filter) and a >= 1 second (outer), so the optimized result must be a <= 1 AND a >= 1. Updated the snapshot and added an inline comment documenting the invariant.

Verification

cargo test -p datafusion-optimizer --lib push_down_filter passes 79/79 tests with the fix and the snapshot update applied.

When PushDownFilter collapses a parent Filter into a child Filter, the
combined predicate list was built as parent_predicates.chain(child_predicates).
Because the unoptimized plan evaluates inner (child) filters before
outer (parent) filters, this reversed the user-authored execution order
of selective predicates: a query written as

    LogicalPlanBuilder::from(scan)
        .filter(col("a").eq(lit(10)))?    // applied first (inner)
        .filter(col("b").gt(lit(11)))?    // applied second (outer)

was optimized into 'Filter: b > 11 AND a = 10' rather than the
expected 'Filter: a = 10 AND b > 11'. In environments without
filter-selectivity statistics this matters: users who put their cheap
or highly-selective filter first expect it to execute first.

Build the conjunction as child_predicates.chain(parents_predicates)
so PushDownFilter actually preserves the execution order it claims to.
Update the two_filters_on_same_depth snapshot to assert the corrected
ordering and document why it matters in a short comment.

Closes apache#21642.
@github-actions github-actions bot added the optimizer Optimizer rules label Apr 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

optimizer Optimizer rules

Projects

None yet

Development

Successfully merging this pull request may close these issues.

PushDownFilter claims to be preserving predicate order but in fact reverses it

1 participant