Refactor InListExpr to use modular StaticFilter architecture #21649

Open
geoffreyclaude wants to merge 1 commit into apache:main from geoffreyclaude:perf/in_list_static_filter
geoffreyclaude commented Apr 15, 2026

Which issue does this PR close?

Rationale for this change

Today, InListExpr mixes several different concerns in one implementation:

  • expression planning and evaluation
  • building the membership-test data structure for constant IN lists
  • selecting which internal filter path to use
  • constructing the final boolean result, including null semantics

That makes the code harder to follow and harder to extend when introducing new IN LIST execution strategies.

This PR refactors the internals around a StaticFilter abstraction for constant IN lists. The goal is not to change behavior yet, but to separate the existing responsibilities so follow-up optimizations can be implemented and reviewed in smaller, more self-contained pieces.

This PR is therefore primarily architectural: it preserves the current public behavior while making the internal strategy boundaries explicit.
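To make the idea of a filter abstraction for constant IN lists concrete, here is a minimal sketch of what such a boundary could look like. It is not the trait from this PR: the real `StaticFilter` works on Arrow arrays and its signature is not shown in this description. The method names `has_nulls` and `contains`, the `PrimitiveFilter` struct, and the `count_matches` helper are all assumptions made for illustration, with `Option<i64>` standing in for a nullable Arrow column.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the internal trait described in this PR.
/// The real trait operates on Arrow arrays; this one is simplified to i64.
trait StaticFilter {
    /// Whether the constant list itself contained a NULL.
    fn has_nulls(&self) -> bool;
    /// Membership test for a single non-null probe value.
    fn contains(&self, value: i64) -> bool;
}

/// HashSet-backed filter for a primitive constant list (assumed shape).
struct PrimitiveFilter {
    values: HashSet<i64>,
    has_nulls: bool,
}

impl PrimitiveFilter {
    /// Build the filter once from the constant list; `None` models SQL NULL.
    fn new(list: &[Option<i64>]) -> Self {
        let has_nulls = list.iter().any(|v| v.is_none());
        // `flatten` skips the NULL entries and keeps the concrete values.
        let values = list.iter().flatten().copied().collect();
        Self { values, has_nulls }
    }
}

impl StaticFilter for PrimitiveFilter {
    fn has_nulls(&self) -> bool {
        self.has_nulls
    }

    fn contains(&self, value: i64) -> bool {
        self.values.contains(&value)
    }
}

/// A caller that only sees the trait, not the concrete filter type.
fn count_matches(filter: &dyn StaticFilter, column: &[i64]) -> usize {
    column.iter().filter(|v| filter.contains(**v)).count()
}
```

In this sketch, a caller would build `PrimitiveFilter::new(&[Some(1), Some(3), None])` once and pass it to `count_matches` as `&dyn StaticFilter`. The point of the shape is that a different filter implementation could be swapped in without touching the caller.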

What changes are included in this PR?

  • Refactors datafusion/physical-expr/src/expressions/in_list.rs to delegate filter-specific work into a dedicated in_list/ module
  • Introduces static_filter.rs, which defines the internal StaticFilter trait used for constant IN lists
  • Moves the existing primitive-list static filtering logic into primitive_filter.rs
  • Moves nested / complex-type fallback handling into nested_filter.rs
  • Extracts result assembly into result.rs
  • Extracts filter selection into strategy.rs
  • Keeps the external InListExpr API unchanged while preserving the existing constant-list fast path behavior

The practical effect of this PR is that the current implementation becomes easier to reason about: InListExpr remains the entry point, while filter construction, filter selection, and result materialization are handled by smaller focused units.
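The split between filter construction and result materialization can be illustrated with a small, self-contained sketch of SQL three-valued logic for `IN` and `NOT IN`. This is not the code in `result.rs`; the function names `in_list_row` and `in_list` are hypothetical, and `Option<i64>` / `Option<bool>` stand in for nullable Arrow arrays. It only shows the standard null semantics that any result-assembly step has to preserve.

```rust
use std::collections::HashSet;

/// Result for one row of `probe [NOT] IN (list)` under SQL three-valued logic.
/// `None` models NULL. Hypothetical helper, not the PR's API.
fn in_list_row(
    probe: Option<i64>,
    set: &HashSet<i64>,
    list_has_null: bool,
    negated: bool,
) -> Option<bool> {
    // A NULL probe value always yields NULL.
    let value = probe?;
    if set.contains(&value) {
        // Definite match: true for IN, false for NOT IN.
        Some(!negated)
    } else if list_has_null {
        // No match, but the list holds a NULL, so the answer is unknown.
        None
    } else {
        // Definite non-match: false for IN, true for NOT IN.
        Some(negated)
    }
}

/// Filter construction (once per constant list) followed by result assembly.
fn in_list(
    column: &[Option<i64>],
    list: &[Option<i64>],
    negated: bool,
) -> Vec<Option<bool>> {
    let list_has_null = list.iter().any(Option::is_none);
    let set: HashSet<i64> = list.iter().flatten().copied().collect();
    column
        .iter()
        .map(|probe| in_list_row(*probe, &set, list_has_null, negated))
        .collect()
}
```

Under these assumptions, probing the column `[1, 2, NULL]` against the list `(1, NULL)` gives `[true, NULL, NULL]` for `IN` and `[false, NULL, NULL]` for `NOT IN`. Keeping the membership test (`set.contains`) separate from the null handling is what lets a different membership structure be substituted without changing the output semantics.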

Are these changes tested?

Yes. I validated this PR with:

  • cargo fmt --all
  • cargo clippy -p datafusion-physical-expr --all-targets --all-features -- -D warnings
  • cargo test -p datafusion-physical-expr in_list --lib

Are there any user-facing changes?

No user-facing API changes. This is an internal refactor that prepares the codebase for subsequent IN LIST performance improvements.

Introduces the StaticFilter trait to decouple membership testing from InListExpr. Migrates the existing HashSet optimizations into primitive_filter.rs to maintain performance parity while enabling future specialized implementations. The StaticFilter path is used for all constant IN lists.

(cherry picked from commit 797b7fc)
github-actions bot added the physical-expr (Changes to the physical-expr crates) label Apr 15, 2026