optimize arrow batch scan #264

Open
aheev wants to merge 3 commits into LadybugDB:master from aheev:dev

@aheev aheev commented Mar 3, 2026

  • improve applySemiMaskFilter perf by using lower_bound instead of a linear search
  • populate outputToArrowColumnIdx only once per table scan
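
To illustrate the first point, here is a minimal sketch of swapping a linear search for `std::lower_bound`. It assumes the semi mask can be viewed as a sorted list of row offsets, which this PR page does not confirm. `Offsets`, `firstMaskedLinear`, `firstMaskedLowerBound`, and `maskedInBatch` are hypothetical names for this example, not LadybugDB's actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a semi mask: row offsets that pass the filter,
// kept in ascending order (an assumption of this sketch).
using Offsets = std::vector<std::uint64_t>;

// Baseline: walk the mask from the front for every batch, O(n) per call.
// Returns the index of the first masked offset that is >= batchStart.
std::size_t firstMaskedLinear(const Offsets& sortedMask, std::uint64_t batchStart) {
    std::size_t i = 0;
    while (i < sortedMask.size() && sortedMask[i] < batchStart) {
        ++i;
    }
    return i;
}

// Same result via binary search, O(log n) per call.
std::size_t firstMaskedLowerBound(const Offsets& sortedMask, std::uint64_t batchStart) {
    // Debug-only precondition check; compiled out when NDEBUG is defined.
    assert(std::is_sorted(sortedMask.begin(), sortedMask.end()));
    auto it = std::lower_bound(sortedMask.begin(), sortedMask.end(), batchStart);
    return static_cast<std::size_t>(it - sortedMask.begin());
}

// Collect the masked offsets that fall inside the batch [batchStart, batchEnd).
Offsets maskedInBatch(const Offsets& sortedMask, std::uint64_t batchStart,
                      std::uint64_t batchEnd) {
    Offsets result;
    for (std::size_t i = firstMaskedLowerBound(sortedMask, batchStart);
         i < sortedMask.size() && sortedMask[i] < batchEnd; ++i) {
        result.push_back(sortedMask[i]);
    }
    return result;
}
```

Both search functions return the same index. The gain from the binary search grows with the size of the mask, since the linear version restarts from the front for every batch.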

aheev commented Mar 3, 2026

@adsharma could you PTAL?

@adsharma adsharma left a comment

Overall: good improvement! I'd like to see some benchmarks showing what got faster.

Also explain why/how setToTable() made things more efficient:

Example: one-time “bind scan projection to Arrow layout” instead of doing the mapping repeatedly.

aheev commented Mar 4, 2026

> Overall: good improvement! I'd like to see some benchmarks showing what got faster.
>
> Also explain why/how setToTable() made things more efficient:
>
> Example: one-time “bind scan projection to Arrow layout” instead of doing the mapping repeatedly.

There shouldn't be a significant improvement. It's a short for-loop (the number of columns is usually small). I just happened to notice that outputToArrowColumnIdx was being redundantly re-populated on every morsel scan.
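
For readers following along, the change can be sketched roughly as below: the projection-to-Arrow column mapping is built once at scan setup, and each morsel only reads it. Only the name `outputToArrowColumnIdx` comes from this PR. `ArrowScanState`, `bindProjection`, `arrowColumnFor`, and the `timesBound` counter are hypothetical, and resolving columns by name is an assumption of the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical scan state for illustration only.
struct ArrowScanState {
    // outputToArrowColumnIdx[i] is the Arrow column that feeds output column i.
    std::vector<std::size_t> outputToArrowColumnIdx;
    std::size_t timesBound = 0;  // instrumentation for this sketch

    // Resolve each projected column to its position in the Arrow schema.
    // Intended to be called once per table scan, not once per morsel.
    void bindProjection(const std::vector<std::string>& arrowSchema,
                        const std::vector<std::string>& projected) {
        outputToArrowColumnIdx.clear();
        outputToArrowColumnIdx.reserve(projected.size());
        for (const auto& name : projected) {
            std::size_t idx = 0;
            while (idx < arrowSchema.size() && arrowSchema[idx] != name) {
                ++idx;
            }
            assert(idx < arrowSchema.size() && "projected column missing from schema");
            outputToArrowColumnIdx.push_back(idx);
        }
        ++timesBound;
    }

    // Per-morsel work only reads the cached mapping.
    std::size_t arrowColumnFor(std::size_t outputIdx) const {
        return outputToArrowColumnIdx.at(outputIdx);
    }
};
```

As noted above, the loop is short, so the saving per morsel is small. The sketch mainly shows where the work moves to.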
