We could initialize the TopK threshold from column statistics (at least for a single sort column) and make the initial threshold much tighter based on min/max statistics (at the file, row group, or page level):
- We have a file or row group with more than K rows (K from TopK)
- We have a single sort column (directly after the scan)
- We can then initialize/update the TopK threshold using the max (or min) statistic
- Also, if a new bound is tighter than the current TopK threshold (smaller or larger, depending on the sort direction), we could update the threshold to the tighter bound
I think this might make the initial threshold much tighter, instead of having to read all of the first row groups with an uninitialized TopK.
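A rough sketch of the idea, not tied to any existing DataFusion API. `ColumnStats`, `SortOrder`, `initial_threshold`, and `can_skip` are hypothetical names made up for illustration, and the sort column is assumed to be an `i64` with exact min/max and null counts. For an ascending TopK, any row group with at least K non-null rows guarantees that the K-th smallest value is no larger than that row group's max, so the smallest such max is a valid starting threshold (descending mirrors this with the largest min):

```rust
/// Hypothetical, simplified statistics for the sort column of one row group.
/// Assumes min/max are exact (not truncated) and cover only non-null values.
#[derive(Debug, Clone, Copy)]
struct ColumnStats {
    num_rows: usize,
    null_count: usize,
    min: i64,
    max: i64,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum SortOrder {
    Ascending,
    Descending,
}

/// Derive an initial TopK threshold from statistics alone.
/// Only row groups with at least `k` non-null rows can contribute a bound.
/// Returns `None` when no row group qualifies.
fn initial_threshold(stats: &[ColumnStats], k: usize, order: SortOrder) -> Option<i64> {
    let bounds = stats
        .iter()
        .filter(|s| k > 0 && s.num_rows.saturating_sub(s.null_count) >= k)
        .map(|s| match order {
            // Ascending: at least k rows are <= max, so the k-th smallest is <= max.
            SortOrder::Ascending => s.max,
            // Descending: at least k rows are >= min, so the k-th largest is >= min.
            SortOrder::Descending => s.min,
        });
    match order {
        // Keep the tightest bound: smallest max for ASC, largest min for DESC.
        SortOrder::Ascending => bounds.min(),
        SortOrder::Descending => bounds.max(),
    }
}

/// A row group can be skipped when none of its values can beat the threshold.
/// Values equal to the threshold are kept, so ties are never dropped.
fn can_skip(stats: &ColumnStats, threshold: i64, order: SortOrder) -> bool {
    match order {
        SortOrder::Ascending => stats.min > threshold,
        SortOrder::Descending => stats.max < threshold,
    }
}
```

With a threshold like this available before the first batch is read, row groups (or pages) whose range lies entirely on the wrong side of it could be pruned up front. Null ordering and multi-column sorts are deliberately left out of the sketch.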
Originally posted by @Dandandan in #21580 (comment)