
Add segmented merge algorithm with tests #7126

Open
abhishek593 wants to merge 3 commits into TheHPXProject:master from abhishek593:distributed-merge

Conversation

@abhishek593
Contributor

Proposed Changes

This PR introduces a segmented hpx::merge, allowing stable merging of two sorted distributed ranges across multiple localities. The algorithm employs the co-rank technique: a binary search that identifies which sub-ranges of A and B fall into the current destination slice.
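For readers unfamiliar with the technique, the co-rank search can be sketched as follows. This is an illustrative sketch of the general idea only, not the PR's actual code; the names `co_rank`, `a`, and `b` are mine, and I assume `int` elements compared with `<`, ties resolved toward the first range for stability:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sketch of the co-rank binary search: given a destination position k in
// the merged output, find i such that the first k merged elements consist
// of a[0..i) and b[0..k-i). Ties are resolved toward 'a', which keeps the
// merge stable.
std::size_t co_rank(std::size_t k,
    std::vector<int> const& a, std::vector<int> const& b)
{
    std::size_t lo = k > b.size() ? k - b.size() : 0;
    std::size_t hi = std::min(k, a.size());
    while (lo < hi)
    {
        std::size_t i = lo + (hi - lo) / 2;    // candidate split in a
        std::size_t j = k - i;                 // implied split in b
        if (j > 0 && i < a.size() && b[j - 1] >= a[i])
            lo = i + 1;    // a[i] must precede b[j-1]: take more from a
        else if (i > 0 && j < b.size() && a[i - 1] > b[j])
            hi = i;        // b[j] must precede a[i-1]: take less from a
        else
            return i;      // both boundary conditions hold
    }
    return lo;
}
```

Each locality can run this search for the two endpoints of its destination slice and then merge only the resulting sub-ranges, which is what makes the decomposition embarrassingly parallel.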

Future Refinement

The current implementation resides in a single detail/merge.hpp header. Once I start working on set_* algorithms, the shared infrastructure (slice decomposition, handle registry, collective context, co-rank, payload batching, etc.) will be organised into reusable headers. As of now, the exact API boundaries are not clear to me, so premature splitting would not be appropriate.

Signed-off-by: Abhishek Bansal <abhibansal593@gmail.com>
@abhishek593 abhishek593 requested a review from hkaiser as a code owner March 29, 2026 12:24
@StellarBot

Can one of the admins verify this patch?

@hkaiser
Contributor

hkaiser commented Mar 29, 2026

@kollanur This may be interesting for you. Please have a look.

Contributor

@hkaiser hkaiser left a comment


A couple of superficial comments so far. I will need to invest more time in understanding the proposed solution in detail.

One thing that puzzled me right away: why didn't you use the existing (local) hpx::merge, possibly its parallelized variant? We have spent a significant amount of time optimizing that.

Signed-off-by: Abhishek Bansal <abhibansal593@gmail.com>
@abhishek593
Contributor Author

@hkaiser The reason I didn't use HPX's parallel merge is that it could only be applied in Phase 6 (the remaining phases still have to run for the distributed case), and even then we would first have to copy the received fragments into a buffer before launching the parallel merge. That copy would defeat any optimization gain from the parallel merge. Currently, I am doing a standard merge directly from the fragments.

But this distributed merge implementation uses the same co-rank idea (Phase 2). In the parallel merge it's called diagonal_intersection, but it's based on the same binary search.

Signed-off-by: Abhishek Bansal <abhibansal593@gmail.com>
@codacy-production

codacy-production bot commented Apr 4, 2026

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues



@hkaiser
Contributor

hkaiser commented Apr 5, 2026

> @hkaiser The reason I didn't use HPX's parallel merge is that it can only be used in Phase 6 (We have to do rest of the phases for the distributed case), and even then, we would have to first copy the received fragments into a buffer, then launch the parallel merge. This would defeat any optimization gain that we may have from parallel merge. Currently, I am just doing a standard merge directly from fragments.
>
> But this distributed merge implementation uses the same co_rank idea (Phase 2). There, it's called diagonal_intersection, but it's based on the same binary-search idea.

If you can call a sequential merge, you should be able to invoke a parallel one as well. At least use hpx::merge(seq) for the fragments; even our sequential merge was heavily optimized beyond a simple loop.

@abhishek593
Contributor Author

@hkaiser Sorry for any misunderstanding, but I am not saying that we can't call the parallel merge. We can definitely call it, but we would first have to convert fragments, which is of type vector<resolved_fragment>, into a vector (or something else that exposes an iterator we can pass). That conversion alone has time complexity O(N), which would eliminate any gain from the parallel merge and would in fact be worse.

The existing parallel merge doesn't have this pre-step of converting to a vector first to obtain an iterator. If I am missing something completely obvious, please let me know, since I am not seeing any way to achieve this. I can work on a benchmark in the next couple of days.
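To illustrate the trade-off under discussion, here is a minimal sketch of what merging directly from non-contiguous fragments could look like, avoiding the O(N) flattening pre-step. `fragment` is a hypothetical stand-in for the PR's resolved_fragment (whose real layout this sketch does not know), and the code assumes `int` elements:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for resolved_fragment: one sorted slice
// received from a locality.
struct fragment
{
    std::vector<int> data;
};

// Walks an ordered list of fragments as if it were one sorted
// sequence, skipping empty fragments.
struct cursor
{
    std::vector<fragment> const* frags;
    std::size_t f = 0;    // current fragment
    std::size_t i = 0;    // index within it

    bool done()
    {
        while (f < frags->size() && i >= (*frags)[f].data.size())
        {
            ++f;
            i = 0;
        }
        return f == frags->size();
    }
    int value() const { return (*frags)[f].data[i]; }
    void advance() { ++i; }
};

// Stable two-way merge performed directly on the fragments, without
// first flattening either side into one contiguous vector.
std::vector<int> merge_fragments(
    std::vector<fragment> const& as, std::vector<fragment> const& bs)
{
    cursor a{&as};
    cursor b{&bs};
    std::vector<int> out;
    while (!a.done() && !b.done())
    {
        if (b.value() < a.value())    // strict <: ties favour 'a' (stable)
        {
            out.push_back(b.value());
            b.advance();
        }
        else
        {
            out.push_back(a.value());
            a.advance();
        }
    }
    while (!a.done())
    {
        out.push_back(a.value());
        a.advance();
    }
    while (!b.done())
    {
        out.push_back(b.value());
        b.advance();
    }
    return out;
}
```

The alternative being weighed is copying both fragment lists into contiguous vectors first (an O(N) pass) so that an iterator-pair parallel merge can run over them; whether the parallel merge recoups that copy is exactly what a benchmark would settle.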

@hkaiser
Contributor

hkaiser commented Apr 5, 2026

> @hkaiser Sorry for any misunderstanding. But, I am not saying that we can't call parallel merge. We can definitely call it, but we would have to convert fragments which is of type vector<resolved_fragment>, to a vector first, or something that has an iterator we can pass. Just to do this operation, we would have time complexity O(N). This would eliminate any gains that we may have from parallel merge, and infact would be worse.
>
> For parallel merge, they didn't have this pre-step of converting to a vector first to get the iterator. If I am missing something completely obvious, please let me know, since I am not seeing any way to achieve this. I can work on a benchmark in the next couple of days.

I will have to have a closer look at your code to understand this. This is a big PR, please be patient with me.

