Changed reduce implementation and added tests #6877
BhoomishGupta wants to merge 16 commits into TheHPXProject:master from
Conversation
Can one of the admins verify this patch?
hkaiser left a comment
Could you please separate the formatting changes into a different PR? It's close to impossible to review your actual changes here.
Also, for the formatting - are you sure you're using clang-format V20?
Yeah, I was not using V20.
Simply keep (force) pushing to your branch; this will update the PR.
@BhoomishGupta FWIW, the reduce tests now seem to fail. Also, please rebase this onto master to resolve the conflicts.
Working on it.
@BhoomishGupta Please rebase your changes onto master to reduce the amount of unrelated changes (currently more than 300 files). Otherwise it's close to impossible to review your PR.
@BhoomishGupta Thanks for investigating and attempting to implement a new customization point for the execution parameters.
You have implemented part of the necessary logic. Please have a look at the existing customization points to see what else is necessary.
I have the impression that you have added the new functionality for all algorithms, as the new functionality will always be invoked. I'd suggest making sure that the new functionality is used only for the reduction algorithms.
@hkaiser Thanks for the review. I will check the existing customization points and add the missing things. Regarding the scope: I did intentionally add it to the generic utility because I thought adjusting chunk sizes could be beneficial for other algorithms down the line. Or do you strongly prefer I hard-restrict it to reduce only? Just to clarify the current behavior: even though it's in a shared file, it currently acts as a no-op for other algorithms. The default implementation returns {0, 0}, and the call sites only apply adjustments when the return values are non-zero. So it shouldn't be impacting non-reduce algorithms right now, but I will make sure the trait-detection aligns perfectly with the rest of the codebase.
The default behavior is needed for all algorithms (as a fall-back, most likely), while the customization should apply to the reduction algorithms only.
If the default used by all algorithms (except reduce) is now
I don't think that the {0, 0} default changes the current behavior for any algorithm. The call site is:

```cpp
auto [adj_chunk, adj_max] =
    hpx::execution::experimental::adjust_chunk_size_and_max_chunks(
        policy.parameters(), policy.executor(),
        max_chunks, chunk_size);
if (adj_chunk != 0)
    chunk_size = adj_chunk;
if (adj_max != 0)
    max_chunks = adj_max;
else if (adj_chunk == 0)
    adjust_chunk_size_and_max_chunks(
        cores, count, max_chunks, chunk_size);
```

When the customization point returns the default {0, 0}, both adj_chunk and adj_max are zero, so we go through the `else if (adj_chunk == 0)` branch, which calls the existing `adjust_chunk_size_and_max_chunks` free function. That is exactly the same code path as before.
I have seen that. This code does something different from what was in place before. The default should just be the previous behavior. As said, please have a look at the other parameter customizations to see what's missing.
There is one thing missing still. Currently you use policy.with() to replace the current set of parameter customizations, which will cause any user-supplied customizations to be lost. I think we need to use something like the new API rebind_executor_parameters (like, for instance, proposed here: #6935) instead.
Otherwise: very nice! Thanks for working on this!
Let's wait for #6935 to be merged, then you can adapt your PR.
@BhoomishGupta FWIW, #6935 was merged yesterday. Could you please rebase this PR onto master to resolve the conflicts? After that, let's work on finalizing your patch.
Hi @hkaiser, thanks for the heads-up on #6935 getting merged! Just a quick note: my university exams are currently going on, so my availability is a bit limited right now. I will rebase this PR onto master to resolve the conflicts and work on finalizing the patch right after my exams finish on March 14th. Thanks for your patience!
@BhoomishGupta On MacOS the new tests are failing. Inspect reports an issue as well. Please pay attention to the CIs yourself. |
Ah, my bad, I had my wires crossed and was looking at the CI for a different PR. I will check the MacOS and Inspect logs for this one tomorrow and will push a fix soon. I'll pay closer attention to the CI status here going forward.
@BhoomishGupta ping?
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Fixes reduce partition initialization for cases where value_type is not convertible to T (issue #6647) by ensuring partitions have at least two elements and by adding a regression test. Also introduces an adjust_chunk_size_and_max_chunks executor-parameters customization point with dispatch/rebind support and corresponding unit tests.
Changes:
- Fix parallel `reduce` partition reduction to avoid assuming `*first` converts to `T`, and add a regression test (`reduce_6647`).
- Add the `adjust_chunk_size_and_max_chunks` customization point + rebind/wrapping support, and update chunking code to use it.
- Extend unit tests around executor-parameters dispatch/rebinding (including dispatch order guarantees).
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| libs/core/execution/include/hpx/execution/executors/rebind_executor_parameters.hpp | Adds support for rebinding/wrapping adjust_chunk_size_and_max_chunks and fixes mark_begin_execution double-dispatch. |
| libs/core/execution/include/hpx/execution/executors/execution_parameters_fwd.hpp | Declares the adjust_chunk_size_and_max_chunks CPO. |
| libs/core/execution/include/hpx/execution/executors/execution_parameters.hpp | Implements default adjustment logic + property plumbing for adjust_chunk_size_and_max_chunks. |
| libs/core/execution/tests/unit/rebind_executor_parameters.cpp | Adds rebind tests for adjust_chunk_size_and_max_chunks and dispatch order test for mark_begin_execution. |
| libs/core/execution/tests/unit/executor_parameters_dispatching.cpp | Adds dispatch precedence tests for adjust_chunk_size_and_max_chunks between parameters and executor. |
| libs/core/algorithms/include/hpx/parallel/util/detail/chunk_size.hpp | Switches chunk-size/max-chunks consistency logic to the new CPO. |
| libs/core/algorithms/include/hpx/parallel/algorithms/reduce.hpp | Fixes parallel reduce partition reduction by enforcing partition sizes >= 2 and adding a helper that doesn’t require *first convertible to T. |
| libs/core/algorithms/tests/regressions/reduce_6647.cpp | Adds regression test covering the reported reduce bug case. |
| libs/core/algorithms/tests/regressions/CMakeLists.txt | Registers reduce_6647 regression test. |
| .circleci/tests.unit1.container_algorithms | Adds a test entry to CI list. |
| components/parcel_plugins/coalescing/src/coalescing_message_handler.cpp | Formatting-only change (line wrap). |
| components/containers/partitioned_vector/tests/unit/is_iterator_partitioned_vector.cpp | Formatting-only change (include order). |
```diff
     hpx::chrono::null_duration, cores, count);

     // make sure, chunk size and max_chunks are consistent
-    adjust_chunk_size_and_max_chunks(
-        cores, count, max_chunks, chunk_size, true);
+    std::tie(chunk_size, max_chunks) =
+        hpx::execution::experimental::adjust_chunk_size_and_max_chunks(
+            policy.parameters(), policy.executor(), count, cores,
+            max_chunks, chunk_size);
```
This replaces the old `adjust_chunk_size_and_max_chunks(..., has_variable_chunk_size=true)` call sites with the new CPO, but the new call no longer preserves the has_variable_chunk_size behavior (the old behavior was to assert chunk_size != 0 and skip further adjustments). This is a semantic change that can break algorithms relying on variable chunk sizes. Consider extending the CPO/property to accept a has_variable_chunk_size flag (or a dedicated customization point/type tag), or keep using the legacy helper in the variable-chunk-size paths. (The same issue applies to the other former `..., true)` call sites in this file.)
```cpp
// Handle single-element case: can't partition into size >= 2
// This must be checked for all execution policies
auto const count = hpx::parallel::detail::distance(first, last);
```
The range length is computed twice (distance(first, last) at line 502 and again at line 533). For non-random-access iterators this is O(n) each time. Reuse the already computed count in the partitioner call to avoid redundant traversal.
Suggested change:

```diff
-    HPX_MOVE(reduce_policy), first,
-    hpx::parallel::detail::distance(first, last), HPX_MOVE(f1),
+    HPX_MOVE(reduce_policy), first, count, HPX_MOVE(f1),
```
```cpp
struct reduce_executor_parameters
{
    template <typename Executor>
    HPX_FORCEINLINE constexpr std::pair<std::size_t, std::size_t>
    adjust_chunk_size_and_max_chunks(Executor&&,
        std::size_t num_elements, std::size_t num_cores,
        std::size_t /*max_chunks*/,
        std::size_t chunk_size) const noexcept
    {
```
As a rebound (wrapping) parameters object, reduce_executor_parameters currently ignores the incoming max_chunks entirely (commented out) and returns a freshly computed new_max_chunks. This can effectively bypass user-provided parameters like max_num_chunks() on the wrapped policy. A more composable approach is to implement a wrapping customization (i.e., an overload that takes InnerParams&& inner) which first calls adjust_chunk_size_and_max_chunks(inner, exec, ...) and then adjusts the returned {chunk_size, max_chunks} minimally to enforce the partition-size constraint (>=2 and no remainder partition of size 1). This keeps existing parameter constraints intact while still fixing #6647.
```cpp
if (max_chunks == 0)
{
    if (chunk_size == 0)
    {
        std::size_t const cores_times_4 = 4 * num_cores;    // -V112

        chunk_size =
            (num_elements + cores_times_4 - 1) / cores_times_4;

        max_chunks =
            (std::min) (cores_times_4, num_elements);    // -V112

        chunk_size = (std::max) (chunk_size,
            (num_elements + max_chunks - 1) / max_chunks);
    }
    else
    {
        // max_chunks == 0 && chunk_size != 0
        max_chunks = (num_elements + chunk_size - 1) / chunk_size;
    }
}
else if (chunk_size == 0)
{
    chunk_size = (num_elements + max_chunks - 1) / max_chunks;
}
else
{
    // max_chunks != 0 && chunk_size != 0
    std::size_t const calculated_max_chunks =
        (num_elements + chunk_size - 1) / chunk_size;

    if (calculated_max_chunks > max_chunks)
    {
        chunk_size = (num_elements + max_chunks - 1) / max_chunks;
    }
}

return {chunk_size, max_chunks};
```
The default chunk adjustment logic is effectively duplicated here (it closely mirrors the existing hpx::parallel::util::detail::adjust_chunk_size_and_max_chunks(...) logic shown in context). Maintaining two independent implementations risks divergence over time. Consider refactoring the default property implementation to delegate to the existing helper (or moving the shared logic into a single common internal function) so there’s one source of truth.
Suggested change:

```diff
-if (max_chunks == 0)
-{
-    if (chunk_size == 0)
-    {
-        std::size_t const cores_times_4 = 4 * num_cores;    // -V112
-        chunk_size =
-            (num_elements + cores_times_4 - 1) / cores_times_4;
-        max_chunks =
-            (std::min) (cores_times_4, num_elements);    // -V112
-        chunk_size = (std::max) (chunk_size,
-            (num_elements + max_chunks - 1) / max_chunks);
-    }
-    else
-    {
-        // max_chunks == 0 && chunk_size != 0
-        max_chunks = (num_elements + chunk_size - 1) / chunk_size;
-    }
-}
-else if (chunk_size == 0)
-{
-    chunk_size = (num_elements + max_chunks - 1) / max_chunks;
-}
-else
-{
-    // max_chunks != 0 && chunk_size != 0
-    std::size_t const calculated_max_chunks =
-        (num_elements + chunk_size - 1) / chunk_size;
-    if (calculated_max_chunks > max_chunks)
-    {
-        chunk_size = (num_elements + max_chunks - 1) / max_chunks;
-    }
-}
-return {chunk_size, max_chunks};
+return hpx::parallel::util::detail::adjust_chunk_size_and_max_chunks(
+    num_elements, num_cores, max_chunks, chunk_size);
```
```cpp
auto rebound_params =
    hpx::execution::experimental::rebind_executor_parameters(
        policy.parameters(), reduce_executor_parameters{});
```
The PR description says changes were limited to a few algorithm files/tests, but this PR also introduces a new executor-parameters customization point (adjust_chunk_size_and_max_chunks) plus rebind/dispatch wiring and adds/updates several execution unit tests and CI test lists. Please update the PR description (or narrow the PR scope) so reviewers can accurately assess the broader API and behavior changes included here.
Signed-off-by: Bhoomish <bhoomishguptabits@gmail.com>
…max_chunks functionality Signed-off-by: BhoomishGupta <bhoomishguptabits@gmail.com>
Hi @hkaiser, sorry for the delay, I had academic tests. I have pushed updates and resolved the conflicts; please take another look. I will also look at the snippets Copilot reviewed.
@BhoomishGupta please also have a look at the failing test: rebind_executor_parameters.
Fixes #6647
Proposed Changes
Created a helper function that reduces the partition values without the need for an initial value.
Added a test case that was mentioned in the issue #6647.
Corrected a typo in documentation.
I only made changes in:
And created this:
Any background context you want to provide?
Previously, the implementation assumed that `*first` was convertible to `T`. This assumption does not hold for the actual function logic required here. This PR removes that dependency, ensuring the type handling is correct.
Checklist
Not all points below apply to all pull requests.