
Changed reduce implementation and added tests#6877

Open
BhoomishGupta wants to merge 16 commits into TheHPXProject:master from BhoomishGupta:fix/reduce-6647

Conversation

@BhoomishGupta
Contributor

Fixes #6647

Proposed Changes

Created a helper function that reduces the partition values without needing an initial value.
Added the test case mentioned in issue #6647.
Corrected a typo in the documentation.

I only made changes in:

  1. hpx\libs\core\algorithms\include\hpx\parallel\algorithms\reduce.hpp
  2. hpx\libs\core\algorithms\tests\regressions\CMakeLists.txt
  3. the rest of the changes were caused by clang-format and editor.config

And created this:

  1. hpx\libs\core\algorithms\tests\regressions\reduce_6647.cpp

Any background context you want to provide?

Previously, the implementation assumed that *first was convertible to T. This assumption does not hold for the function logic required here. This PR removes that dependency, ensuring the type handling is correct.

Checklist

Not all points below apply to all pull requests.

  • I have added a new feature and have added tests to go along with it.
  • I have fixed a bug and have added a regression test.
  • I have added a test using random numbers; I have made sure it uses a seed, and that random numbers generated are valid inputs for the tests.

@StellarBot

Can one of the admins verify this patch?

Contributor

@hkaiser hkaiser left a comment


Could you please separate the formatting changes into a different PR? It's close to impossible to review your actual changes here.

Also, for the formatting - are you sure you're using clang-format V20?

@BhoomishGupta
Contributor Author

Yeah, I was not using V20.
So, should I close this PR and open a new one that contains only the changes?

@hkaiser
Contributor

hkaiser commented Feb 4, 2026

Yeah, I was not using V20. So, should I close this PR and open a new one that contains only the changes?

Simply keep (force) pushing to your branch; this will update the PR.

@BhoomishGupta BhoomishGupta force-pushed the fix/reduce-6647 branch 2 times, most recently from 22a1f44 to 25bccb7 Compare February 4, 2026 20:34
Comment thread libs/core/algorithms/include/hpx/parallel/algorithms/reduce.hpp Outdated
Comment thread libs/core/algorithms/include/hpx/parallel/algorithms/reduce.hpp Outdated
@BhoomishGupta BhoomishGupta force-pushed the fix/reduce-6647 branch 3 times, most recently from 3e0fbc4 to 4c3a5d6 Compare February 10, 2026 23:44
@hkaiser
Contributor

hkaiser commented Feb 11, 2026

@BhoomishGupta FWIW, the reduce tests now seem to fail. Also, please rebase this onto master to resolve the conflicts.

@BhoomishGupta BhoomishGupta force-pushed the fix/reduce-6647 branch 2 times, most recently from 3a9e8c3 to c162a8e Compare February 11, 2026 00:35
@BhoomishGupta
Contributor Author

@BhoomishGupta FWIW, the reduce tests now seem to fail. Also, please rebase this onto master to resolve the conflicts.

Working on it

@hkaiser
Contributor

hkaiser commented Feb 12, 2026

@BhoomishGupta Please rebase your changes onto master to reduce the amount of unrelated changes (currently more than 300 files). Otherwise it's close to impossible to review your PR.

@BhoomishGupta BhoomishGupta force-pushed the fix/reduce-6647 branch 2 times, most recently from 8cdfc5f to 4595683 Compare February 12, 2026 23:45
Contributor

@hkaiser hkaiser left a comment


@BhoomishGupta Thanks for investigating and attempting to implement a new customization point for the execution parameters.

You have implemented part of the necessary logic. Please have a look at the existing customization points to see what else is necessary.

I have the impression that you have added the new functionality for all algorithms, as the new functionality will always be invoked. I'd suggest making sure that the new functionality is used only for the reduction algorithms.

@BhoomishGupta
Contributor Author

BhoomishGupta commented Feb 17, 2026

@BhoomishGupta Thanks for investigating and attempting to implement a new customization point for the execution parameters.

You have implemented part of the necessary logic. Please have a look at the existing customization points to see what else is necessary.

I have the impression that you have added the new functionality for all algorithms, as the new functionality will always be invoked. I'd suggest making sure that the new functionality is used only for the reduction algorithms.

@hkaiser Thanks for the review. I will check the existing customization points and add the missing things.

Regarding the scope: I did intentionally add it to the generic utility because I thought adjusting chunk sizes could be beneficial for other algorithms down the line. Or do you strongly prefer I hard-restrict it to reduce only?

Just to clarify on the current behavior: even though it's in a shared file, it currently acts as a no-op for other algorithms. The default implementation returns {0, 0}, and the call sites only apply adjustments when the return values are non-zero. So it shouldn't be impacting non-reduce algorithms right now, but I will make sure the trait-detection aligns perfectly with the rest of the codebase.

@hkaiser
Contributor

hkaiser commented Feb 17, 2026

@hkaiser Thanks for the review. I will check the existing customization points and add the missing things.

Regarding the scope: I did intentionally add it to the generic utility because I thought adjusting chunk sizes could be beneficial for other algorithms down the line. Or do you strongly prefer I hard-restrict it to reduce only?

The default behavior is needed for all algorithms (as a fall-back, most likely), while the customization should apply to the reduction algorithms only.

Just to clarify on the current behavior: even though it's in a shared file, it currently acts as a no-op for other algorithms. The default implementation returns {0, 0}, and the call sites only apply adjustments when the return values are non-zero. So it shouldn't be impacting non-reduce algorithms right now, but I will make sure the trait-detection aligns perfectly with the rest of the codebase.

If the default used by all algorithms (except reduce) is now {0, 0}, wouldn't that change the current behavior?

@BhoomishGupta
Contributor Author

BhoomishGupta commented Feb 18, 2026

@hkaiser Thanks for the review. I will check the existing customization points and add the missing things.
Regarding the scope: I did intentionally add it to the generic utility because I thought adjusting chunk sizes could be beneficial for other algorithms down the line. Or do you strongly prefer I hard-restrict it to reduce only?

The default behavior is needed for all algorithms (as a fall-back, most likely), while the customization should apply to the reduction algorithms only.

Just to clarify on the current behavior: even though it's in a shared file, it currently acts as a no-op for other algorithms. The default implementation returns {0, 0}, and the call sites only apply adjustments when the return values are non-zero. So it shouldn't be impacting non-reduce algorithms right now, but I will make sure the trait-detection aligns perfectly with the rest of the codebase.

If the default used by all algorithms (except reduce) is now {0, 0}, wouldn't that change the current behavior?

I don't think that the {0, 0} default changes the current behavior for any algorithm.

The call site in chunk_size.hpp

auto [adj_chunk, adj_max] =
    hpx::execution::experimental::adjust_chunk_size_and_max_chunks(
        policy.parameters(), policy.executor(),
        max_chunks, chunk_size);
if (adj_chunk != 0)
    chunk_size = adj_chunk;
if (adj_max != 0)
    max_chunks = adj_max;
else if (adj_chunk == 0)
    adjust_chunk_size_and_max_chunks(
        cores, count, max_chunks, chunk_size);

works as intended: when the customization point returns the default {0, 0}, both adj_chunk and adj_max are zero, so we go through else if (adj_chunk == 0), which calls the existing adjust_chunk_size_and_max_chunks free function, exactly the same code path as before.
Only an execution parameters object that provides a custom adjust_chunk_size_and_max_chunks member will return non-zero values and override the defaults.

@hkaiser
Contributor

hkaiser commented Feb 18, 2026

I don't think that the {0, 0} default changes the current behavior for any algorithm.

The call site in chunk_size.hpp

auto [adj_chunk, adj_max] =
    hpx::execution::experimental::adjust_chunk_size_and_max_chunks(
        policy.parameters(), policy.executor(),
        max_chunks, chunk_size);
if (adj_chunk != 0)
    chunk_size = adj_chunk;
if (adj_max != 0)
    max_chunks = adj_max;
else if (adj_chunk == 0)
    adjust_chunk_size_and_max_chunks(
        cores, count, max_chunks, chunk_size);

works as intended: when the customization point returns the default {0, 0}, both adj_chunk and adj_max are zero, so we go through else if (adj_chunk == 0), which calls the existing adjust_chunk_size_and_max_chunks free function, exactly the same code path as before. Only an execution parameters object that provides a custom adjust_chunk_size_and_max_chunks member will return non-zero values and override the defaults.

I have seen that. This code does something different from what was in place before. The default should just be the previous adjust_chunk_size_and_max_chunks (most likely to be invoked from the CPO fallback).

As said, please have a look at the other parameters customizations to see what's missing, e.g., get_chunk_size here: https://github.com/STEllAR-GROUP/hpx/blob/b514f4ab6274d21a716e3acaac08f825b9a93776/libs/core/execution/include/hpx/execution/executors/execution_parameters_fwd.hpp#L79-L127.

Contributor

@hkaiser hkaiser left a comment


There is one thing missing still. Currently you use policy.with() to replace the current set of parameter customizations, which will cause any user-supplied customizations to be lost. I think we need to use something like the new API rebind_executor_parameters (as proposed, for instance, in #6935) instead.

Otherwise: very nice! Thanks for working on this!

Comment thread libs/core/algorithms/include/hpx/parallel/util/detail/chunk_size.hpp Outdated
Comment thread libs/core/algorithms/include/hpx/parallel/util/detail/chunk_size.hpp Outdated
Comment thread libs/core/algorithms/include/hpx/parallel/util/detail/chunk_size.hpp Outdated
@hkaiser
Contributor

hkaiser commented Feb 24, 2026

There is one thing missing still. Currently you use policy.with() to replace the current set of parameter customizations, which will cause any user-supplied customizations to be lost. I think we need to use something like the new API rebind_executor_parameters (as proposed, for instance, in #6935) instead.

Otherwise: very nice! Thanks for working on this!

Let's wait for #6935 to be merged, then you can adapt your PR.

@hkaiser
Contributor

hkaiser commented Mar 9, 2026

@BhoomishGupta FWIW, #6935 was merged yesterday. Could you please rebase this PR onto master to resolve the conflicts? After that let's work on finalizing your patch.

@BhoomishGupta
Contributor Author

@BhoomishGupta FWIW, #6935 was merged yesterday. Could you please rebase this PR onto master to resolve the conflicts? After that let's work on finalizing your patch.

Hi @hkaiser,

Thanks for the heads-up on #6935 getting merged!

Just a quick note: my university exams are currently going on, so my availability is a bit limited right now. I will rebase this PR onto master to resolve the conflicts and work on finalizing the patch right after my exams finish on March 14th.

Thanks for your patience!

@codacy-production

codacy-production bot commented Apr 3, 2026

Up to standards ✅

Results: 0 new issues

Comment thread libs/core/algorithms/include/hpx/parallel/algorithms/reduce.hpp Outdated
@hkaiser
Contributor

hkaiser commented Apr 11, 2026

@BhoomishGupta On MacOS the new tests are failing. Inspect reports an issue as well. Please pay attention to the CIs yourself.

@BhoomishGupta
Contributor Author

@BhoomishGupta On MacOS the new tests are failing. Inspect reports an issue as well. Please pay attention to the CIs yourself.

Ah, my bad, I had my wires crossed and was looking at the CI for a different PR. I will check the MacOS and Inspect logs for this one tomorrow and will push a fix soon. I'll pay closer attention to the CI status here going forward.

@hkaiser
Contributor

hkaiser commented Apr 14, 2026

@BhoomishGupta ping?

Copilot AI review requested due to automatic review settings April 15, 2026 17:29
Contributor

Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Fixes reduce partition initialization for cases where value_type is not convertible to T (issue #6647) by ensuring partitions have at least two elements and by adding a regression test. Also introduces an adjust_chunk_size_and_max_chunks executor-parameters customization point with dispatch/rebind support and corresponding unit tests.

Changes:

  • Fix parallel reduce partition reduction to avoid assuming *first converts to T, and add a regression test (reduce_6647).
  • Add adjust_chunk_size_and_max_chunks customization point + rebind/wrapping support, and update chunking code to use it.
  • Extend unit tests around executor-parameters dispatch/rebinding (including dispatch order guarantees).

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.

Show a summary per file
  • libs/core/execution/include/hpx/execution/executors/rebind_executor_parameters.hpp: adds support for rebinding/wrapping adjust_chunk_size_and_max_chunks and fixes mark_begin_execution double-dispatch.
  • libs/core/execution/include/hpx/execution/executors/execution_parameters_fwd.hpp: declares the adjust_chunk_size_and_max_chunks CPO.
  • libs/core/execution/include/hpx/execution/executors/execution_parameters.hpp: implements default adjustment logic and property plumbing for adjust_chunk_size_and_max_chunks.
  • libs/core/execution/tests/unit/rebind_executor_parameters.cpp: adds rebind tests for adjust_chunk_size_and_max_chunks and a dispatch order test for mark_begin_execution.
  • libs/core/execution/tests/unit/executor_parameters_dispatching.cpp: adds dispatch precedence tests for adjust_chunk_size_and_max_chunks between parameters and executor.
  • libs/core/algorithms/include/hpx/parallel/util/detail/chunk_size.hpp: switches chunk-size/max-chunks consistency logic to the new CPO.
  • libs/core/algorithms/include/hpx/parallel/algorithms/reduce.hpp: fixes parallel reduce partition reduction by enforcing partition sizes >= 2 and adding a helper that doesn't require *first to be convertible to T.
  • libs/core/algorithms/tests/regressions/reduce_6647.cpp: adds a regression test covering the reported reduce bug.
  • libs/core/algorithms/tests/regressions/CMakeLists.txt: registers the reduce_6647 regression test.
  • .circleci/tests.unit1.container_algorithms: adds a test entry to the CI list.
  • components/parcel_plugins/coalescing/src/coalescing_message_handler.cpp: formatting-only change (line wrap).
  • components/containers/partitioned_vector/tests/unit/is_iterator_partitioned_vector.cpp: formatting-only change (include order).

Comment on lines 304 to +310
hpx::chrono::null_duration, cores, count);

// make sure, chunk size and max_chunks are consistent
adjust_chunk_size_and_max_chunks(
cores, count, max_chunks, chunk_size, true);
std::tie(chunk_size, max_chunks) =
hpx::execution::experimental::adjust_chunk_size_and_max_chunks(
policy.parameters(), policy.executor(), count, cores,
max_chunks, chunk_size);

Copilot AI Apr 15, 2026


This replaces the old adjust_chunk_size_and_max_chunks(..., has_variable_chunk_size=true) call sites with the new CPO, but the new call no longer preserves the has_variable_chunk_size behavior (old behavior was to assert chunk_size != 0 and skip further adjustments). This is a semantic change that can break algorithms relying on variable chunk sizes. Consider extending the CPO/property to accept a has_variable_chunk_size flag (or a dedicated customization point/type tag), or keep using the legacy helper in the variable-chunk-size paths. (Same issue applies to the other former ..., true) call sites in this file.)


// Handle single-element case: can't partition into size >= 2
// This must be checked for all execution policies
auto const count = hpx::parallel::detail::distance(first, last);

Copilot AI Apr 15, 2026


The range length is computed twice (distance(first, last) at line 502 and again at line 533). For non-random-access iterators this is O(n) each time. Reuse the already computed count in the partitioner call to avoid redundant traversal.

Comment on lines +531 to +532
HPX_MOVE(reduce_policy), first,
hpx::parallel::detail::distance(first, last), HPX_MOVE(f1),

Copilot AI Apr 15, 2026


The range length is computed twice (distance(first, last) at line 502 and again at line 533). For non-random-access iterators this is O(n) each time. Reuse the already computed count in the partitioner call to avoid redundant traversal.

Suggested change
HPX_MOVE(reduce_policy), first,
hpx::parallel::detail::distance(first, last), HPX_MOVE(f1),
HPX_MOVE(reduce_policy), first, count, HPX_MOVE(f1),

Comment on lines +394 to 402
struct reduce_executor_parameters
{
template <typename Executor>
HPX_FORCEINLINE constexpr std::pair<std::size_t, std::size_t>
adjust_chunk_size_and_max_chunks(Executor&&,
std::size_t num_elements, std::size_t num_cores,
std::size_t /*max_chunks*/,
std::size_t chunk_size) const noexcept
{

Copilot AI Apr 15, 2026


As a rebound (wrapping) parameters object, reduce_executor_parameters currently ignores the incoming max_chunks entirely (commented out) and returns a freshly computed new_max_chunks. This can effectively bypass user-provided parameters like max_num_chunks() on the wrapped policy. A more composable approach is to implement a wrapping customization (i.e., an overload that takes InnerParams&& inner) which first calls adjust_chunk_size_and_max_chunks(inner, exec, ...) and then adjusts the returned {chunk_size, max_chunks} minimally to enforce the partition-size constraint (>=2 and no remainder partition of size 1). This keeps existing parameter constraints intact while still fixing #6647.

Comment on lines +618 to +655
if (max_chunks == 0)
{
if (chunk_size == 0)
{
std::size_t const cores_times_4 = 4 * num_cores; // -V112

chunk_size =
(num_elements + cores_times_4 - 1) / cores_times_4;

max_chunks =
(std::min) (cores_times_4, num_elements); // -V112

chunk_size = (std::max) (chunk_size,
(num_elements + max_chunks - 1) / max_chunks);
}
else
{
// max_chunks == 0 && chunk_size != 0
max_chunks = (num_elements + chunk_size - 1) / chunk_size;
}
}
else if (chunk_size == 0)
{
chunk_size = (num_elements + max_chunks - 1) / max_chunks;
}
else
{
// max_chunks != 0 && chunk_size != 0
std::size_t const calculated_max_chunks =
(num_elements + chunk_size - 1) / chunk_size;

if (calculated_max_chunks > max_chunks)
{
chunk_size = (num_elements + max_chunks - 1) / max_chunks;
}
}

return {chunk_size, max_chunks};

Copilot AI Apr 15, 2026


The default chunk adjustment logic is effectively duplicated here (it closely mirrors the existing hpx::parallel::util::detail::adjust_chunk_size_and_max_chunks(...) logic shown in context). Maintaining two independent implementations risks divergence over time. Consider refactoring the default property implementation to delegate to the existing helper (or moving the shared logic into a single common internal function) so there’s one source of truth.

Suggested change
if (max_chunks == 0)
{
if (chunk_size == 0)
{
std::size_t const cores_times_4 = 4 * num_cores; // -V112
chunk_size =
(num_elements + cores_times_4 - 1) / cores_times_4;
max_chunks =
(std::min) (cores_times_4, num_elements); // -V112
chunk_size = (std::max) (chunk_size,
(num_elements + max_chunks - 1) / max_chunks);
}
else
{
// max_chunks == 0 && chunk_size != 0
max_chunks = (num_elements + chunk_size - 1) / chunk_size;
}
}
else if (chunk_size == 0)
{
chunk_size = (num_elements + max_chunks - 1) / max_chunks;
}
else
{
// max_chunks != 0 && chunk_size != 0
std::size_t const calculated_max_chunks =
(num_elements + chunk_size - 1) / chunk_size;
if (calculated_max_chunks > max_chunks)
{
chunk_size = (num_elements + max_chunks - 1) / max_chunks;
}
}
return {chunk_size, max_chunks};
return hpx::parallel::util::detail::adjust_chunk_size_and_max_chunks(
num_elements, num_cores, max_chunks, chunk_size);

Comment on lines +522 to +524
auto rebound_params =
hpx::execution::experimental::rebind_executor_parameters(
policy.parameters(), reduce_executor_parameters{});

Copilot AI Apr 15, 2026


The PR description says changes were limited to a few algorithm files/tests, but this PR also introduces a new executor-parameters customization point (adjust_chunk_size_and_max_chunks) plus rebind/dispatch wiring and adds/updates several execution unit tests and CI test lists. Please update the PR description (or narrow the PR scope) so reviewers can accurately assess the broader API and behavior changes included here.

Signed-off-by: Bhoomish <bhoomishguptabits@gmail.com>
…max_chunks functionality

Signed-off-by: BhoomishGupta <bhoomishguptabits@gmail.com>
@BhoomishGupta
Contributor Author

@BhoomishGupta ping?

Hi @hkaiser, sorry for the delay, I had academic tests. I have pushed updates and resolved the conflicts; please take another look.

I will also look into the snippets Copilot reviewed.

@hkaiser
Contributor

hkaiser commented Apr 15, 2026

@BhoomishGupta please also have a look at the failing test: rebind_executor_parameters.



Development

Successfully merging this pull request may close these issues.

Incorrect reduce implementation

4 participants