| for (size_t i = 0; i < 10; ++i) { | ||
| std::cout << host_out[i] << std::endl; | ||
| } | ||
| self.op_state_.propagate_completion_signal(stdexec::set_value, d_out); |
If you change d_out to *d_out, you can confirm that the data is being scanned properly. However, I imagine it won't compile because the completion signatures are wrong.
| template <class SenderId, class ReceiverId, class InitT, class Fun> | ||
| struct receiver_t | ||
| : public __algo_range_init_fun::receiver_t< |
This reuses algorithm_base.cuh. The ExclusiveScan API I used is the one that lets you specify an initial value, so I could easily reuse this base. Nearly all of scan.cuh is identical to reduce, except for the CUB API it calls and the final return type.
The difference is that a reduce returns a single value, whereas a scan returns an array of data. Otherwise the two are very similar.
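To make the reduce/scan distinction concrete, here is a minimal host-side sketch using the C++17 standard library rather than CUB. The function names `reduce_host` and `exclusive_scan_host` are hypothetical and exist only for illustration; they are not part of this PR or of nvexec.

```cpp
#include <functional>
#include <numeric>
#include <vector>

// Reduce: collapses the whole range into a single value, starting from init.
int reduce_host(const std::vector<int>& in, int init) {
  return std::reduce(in.begin(), in.end(), init, std::plus<>{});
}

// Exclusive scan: produces one output per input element, where
// out[i] = init + in[0] + ... + in[i - 1] (in[i] itself is excluded).
std::vector<int> exclusive_scan_host(const std::vector<int>& in, int init) {
  std::vector<int> out(in.size());
  std::exclusive_scan(in.begin(), in.end(), out.begin(), init, std::plus<>{});
  return out;
}
```

Both functions take the same inputs (a range, an initial value, and a binary operation), which is why the same base can serve both algorithms. Only the shape of the result differs: a single value for reduce, a range for scan.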
| // template <class Range> | ||
| // using _set_value_t = completion_signatures<set_value_t( | ||
| // std::vector<typename __algo_range_init_fun::binary_invoke_result_t<Range, InitT, Fun>>)>; | ||
I'm not sure how to get the completion signatures right, and I'm hoping to get some guidance.
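For reference, the value type that the commented-out alias names can be sketched in plain standard C++. This does not answer the completion-signature question. It only illustrates what a `std::vector` of the binary-invoke result would resolve to. The aliases `scan_element_t` and `scan_result_t` are hypothetical stand-ins, and the argument order passed to `Fun` is an assumption, not taken from `__algo_range_init_fun`.

```cpp
#include <functional>
#include <iterator>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for binary_invoke_result_t: the decayed result of
// invoking Fun with an element of Range and an InitT (assumed argument order).
template <class Range, class InitT, class Fun>
using scan_element_t = std::decay_t<std::invoke_result_t<
    Fun&, decltype(*std::begin(std::declval<Range&>())), InitT>>;

// The value a scan would send if it completed with a host-side vector.
template <class Range, class InitT, class Fun>
using scan_result_t = std::vector<scan_element_t<Range, InitT, Fun>>;

// int elements combined with an int init via std::plus<> yield int.
static_assert(std::is_same_v<
    scan_result_t<std::vector<int>, int, std::plus<>>, std::vector<int>>);
```

Whether a device-side scan should complete with an owning container like this, or with something else such as a pointer or view into device memory, is exactly the open design question raised above.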
@gevtushenko Can you take a look?
This is just an initial skeleton of a scan implementation for CUDA. For brevity, I used the reduce test spec to test my changes. Obviously, it would need its own spec.