Avoid implicit conversions for bitwise operators#2708
Avoid implicit conversions for bitwise operators#2708mnijhuis-tos wants to merge 2 commits intoxtensor-stack:masterfrom
Conversation
d3732a7 to
bedefc1
Compare
| #define UNARY_BITWISE_OPERATOR_FUNCTOR(NAME, OP) \ | ||
| struct NAME \ | ||
| { \ | ||
| template <class A1> \ |
There was a problem hiding this comment.
Question: is the name A1 something that was used in similar bits of codes. If not, I would find e.g. T more logical (or even A or E which is used in many places)
There was a problem hiding this comment.
The existing UNARY_OPERATOR_FUNCTOR indeed also uses A1. I'll happily change both to T, A, or E.
| return OP arg; \ | ||
| } \ | ||
| template <class B> \ | ||
| constexpr auto simd_apply(const B& arg) const \ |
There was a problem hiding this comment.
Question out of ignorance: Why don't you use std::decay_t<B> here.
Style: please use the same template name.
There was a problem hiding this comment.
Using std::decay_t<B> indeed is consistent with the return type of operator().
Style: Do you mean I should use the same template argument name in operator() and simd_apply, e.g., use template <class T> in both cases?
| constexpr std:: \ | ||
| conditional_t<(sizeof(std::decay_t<T1>) > sizeof(std::decay_t<T2>)), std::decay_t<T1>, std::decay_t<T2>> \ | ||
| operator()(T1&& arg1, T2&& arg2) const \ |
There was a problem hiding this comment.
I have no idea what happens here: the opening and closing (..) and <..> don't even seem to match?
There was a problem hiding this comment.
The tricky part of this expression is that one of the > is a greater-than sign:
The first argument to std::condition_t is sizeof(std::decay_t<T1>) > sizeof(std::decay_t<T2>). Because of the greater-than sign, I'm using extra parentheses around it. I can probably replace it by std::greater, however, I'm afraid the expression will mainly get longer and doesn't become more readable.
The other arguments to std::condition_t are std::decay_t<T1> and std::decay_t<T2>. So, the conditional selects the largest type of T1 and T2. A (bitwise) operator on two chars will now yield a char, instead of an int. Combining a short and a long yields a long.
Combining two equally-sized unsigned and signed types currently yields the second type, e.g, combining uint16_t and int16_t yields an int16_t return value. For bitwise operations, this shouldn't be a problem. Note that I can easily make it return the first type using >= instead of > when comparing the sizes.
For other operations, like addition or subtraction, I can imagine we have to follow the C++ rules closer. I found the following results when using decltype with g++ 11.3:
-
decltype(int8_t() + int8_t()) ->
int -
decltype(uint8_t() + uint8_t()) -> int`
-
decltype(int8_t() + uint8_t()) ->
int -
decltype(uint8_t() + int8_t()) ->
int -
(u)int16_t: Same result as (u)int8_t: Always
int. -
decltype(int32_t() + int32_t()) ->
int32_t -
decltype(uint32_t() + uint32_t()) -> uint32_t`
-
decltype(int32_t() + uint32_t()) ->
uint32_t -
decltype(uint32_t() + int32_t()) ->
uint32_t -
decltype(int32_t() + int64_t()) ->
int64_t -
decltype(uint32_t() + uint64_t()) -> uint64_t`
-
decltype(int32_t() + uint64_t()) ->
uint64_t -
decltype(uint32_t() + int64_t()) ->
int64_t -
decltype(int32_t() + float()) ->
float -
decltype(uint32_t() + float()) ->
float -
decltype(float() + uint32_t()) ->
float -
decltype(float() + int32_t()) ->
float -
decltype(int64_t() + float()) ->
float(even thoughsizeof(float)(4 bytes) is smaller thansizeof(int64_t)! -
decltype(uint64_t() + float()) ->
float -
decltype(float() + uint64_t()) ->
float -
decltype(float() + int64_t()) ->
float
Perhaps the following stategy will work for all binary operations:
- If the input types are equal, use that as the return type.
- If both types are floating point types, return the largest type.
- If only one of the types is a floating point type, return the floating point type.
- If the type sizes differ, return the largest type (which can be signed).
- If the type sizes are equal and they only differ in signedness, return the unsigned type.
Code similar to simd_return_type_impl in XSimd could implement this return type strategy.
There was a problem hiding this comment.
Thank you so much for the clarification!! I'm just not sure about the latter case: stripping the signedness is dangerous here no?
| #define BINARY_BITWISE_OPERATOR_FUNCTOR(NAME, OP) \ | ||
| struct NAME \ | ||
| { \ | ||
| template <class T1, class T2> \ |
There was a problem hiding this comment.
The existing BINARY_OPERATOR_FUNCTOR uses T1 and T2, too. Using T1 and T2 as names reflects the fact that the types are typically similar or the same, so I rather stick to T1 and T2.
| using xt::detail::operator OP; \ | ||
| return (std::forward<T1>(arg1) OP std::forward<T2>(arg2)); \ | ||
| } \ | ||
| template <class B> \ |
There was a problem hiding this comment.
Why don't you allow the two types here?
There was a problem hiding this comment.
The existing BINARY_OPERATOR_FUNCTOR also has a single template argument for simd_apply. I don't know why. Using two arguments seems more logical, indeed.
|
Thanks. This is not my expertise, but I do already have some questions. |
|
@tdegeus the checks on this appear to be stuck burning up compute time. |
This PR partially addresses #2707.
When using XSimd, the following code will compile, however, the resulting code is inefficient if the return type of
operator|(const char&, const char&)isintinstead ofchar. [C++ allows implicitly converting/promotingchartoint.](https://en.cppreference.com/w/cpp/language/implicit_conversion).
The generated assembly code performs the operation using 32-bit integers and contains many pack and unpack instructions around it. Supplying explict return types in the functors instead of 'auto' avoids those pack and unpack instructions and has a single 'vpor' instruction (when using avx2) that or's all 16 values at once.
Although this improvement works for small integral values like
charandunsigned short, usingboolstill results in inefficient code sincext_simd::simd_condition<bool, bool>::valueis false.xsimd::detail::simd_conditionexplicitly is false when both types arebool, for unknown reasons.I have only applied the improvement to bitwise operators, since it's probably not safe for other operators. For example, adding characters may overflow so there the implicit conversion to
intis useful. On the other side, addingints themselves could also overflow and they are not promoted to other types. Perhaps this optimization is possible for all operators.