Avoid implicit conversions for bitwise operators#2708

Open

mnijhuis-tos wants to merge 2 commits intoxtensor-stack:masterfrom

mnijhuis-tos:avoid_bitwise_conversions

Contributor

mnijhuis-tos commented Jun 2, 2023 •

edited

Loading

This PR partially addresses #2707.

When using XSimd, the following code will compile, however, the resulting code is inefficient if the return type of operator|(const char&, const char&) is int instead of char. [C++ allows implicitly converting/promoting char to int.]
(https://en.cppreference.com/w/cpp/language/implicit_conversion).

#include <xtensor/xfixed.hpp>

void xtensor_or(xt::xtensor_fixed<char, xt::xshape<16>>& b1,
                const xt::xtensor_fixed<char, xt::xshape<16>>& b2) {
    b1 |= b2;    
}

The generated assembly code performs the operation using 32-bit integers and contains many pack and unpack instructions around it. Supplying explict return types in the functors instead of 'auto' avoids those pack and unpack instructions and has a single 'vpor' instruction (when using avx2) that or's all 16 values at once.

Although this improvement works for small integral values like char and unsigned short, using bool still results in inefficient code since xt_simd::simd_condition<bool, bool>::value is false. xsimd::detail::simd_condition explicitly is false when both types are bool, for unknown reasons.

I have only applied the improvement to bitwise operators, since it's probably not safe for other operators. For example, adding characters may overflow so there the implicit conversion to int is useful. On the other side, adding ints themselves could also overflow and they are not promoted to other types. Perhaps this optimization is possible for all operators.


          Avoid implicit conversions for bitwise operators

bedefc1

mnijhuis-tos force-pushed the avoid_bitwise_conversions branch from d3732a7 to bedefc1 Compare

June 2, 2023 09:37


          Apply clang-format diff from CI

34510b0

tdegeus reviewed

View reviewed changes

include/xtensor/xoperation.hpp

+              #define UNARY_BITWISE_OPERATOR_FUNCTOR(NAME, OP)                   \
+                  struct NAME                                                    \
+                  {                                                              \
+                      template <class A1>                                        \

Member

tdegeus Jun 14, 2023

Question: is the name A1 something that was used in similar bits of codes. If not, I would find e.g. T more logical (or even A or E which is used in many places)

Contributor Author

mnijhuis-tos Jun 14, 2023

The existing UNARY_OPERATOR_FUNCTOR indeed also uses A1. I'll happily change both to T, A, or E.

tdegeus reviewed

View reviewed changes

include/xtensor/xoperation.hpp

+                          return OP arg;                                         \
+                      }                                                          \
+                      template <class B>                                         \
+                      constexpr auto simd_apply(const B& arg) const              \

Member

tdegeus Jun 14, 2023

Question out of ignorance: Why don't you use std::decay_t<B> here.
Style: please use the same template name.

Contributor Author

mnijhuis-tos Jun 14, 2023

Using std::decay_t<B> indeed is consistent with the return type of operator().
Style: Do you mean I should use the same template argument name in operator() and simd_apply, e.g., use template <class T> in both cases?

Member

tdegeus Mar 19, 2024

That would be great, yes!

tdegeus reviewed

View reviewed changes

include/xtensor/xoperation.hpp

Comment on lines +107 to +109

+                      constexpr std::                                                                                              \
+                          conditional_t<(sizeof(std::decay_t<T1>) > sizeof(std::decay_t<T2>)), std::decay_t<T1>, std::decay_t<T2>> \
+                          operator()(T1&& arg1, T2&& arg2) const                                                                   \

Member

tdegeus Jun 14, 2023

I have no idea what happens here: the opening and closing (..) and <..> don't even seem to match?

Contributor Author

mnijhuis-tos Jun 14, 2023 •

edited by tdegeus

Loading

The tricky part of this expression is that one of the > is a greater-than sign:

The first argument to std::condition_t is sizeof(std::decay_t<T1>) > sizeof(std::decay_t<T2>). Because of the greater-than sign, I'm using extra parentheses around it. I can probably replace it by std::greater, however, I'm afraid the expression will mainly get longer and doesn't become more readable.

The other arguments to std::condition_t are std::decay_t<T1> and std::decay_t<T2>. So, the conditional selects the largest type of T1 and T2. A (bitwise) operator on two chars will now yield a char, instead of an int. Combining a short and a long yields a long.

Combining two equally-sized unsigned and signed types currently yields the second type, e.g, combining uint16_t and int16_t yields an int16_t return value. For bitwise operations, this shouldn't be a problem. Note that I can easily make it return the first type using >= instead of > when comparing the sizes.

For other operations, like addition or subtraction, I can imagine we have to follow the C++ rules closer. I found the following results when using decltype with g++ 11.3:

decltype(int8_t() + int8_t()) -> int
decltype(uint8_t() + uint8_t()) -> int`
decltype(int8_t() + uint8_t()) -> int
decltype(uint8_t() + int8_t()) -> int
(u)int16_t: Same result as (u)int8_t: Always int.
decltype(int32_t() + int32_t()) -> int32_t
decltype(uint32_t() + uint32_t()) -> uint32_t`
decltype(int32_t() + uint32_t()) -> uint32_t
decltype(uint32_t() + int32_t()) -> uint32_t
decltype(int32_t() + int64_t()) -> int64_t
decltype(uint32_t() + uint64_t()) -> uint64_t`
decltype(int32_t() + uint64_t()) -> uint64_t
decltype(uint32_t() + int64_t()) -> int64_t
decltype(int32_t() + float()) -> float
decltype(uint32_t() + float()) -> float
decltype(float() + uint32_t()) -> float
decltype(float() + int32_t()) -> float
decltype(int64_t() + float()) -> float (even though sizeof(float) (4 bytes) is smaller than sizeof(int64_t)!
decltype(uint64_t() + float()) -> float
decltype(float() + uint64_t()) -> float
decltype(float() + int64_t()) -> float

Perhaps the following stategy will work for all binary operations:

If the input types are equal, use that as the return type.
If both types are floating point types, return the largest type.
If only one of the types is a floating point type, return the floating point type.
If the type sizes differ, return the largest type (which can be signed).
If the type sizes are equal and they only differ in signedness, return the unsigned type.

Code similar to simd_return_type_impl in XSimd could implement this return type strategy.

Member

tdegeus Mar 19, 2024

Thank you so much for the clarification!! I'm just not sure about the latter case: stripping the signedness is dangerous here no?

tdegeus reviewed

View reviewed changes

include/xtensor/xoperation.hpp

+              #define BINARY_BITWISE_OPERATOR_FUNCTOR(NAME, OP)                                                                    \
+                  struct NAME                                                                                                      \
+                  {                                                                                                                \
+                      template <class T1, class T2>                                                                                \

Member

tdegeus Jun 14, 2023

I would use T and S

Contributor Author

mnijhuis-tos Jun 14, 2023

The existing BINARY_OPERATOR_FUNCTOR uses T1 and T2, too. Using T1 and T2 as names reflects the fact that the types are typically similar or the same, so I rather stick to T1 and T2.

tdegeus reviewed

View reviewed changes

include/xtensor/xoperation.hpp

+                          using xt::detail::operator OP;                                                                           \
+                          return (std::forward<T1>(arg1) OP std::forward<T2>(arg2));                                               \
+                      }                                                                                                            \
+                      template <class B>                                                                                           \

Member

tdegeus Jun 14, 2023

Why don't you allow the two types here?

Contributor Author

mnijhuis-tos Jun 14, 2023

The existing BINARY_OPERATOR_FUNCTOR also has a single template argument for simd_apply. I don't know why. Using two arguments seems more logical, indeed.

Member

tdegeus commented Jun 14, 2023

Thanks. This is not my expertise, but I do already have some questions.

tdegeus added the Needs review label

Contributor

spectre-ns commented Nov 12, 2023

@tdegeus the checks on this appear to be stuck burning up compute time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels