Description
In some cases, a hardware barrier (snrt_cluster_hw_barrier()) must be inserted after vle32.v loads to get correct results. The failure does not occur consistently, so the issue is intermittent.
I was able to reproduce the problem by implementing a vectorized backward_solve function for upper-triangular linear systems. Since back substitution works on rows of decreasing length, I suspect the issue may be related to the vector length (vl) not being a multiple of 2, although this still needs verification.
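To make the suspicion about vl concrete, here is a minimal scalar sketch of back substitution. It is not the kernel from the attached archive: the names backward_solve_ref and odd_length_rows, the row-major float layout, and the idea that a vectorized version would set vl to the remaining row length are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar reference (not the attached kernel):
 * solve U x = b for an upper-triangular, row-major n x n matrix U. */
static void backward_solve_ref(const float *U, const float *b,
                               float *x, size_t n) {
    /* Walk the rows from the last one up to the first. */
    for (size_t i = n; i-- > 0;) {
        float acc = b[i];
        /* Row i uses the n-1-i entries to the right of the diagonal.
         * A vectorized kernel would presumably use this count as vl. */
        for (size_t j = i + 1; j < n; ++j)
            acc -= U[i * n + j] * x[j];
        assert(U[i * n + i] != 0.0f); /* system must be non-singular */
        x[i] = acc / U[i * n + i];
    }
}

/* Count the rows whose off-diagonal length (n-1-i) is odd. */
static size_t odd_length_rows(size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; ++i)
        if ((n - 1 - i) % 2 != 0)
            ++count;
    return count;
}
```

The row lengths run from n-1 down to 0, so about half of them are odd for any n. If the vectorized kernel does tie vl to the row length, both odd and even values of vl are exercised in every run, which is consistent with the suspicion but does not confirm it. A scalar reference like this can also be used to check the output of the vectorized kernel with and without the barrier.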
How to reproduce
- Extract the attached zip archive into sw/spatzBenchmarks
- Add the following lines to sw/spatzBenchmarks/CMakeLists.txt:
add_library(backward-solve backward-solve/kernel/backward-solve.c)
add_spatz_test_oneParam(backward-solve backward-solve/main.c 64)
Expected Result
The test should fail to compute the correct solution of the linear system.
Workaround
- Uncomment the hardware barrier after the two loads in kernel/backward-solve.c:
asm volatile ("vle32.v v8, (%0)" :: "r"(p_dst));
asm volatile ("vle32.v v0, (%0)" :: "r"(p_mat));
snrt_cluster_hw_barrier();
- Re-build and run the test
- The test now computes the correct solution of the linear system
backward-solve.zip