I noticed that

    const int NUM_THREADS_PER_BLOCK = 128;

is fixed for all target hardware, and the value is a bit large compared with common tuning recommendations.
I plan to change this to a compile-time default plus a setter/getter interface, so that the value can be varied at runtime in performance-tuning tests.
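
A minimal sketch of what this could look like, shown as host-side C++ only. The names `DEFAULT_NUM_THREADS_PER_BLOCK`, `gpu_getNumThreadsPerBlock`, `gpu_setNumThreadsPerBlock` and `gpu_getNumBlocks` are hypothetical and not part of the existing QuEST API. The upper bound of 1024 assumes the per-block thread limit of current NVIDIA GPUs; a real implementation might query the device instead.

```cpp
#include <stdexcept>

// Compile-time default (hypothetical macro name); override at build time
// with e.g. -DDEFAULT_NUM_THREADS_PER_BLOCK=64.
#ifndef DEFAULT_NUM_THREADS_PER_BLOCK
#define DEFAULT_NUM_THREADS_PER_BLOCK 128
#endif

namespace {
    // Runtime-tunable value, initialised from the compile-time default.
    int numThreadsPerBlock = DEFAULT_NUM_THREADS_PER_BLOCK;
}

// Getter (hypothetical name): the block size kernels would be launched with.
int gpu_getNumThreadsPerBlock() {
    return numThreadsPerBlock;
}

// Setter (hypothetical name): rejects values outside [1, 1024].
// The 1024 bound is an assumption about the target hardware.
void gpu_setNumThreadsPerBlock(int numThreads) {
    if (numThreads < 1 || numThreads > 1024)
        throw std::invalid_argument("threads per block must be in [1, 1024]");
    numThreadsPerBlock = numThreads;
}

// Number of blocks needed to cover numIts iterations,
// i.e. ceil(numIts / threadsPerBlock).
long long gpu_getNumBlocks(long long numIts) {
    long long threads = gpu_getNumThreadsPerBlock();
    return (numIts + threads - 1) / threads;
}
```

A tuning test could then loop over candidate block sizes (for example 32, 64, 128, 256), call the setter before each timed run, and compare the timings, without rebuilding the library.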
The constant is defined at line 50 of QuEST/quest/src/gpu/gpu_kernels.cuh in commit 9d7618d.