8 changes: 8 additions & 0 deletions quest/include/environment.h
@@ -83,6 +83,14 @@ int isQuESTEnvInit();
QuESTEnv getQuESTEnv();


/** @notyetdoced
* GPU threads-per-block control.
* This is probably the best pre-existing place for these functions. They only really apply to GPU, because for
* OpenMP the user can just export OMP_NUM_THREADS or call omp_set_num_threads.
*/
int getQuESTGpuThreadsPerBlock();
void setQuESTGpuThreadsPerBlock(const int NEW_TPB);


// end de-mangler
#ifdef __cplusplus
11 changes: 11 additions & 0 deletions quest/src/api/environment.cpp
@@ -509,5 +509,16 @@ void getEnvironmentString(char str[200]) {
}


int getQuESTGpuThreadsPerBlock() {
QuESTEnv env = getQuESTEnv();
return env.isGpuAccelerated? gpu_getNumThreadsPerBlock() : 0;
}

void setQuESTGpuThreadsPerBlock(const int NEW_TPB) {
// just rely on the internal function to throw an error if there's no GPU support compiled
Member

TODO: validate that this is a positive multiple of 32 (the CUDA warp size), among other checks

Member

Doc to user: on HIP, the warp size can be 64 (not 32)!

gpu_setNumThreadsPerBlock(NEW_TPB);
return;
}

// end de-mangler
}
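
The validation mentioned in the TODO above could look roughly like the following. This is a minimal sketch, not QuEST code: `isValidThreadsPerBlock` and `setThreadsPerBlockChecked` are hypothetical names, and the warp size and per-block thread limit are passed in as parameters on the assumption that the caller obtains them from the device (for example from the `warpSize` and `maxThreadsPerBlock` fields of `cudaDeviceProp`). Rejecting sizes that are not a multiple of the warp size is stricter than CUDA itself requires; it is a performance guideline rather than a hard rule.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of QuEST): true when tpb is an acceptable block size.
// warpSize and maxThreadsPerBlock are device properties supplied by the caller,
// e.g. 32 and 1024 on many CUDA devices, or a warp size of 64 on some HIP devices.
bool isValidThreadsPerBlock(int tpb, int warpSize, int maxThreadsPerBlock) {
    if (warpSize <= 0 || maxThreadsPerBlock <= 0)
        return false;                  // nonsensical device properties
    if (tpb <= 0 || tpb > maxThreadsPerBlock)
        return false;                  // must be positive and within the device limit
    return tpb % warpSize == 0;        // must be a whole number of warps
}

// Hypothetical checked setter: updates 'current' only when the new value is valid,
// and throws std::invalid_argument otherwise (leaving 'current' unchanged).
void setThreadsPerBlockChecked(int& current, int tpb, int warpSize, int maxThreadsPerBlock) {
    if (!isValidThreadsPerBlock(tpb, warpSize, maxThreadsPerBlock))
        throw std::invalid_argument(
            "threads per block must be a positive multiple of " + std::to_string(warpSize) +
            " and at most " + std::to_string(maxThreadsPerBlock));
    current = tpb;
}
```

With a warp size of 32, a value such as 96 passes the check, while the same value is rejected for a warp size of 64, which is the case the HIP remark above is concerned with.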
4 changes: 1 addition & 3 deletions quest/src/cpu/cpu_config.cpp
@@ -79,9 +79,7 @@ int cpu_getAvailableNumThreads() {
#if COMPILE_OPENMP
int n = -1;

- #pragma omp parallel shared(n)
- #pragma omp single
- n = omp_get_num_threads();
+ n = omp_get_max_threads();
Member

Isn't this functionally wrong? We wish to return the number of available threads as set by the user, which is the default adopted by our OpenMP pragmas. If you call omp_get_max_threads() outside a parallel region, won't it just return 1?

Contributor

From an EPCC colleague:

omp_set_num_threads() sets the value of the nthreads-var internal control variable, but omp_get_num_threads() does not return this value (omp_get_max_threads() does).

Standards aren't immune to issues.

Collaborator Author

Yeah, as James indicates, omp_get_num_threads outside a parallel region will return 1, but omp_get_max_threads returns the value of OMP_NUM_THREADS or whatever was last set using omp_set_num_threads. Parallel regions without a num_threads clause then use that value.

Member

Oh nice, my brain hadn't even noticed the change of num to max 😅


return n;
#else
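
The distinction discussed in the review thread above can be sketched as follows. This is a minimal illustration rather than QuEST code: the function names are hypothetical, and the `_OPENMP` guards are an assumption so that the snippet also compiles (returning 1 everywhere) when OpenMP is not enabled.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: team size of the region we are currently in.
// Called from sequential code, omp_get_num_threads() reports 1.
int teamSizeHere() {
#ifdef _OPENMP
    return omp_get_num_threads();
#else
    return 1;
#endif
}

// Hypothetical helper: upper bound on the team size of the next parallel region
// that has no num_threads clause (reflects OMP_NUM_THREADS or omp_set_num_threads).
int availableThreads() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

// The approach removed by this change: open a parallel region and let a
// single thread record the team size.
int teamSizeInsideParallel() {
    int n = 1;
#ifdef _OPENMP
    #pragma omp parallel shared(n)
    #pragma omp single
    n = omp_get_num_threads();
#endif
    return n;
}
```

Calling `availableThreads()` avoids spawning a parallel region just to count threads, which appears to be the motivation for the one-line replacement in this hunk.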
19 changes: 19 additions & 0 deletions quest/src/gpu/gpu_config.cpp
@@ -41,6 +41,7 @@
#include "quest/src/gpu/cuda_to_hip.hpp"
#endif

int numThreadsPerBlock = 128;


/*
@@ -330,6 +331,24 @@ qindex gpu_getMaxNumConcurrentThreads() {
* ENVIRONMENT MANAGEMENT
*/

int gpu_getNumThreadsPerBlock() {
#if COMPILE_CUDA
return numThreadsPerBlock;
#else
error_gpuQueriedButGpuNotCompiled();
return -1;
#endif
}

void gpu_setNumThreadsPerBlock(const int newThreadsPerBlock) {
#if COMPILE_CUDA
numThreadsPerBlock = newThreadsPerBlock;
#else
error_gpuQueriedButGpuNotCompiled();
#endif
return;
}


std::array<char,17> getBoundGpuUuid() {
#if COMPILE_CUDA
8 changes: 5 additions & 3 deletions quest/src/gpu/gpu_config.hpp
@@ -19,7 +19,6 @@
#include "quest/include/channels.h"



/*
* CUDA ERROR HANDLING
*/
@@ -65,6 +64,10 @@ qindex gpu_getMaxNumConcurrentThreads();
* ENVIRONMENT MANAGEMENT
*/

int gpu_getNumThreadsPerBlock();

void gpu_setNumThreadsPerBlock(const int newThreadsPerBlock);

void gpu_bindLocalGPUsToNodes();

bool gpu_areAnyNodesBoundToSameGpu();
@@ -76,7 +79,6 @@ void gpu_initCuQuantum();
void gpu_finalizeCuQuantum();



/*
* MEMORY MANAGEMENT
*/
@@ -122,4 +124,4 @@ size_t gpu_getCacheMemoryInBytes();



- #endif // GPU_CONFIG_HPP
+ #endif // GPU_CONFIG_HPP
8 changes: 2 additions & 6 deletions quest/src/gpu/gpu_kernels.cuh
@@ -46,23 +46,19 @@
* THREAD MANAGEMENT
*/


- const int NUM_THREADS_PER_BLOCK = 128;


__forceinline__ __device__ qindex getThreadInd() {
return blockIdx.x*blockDim.x + threadIdx.x;
}


- __host__ qindex getNumBlocks(qindex numThreads) {
+ __host__ qindex getNumBlocks(qindex numThreads, const int numThreadsPerBlock) {

/// @todo
/// improve this with cudaOccupancyMaxPotentialBlockSize(),
/// making it function specific

// CUDA ceil
- return ceil(numThreads / static_cast<qreal>(NUM_THREADS_PER_BLOCK));
+ return ceil(numThreads / static_cast<qreal>(numThreadsPerBlock));
}
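
As a point of comparison for the block-count computation above, the same ceiling can be obtained with integer arithmetic alone. This is a sketch under stated assumptions, not a proposed change: `qindex_t` is a stand-in alias that assumes QuEST's `qindex` is a 64-bit signed integer, and `getNumBlocksIntegerCeil` is a hypothetical name. Avoiding the cast to a floating-point type sidesteps rounding when the thread count is too large to be represented exactly, which may matter if `qreal` is single precision.

```cpp
// Stand-in for QuEST's qindex; assumed here to be a 64-bit signed integer.
using qindex_t = long long;

// Hypothetical integer-only ceiling division: the smallest number of blocks
// such that numBlocks * numThreadsPerBlock >= numThreads.
// Assumes numThreads >= 0 and numThreadsPerBlock > 0.
qindex_t getNumBlocksIntegerCeil(qindex_t numThreads, int numThreadsPerBlock) {
    return (numThreads + numThreadsPerBlock - 1) / numThreadsPerBlock;
}
```

For example, 1000 threads at 128 threads per block need 8 blocks, since 7 blocks cover only 896 threads.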

