QuEST juggles many algorithmic and implementation optimisations whose performance benefits differ between settings (e.g. multithreaded vs GPU-accelerated) and scales (e.g. the number of amplitudes in a statevector). This means it's often easy to miss one! There are likely many unexpected "hotspots" or sources of slowdown in the QuEST library, as well as missed opportunities for straightforward algorithmic improvements.
The best way to find optimisation opportunities in software is with a profiler: a tool that finds and visualises which parts of the codebase are unexpectedly slow, or are invoked unexpectedly often. Use such a tool to find a suspicious bottleneck in QuEST's performance which may be a candidate for improvement. This effort will involve:
- Running QuEST at different scales and deployments.
- Using a profiler suited to the explored deployments. For example, GPU profiling can be done with NVIDIA Nsight Systems, and MPI profiling with Scalasca.
- Presenting a conclusion supported by profiling data, and a possible proof-of-concept implementation of a proposed optimisation.
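Before reaching for a profiler, a simple wall-clock sweep over register sizes shows where runtime stops growing as expected. The sketch below is illustrative only: it uses a hypothetical toy Hadamard kernel (`applyToyHadamard`) and timer (`timeGateSeconds`) on a plain `std::vector` of amplitudes, not QuEST's own API. In practice you would time real QuEST calls and run the binary under a profiler, since wall-clock totals alone do not reveal which functions dominate.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Amp = std::complex<double>;

// Hypothetical stand-in for a statevector kernel: applies a Hadamard gate
// to qubit `target` of a 2^numQubits amplitude array, in place.
void applyToyHadamard(std::vector<Amp>& amps, int target) {
    const std::size_t stride = std::size_t{1} << target;
    const double norm = 1.0 / std::sqrt(2.0);
    for (std::size_t i = 0; i < amps.size(); ++i) {
        if (i & stride) continue;  // visit each (i, i|stride) pair once
        const Amp a0 = amps[i];
        const Amp a1 = amps[i | stride];
        amps[i]          = norm * (a0 + a1);
        amps[i | stride] = norm * (a0 - a1);
    }
}

// Returns the mean wall-clock seconds, over `reps` repetitions, to apply the
// toy gate to every qubit of a register initialised to |0...0>.
double timeGateSeconds(int numQubits, int reps) {
    assert(numQubits > 0 && reps > 0);
    std::vector<Amp> amps(std::size_t{1} << numQubits, Amp{0.0, 0.0});
    amps[0] = 1.0;
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r)
        for (int q = 0; q < numQubits; ++q)
            applyToyHadamard(amps, q);
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / reps;
}
```

For example, calling `timeGateSeconds(n, 5)` for n = 10, 14, 18, 22 and dividing each result by the n·2^n amplitude updates it performs can flag scales where the cost per amplitude jumps, such as when the statevector outgrows a cache level.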
For inspiration, some previous, isolated QuEST optimisations include:
- (#743) Replacing `std::vector<int>` with a stack-based list to avoid superfluous heap allocations.
- (#739) Avoiding CUDA memory copies when passing qubit lists to kernels.
- (#736) Changing the parallelisation granularity of CUDA kernels.
- (#729) Avoiding `std::complex` arithmetic operators.
- (#684) Experimenting with compiler flags.
- (#658) Enabling NUMA-awareness.
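To make the first of these ideas (#743) concrete, here is a minimal sketch of a fixed-capacity list whose storage lives inside the object, so constructing one for every gate call never touches the heap, unlike a non-empty `std::vector`. This is not the code from #743: `StackList`, its `MaxSize` bound, and the `qubitMask` consumer are hypothetical names chosen for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <initializer_list>

// Hypothetical fixed-capacity list: elements are stored in an in-object
// std::array (typically on the stack), so no heap allocation occurs.
// MaxSize is an assumed compile-time bound, e.g. a maximum qubit count.
template <typename T, std::size_t MaxSize>
class StackList {
public:
    StackList() = default;
    StackList(std::initializer_list<T> items) {
        for (const T& item : items) push_back(item);
    }

    void push_back(const T& item) {
        assert(count < MaxSize && "StackList capacity exceeded");
        data[count++] = item;
    }

    std::size_t size() const { return count; }
    T& operator[](std::size_t i) { return data[i]; }
    const T& operator[](std::size_t i) const { return data[i]; }
    const T* begin() const { return data.data(); }
    const T* end() const { return data.data() + count; }

private:
    std::array<T, MaxSize> data{};
    std::size_t count = 0;
};

// Hypothetical consumer: builds a bitmask from a list of qubit indices,
// the kind of small, short-lived list created on each gate call.
template <std::size_t N>
unsigned long long qubitMask(const StackList<int, N>& qubits) {
    unsigned long long mask = 0;
    for (int q : qubits) mask |= 1ULL << q;
    return mask;
}
```

For example, `StackList<int, 64> targets = {0, 3, 5};` gives `qubitMask(targets) == 41`. The trade-off is a hard capacity limit, which is acceptable when the bound (such as a maximum number of qubits) is known at compile time.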
An ideal optimisation would give a speedup of over 15% for more than 10 qubits, but this is not a strict threshold. We are mostly interested in new, empirically supported insights into QuEST's performance characteristics and bottlenecks.
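One way to check measurements against this target is sketched below. It assumes "speedup of over 15%" means the old runtime divided by the new runtime exceeds 1.15; the `Timing` struct and the `speedupPercent` and `meetsTarget` helpers are hypothetical.

```cpp
#include <cassert>
#include <vector>

// One benchmark measurement: mean runtime before and after an optimisation.
struct Timing {
    int numQubits;
    double secondsBefore;
    double secondsAfter;
};

// Speedup as a percentage: 100 * (before / after - 1).
// A value of 15.0 means the optimised code runs 1.15x as fast.
double speedupPercent(const Timing& t) {
    assert(t.secondsBefore > 0.0 && t.secondsAfter > 0.0);
    return 100.0 * (t.secondsBefore / t.secondsAfter - 1.0);
}

// True if every measurement with more than `minQubits` qubits reaches
// `thresholdPercent`; smaller registers are ignored.
bool meetsTarget(const std::vector<Timing>& timings, int minQubits,
                 double thresholdPercent) {
    for (const Timing& t : timings)
        if (t.numQubits > minQubits && speedupPercent(t) < thresholdPercent)
            return false;
    return true;
}
```

Under the other common reading, a 15% reduction in runtime, the required ratio would instead be 1 / 0.85 ≈ 1.18, so it is worth stating which definition a report uses.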