Adding 'empty' kernel launch preset #297

Open
gilbertlee-amd wants to merge 1 commit into ROCm:candidate from gilbertlee-amd:EmptyKernel

@gilbertlee-amd
Collaborator

Motivation

Some of the timing functions in TransferBench include time spent outside the Transfer itself. The empty preset sweeps across multiple ranks and GPUs and measures how long it takes to launch a purely empty HIP kernel, to help quantify this extra time. Kernel launches can be batched together before synchronization to show how launch latency is amortized.
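The batching idea can be illustrated with a minimal sketch. This is not the code from this PR: `TimeBatchedLaunches`, `LaunchStats`, and the `launch`/`sync` callables are hypothetical names, and plain callables stand in for the empty-kernel launch and `hipStreamSynchronize` so the sketch builds without a GPU. It assumes each timed pass issues one batch of launches followed by a single synchronization, then divides the batch time by the batch size.

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <vector>

// Per-launch times in microseconds, summarized across timed iterations.
struct LaunchStats {
  double minUs = 0.0;
  double avgUs = 0.0;
  double maxUs = 0.0;
};

// Hypothetical helper (not TransferBench code). Each pass calls `launch`
// `batchSize` times and then calls `sync` once. In a HIP program, `launch`
// would enqueue an empty kernel and `sync` would call hipStreamSynchronize.
LaunchStats TimeBatchedLaunches(int batchSize, int numWarmups, int numIterations,
                                const std::function<void()>& launch,
                                const std::function<void()>& sync) {
  using Clock = std::chrono::steady_clock;
  LaunchStats stats;
  if (batchSize < 1 || numIterations < 1) return stats;  // nothing to time

  std::vector<double> perLaunchUs;
  perLaunchUs.reserve(numIterations);

  // Negative iteration indices are untimed warmup passes.
  for (int iter = -numWarmups; iter < numIterations; ++iter) {
    auto start = Clock::now();
    for (int i = 0; i < batchSize; ++i) launch();
    sync();  // one synchronization per batch
    auto stop = Clock::now();
    if (iter < 0) continue;

    double totalUs =
        std::chrono::duration<double, std::micro>(stop - start).count();
    perLaunchUs.push_back(totalUs / batchSize);  // average over the batch
  }

  stats.minUs = *std::min_element(perLaunchUs.begin(), perLaunchUs.end());
  stats.maxUs = *std::max_element(perLaunchUs.begin(), perLaunchUs.end());
  for (double t : perLaunchUs) stats.avgUs += t;
  stats.avgUs /= perLaunchUs.size();
  return stats;
}
```

Calling `TimeBatchedLaunches(16, 3, 5, launch, sync)` would run 8 passes (3 warmup, 5 timed) and report the minimum, average, and maximum per-launch wallclock time. A hipEvent-based measurement would need its own timing path around the same loop.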

Example output

[EmptyKernel Related]
BATCHSIZES           =     1,16,256 : Kernels per batch before hipStreamSynchronize
GRIDSIZES            =          256 : Grid X dimension (# threadblocks per kernel launch, set to ',' to sweep all)
BLOCKSIZES           =          256 : Thread-block width (blockDim.x)
NUM_GPU_DEVICES      =            8 : GPUs per rank to benchmark
NUM_ITERATIONS       =            5 : Timed passes per cell (HIP and CPU measured separately each pass)
NUM_WARMUPS          =            3 : Untimed warmup iterations
OUTPUT_TO_CSV        =            0 : CSV formatting for result table
SHOW_ITERATIONS      =            0 : Show per-iteration EVT/CPU columns before MIN/AVG/MAX

EmptyKernel preset: times in microseconds per kernel launch (averaged across batch size).

Evt = hipEvent measured time
Cpu = CPU wallclock measured time

BatS     GrdS BlkS Rank GPU EvtMin EvtAvg EvtMax CpuMin CpuAvg CpuMax
   1      256  256    0   0  4.040  4.192  4.400 10.316 10.800 11.857
   1      256  256    0   1  4.000  4.024  4.040 10.161 10.491 10.830
   1      256  256    0   2  3.881  4.168  4.241 10.072 10.647 12.254
   1      256  256    0   3  4.161  4.224  4.280 10.442 10.740 10.903
   1      256  256    0   4  4.000  4.136  4.280 11.604 12.019 12.429
   1      256  256    0   5  3.960  4.104  4.161 11.461 11.693 11.892
   1      256  256    0   6  4.040  4.136  4.280 11.740 12.068 12.355
   1      256  256    0   7  4.001  4.080  4.240 11.914 12.309 13.126
  16      256  256    0   0  2.355  2.679  3.090  2.802  3.816  4.422
  16      256  256    0   1  2.045  2.965  3.930  3.832  4.039  4.340
  16      256  256    0   2  2.215  2.675  3.460  3.361  3.808  4.724
  16      256  256    0   3  2.155  2.705  3.358  3.093  3.841  4.266
  16      256  256    0   4  2.185  2.811  3.238  3.717  4.009  4.211
  16      256  256    0   5  2.635  2.813  3.093  3.710  4.504  5.486
  16      256  256    0   6  2.395  2.881  3.298  3.927  4.428  5.186
  16      256  256    0   7  2.975  3.273  3.643  4.198  4.682  5.082
 256      256  256    0   0  1.834  1.940  2.097  1.873  2.014  2.135
 256      256  256    0   1  1.818  1.977  2.108  2.058  2.115  2.186
 256      256  256    0   2  1.871  1.969  2.090  1.932  2.109  2.188
 256      256  256    0   3  1.831  1.965  2.201  2.082  2.146  2.182
 256      256  256    0   4  1.899  2.014  2.150  2.006  2.108  2.204
 256      256  256    0   5  1.892  2.079  2.266  2.074  2.203  2.402
 256      256  256    0   6  2.129  2.197  2.269  2.192  2.235  2.260
 256      256  256    0   7  1.957  2.068  2.185  2.117  2.196  2.280
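In the rows above, the per-launch CPU time for GPU 0 drops from 10.800 µs at batch size 1 to 2.014 µs at batch size 256. To compare batch sizes across all GPUs, the non-CSV table can be post-processed with a small sketch like the following. `MeanByBatchSize` and `BatchMeans` are hypothetical names, not part of TransferBench, and the parser assumes the eleven-column layout shown in this example output.

```cpp
#include <map>
#include <sstream>
#include <string>

// Mean of the EvtAvg and CpuAvg columns for one batch size.
struct BatchMeans {
  double evtAvgUs = 0.0;
  double cpuAvgUs = 0.0;
  int rows = 0;
};

// Hypothetical helper: parses whitespace-separated result rows in the order
// BatS GrdS BlkS Rank GPU EvtMin EvtAvg EvtMax CpuMin CpuAvg CpuMax
// and returns the per-batch-size means of EvtAvg and CpuAvg.
std::map<int, BatchMeans> MeanByBatchSize(const std::string& tableText) {
  std::map<int, BatchMeans> result;
  std::istringstream lines(tableText);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream row(line);
    int batchSize, gridSize, blockSize, rank, gpu;
    double evtMin, evtAvg, evtMax, cpuMin, cpuAvg, cpuMax;
    if (!(row >> batchSize >> gridSize >> blockSize >> rank >> gpu
              >> evtMin >> evtAvg >> evtMax >> cpuMin >> cpuAvg >> cpuMax)) {
      continue;  // header, blank, or otherwise non-numeric line
    }
    BatchMeans& m = result[batchSize];
    m.evtAvgUs += evtAvg;  // accumulate sums first
    m.cpuAvgUs += cpuAvg;
    m.rows += 1;
  }
  for (auto& entry : result) {
    entry.second.evtAvgUs /= entry.second.rows;  // convert sums to means
    entry.second.cpuAvgUs /= entry.second.rows;
  }
  return result;
}
```

Passing the table text to `MeanByBatchSize` returns one entry per batch size. The gap between the EvtAvg and CpuAvg means at each batch size gives a rough view of how much host-side overhead remains per launch.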

@gilbertlee-amd gilbertlee-amd requested review from a team as code owners May 11, 2026 06:50
@nileshnegi nileshnegi requested a review from Copilot May 11, 2026 21:06