Restore mux scaling for SCF bandwidth and latency calculations by charles-typ · Pull Request #562 · facebookresearch/DCPerf

charles-typ · 2026-04-04T00:33:12Z

Summary:
D92222329 removed the mux (percentage-running) scaling from the SCF memory
bandwidth and latency calculations in generate_arm_perf_report.py, with
the reasoning that perf stat auto-scales counter values.

While perf stat does auto-scale counter_value, counter_runtime still
reflects only the actual time the PMU was active (not the full measurement
interval). When SCF events are multiplexed (e.g., 33% mux on Grace), this
causes:

- scf_cycles / counter_runtime to be 3x the actual SCF frequency
- Bandwidth to be reported 3x too high
- Latency to be reported 3x too low (unrealistically fast)
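
The factor of roughly 3 follows from simple arithmetic. The derivation below is a sketch, not taken from the script itself: the symbols C, T, and p are introduced here for illustration, and it assumes that latency is obtained by dividing a cycle count by the derived SCF frequency, so an inflated frequency yields a deflated latency.

```latex
% C: auto-scaled counter value reported by perf stat
% T: full measurement interval
% p: percentage of T during which the event was actually counted
\begin{align*}
t_{\text{runtime}} &= \frac{p}{100}\,T \\
\frac{C}{t_{\text{runtime}}} &= \frac{100}{p}\cdot\frac{C}{T}
  \approx 3.03\,\frac{C}{T} \quad (p = 33) \\
\frac{C}{t_{\text{runtime}}\cdot\frac{100}{p}} &= \frac{C}{T}
\end{align*}
```

The last line is the corrected form: multiplying the runtime by 100 / p recovers the full interval T in the denominator.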

Raw perf data from Grace benchmark runs confirms SCF events run at 33% mux:

5.004597489,163520049,,nvidia_scf_pmu_0/cmem_rd_access/,1670782496,33.00,,
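
For readers unfamiliar with this output, the line can be split into named fields as in the sketch below. It assumes the usual `perf stat -x, -I` column order (timestamp, counter value, unit, event, counter runtime in ns, percentage running, then optional metric columns) and that the event name contains no commas. `PerfSample` and `parse_perf_csv_line` are hypothetical names for illustration, not functions from `generate_arm_perf_report.py`.

```python
from dataclasses import dataclass


@dataclass
class PerfSample:
    """One row of perf stat CSV interval output (hypothetical helper type)."""
    timestamp_s: float
    counter_value: int
    unit: str
    event: str
    counter_runtime_ns: int
    pct_running: float  # the "mux" percentage


def parse_perf_csv_line(line: str) -> PerfSample:
    # Assumed column order: time, value, unit, event, runtime, pct running, ...
    fields = line.strip().split(",")
    return PerfSample(
        timestamp_s=float(fields[0]),
        counter_value=int(fields[1]),
        unit=fields[2],
        event=fields[3],
        counter_runtime_ns=int(fields[4]),
        pct_running=float(fields[5]),
    )


sample = parse_perf_csv_line(
    "5.004597489,163520049,,nvidia_scf_pmu_0/cmem_rd_access/,1670782496,33.00,,"
)
print(sample.event, sample.counter_runtime_ns, sample.pct_running)
```

Under that reading, the counter ran for about 1.67 s, and the sixth field reports that this was 33% of the measurement interval.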

This restores the mux correction originally added in D71513380: scale
counter_runtime by 100 / mux to recover the full interval duration
before computing derived metrics.
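
A minimal sketch of the correction, using the values from the raw perf line above. The function name `full_interval_runtime_ns` is hypothetical and the actual code in `generate_arm_perf_report.py` may be structured differently; the rate computed here is a generic events-per-second figure, not the script's bandwidth or latency formula.

```python
def full_interval_runtime_ns(counter_runtime_ns: float, mux_pct: float) -> float:
    """Scale the PMU-active time by 100 / mux to estimate the full interval."""
    if mux_pct <= 0:
        raise ValueError("mux percentage must be positive")
    return counter_runtime_ns * 100.0 / mux_pct


# Values taken from the raw perf line in the description.
counter_value = 163520049        # already auto-scaled by perf stat
counter_runtime_ns = 1670782496  # time the PMU was actually counting
mux_pct = 33.00

corrected_ns = full_interval_runtime_ns(counter_runtime_ns, mux_pct)

# Dividing the auto-scaled count by the uncorrected runtime inflates any rate.
uncorrected_rate = counter_value / (counter_runtime_ns * 1e-9)
corrected_rate = counter_value / (corrected_ns * 1e-9)
inflation = uncorrected_rate / corrected_rate  # equals 100 / mux_pct

print(f"corrected interval: {corrected_ns / 1e9:.2f} s")
print(f"rate inflation without correction: {inflation:.2f}x")
```

With a mux of 100 the function returns the runtime unchanged, so events that are not multiplexed are unaffected by the scaling.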

Affects: nvidia_scf_mem_read_bw_MBps, nvidia_scf_mem_write_bw_MBps,
nvidia_scf_mem_latency_ns.

Differential Revision: D99517233


meta-codesync · 2026-04-04T00:33:20Z

@charles-typ has exported this pull request. If you are a Meta employee, you can view the originating Diff in D99517233.

Summary:
Pull Request resolved: #562

D92222329 removed the mux (percentage-running) scaling from the SCF memory bandwidth and latency calculations in `generate_arm_perf_report.py`, with the reasoning that perf stat auto-scales counter values.

While perf stat does auto-scale `counter_value`, `counter_runtime` still reflects only the actual time the PMU was active (not the full measurement interval). When SCF events are multiplexed (e.g., 33% mux on Grace), this causes:

- `scf_cycles / counter_runtime` to be 3x the actual SCF frequency
- Bandwidth to be reported 3x too high
- Latency to be reported 3x too low (unrealistically fast)

Raw perf data from Grace benchmark runs confirms SCF events run at 33% mux:

```
5.004597489,163520049,,nvidia_scf_pmu_0/cmem_rd_access/,1670782496,33.00,,
```

This restores the mux correction originally added in D71513380: scale `counter_runtime` by `100 / mux` to recover the full interval duration before computing derived metrics.

Affects: `nvidia_scf_mem_read_bw_MBps`, `nvidia_scf_mem_write_bw_MBps`, `nvidia_scf_mem_latency_ns`.

Reviewed By: b3nj1

Differential Revision: D99517233

fbshipit-source-id: 6fbd9469770ad240313b0b98ad65ab2d3ef2ed40

meta-cla bot added the CLA Signed label Apr 4, 2026

meta-codesync bot added the fb-exported and meta-exported labels Apr 4, 2026

charles-typ wants to merge 1 commit into facebookresearch:v2-beta from charles-typ:export-D99517233-to-v2-beta
