Skip to content

ci: align relocatable package flow with RVS reference#287

Merged
thananon merged 2 commits intodevelopfrom
ci/relocatable-package-rvs-parity
May 5, 2026
Merged

ci: align relocatable package flow with RVS reference#287
thananon merged 2 commits intodevelopfrom
ci/relocatable-package-rvs-parity

Conversation

@thananon
Copy link
Copy Markdown
Contributor

@thananon thananon commented May 4, 2026

Summary

Brings TransferBench's TheRock-SDK-based package build into closer parity with the ROCmValidationSuite reference flow. Closes the four gaps identified after #262 landed.

Gap 1 — RPATH defaults move into CMakeLists.txt

RPATH list previously lived only in build_packages_local.sh (-DCMAKE_INSTALL_RPATH=…), so plain cmake -DBUILD_RELOCATABLE_PACKAGE=ON .. produced a non-relocatable binary. Defaults now live inside the if(BUILD_RELOCATABLE_PACKAGE) block in CMakeLists.txt. The script drops the three -DCMAKE_*_RPATH flags. The list is also extended to match RVS:

$ORIGIN:$ORIGIN/../lib:/opt/rocm/extras-<MAJOR>/lib:/opt/rocm/lib:/opt/rocm/lib/llvm/lib:/opt/rocm/core-<MAJOR>/lib:/opt/rocm/core-<MAJOR>/lib/llvm/lib

Gap 2 — Auto-computed patch version

VERSION_STRING patch is now derived from git describe --tags --abbrev=0 --match v<MAJOR>.<MINOR>.* followed by git rev-list --count <tag>..HEAD. Falls back to a hardcoded patch (02) when git is unavailable or no matching tag exists. No user-visible change today since no v1.66.* tag exists yet (latest v-tag is v1.62.00); once a v1.66.00 tag is created, builds will be 1.66.<commits-since>.

Gap 3 — RVS-style package release tag

build_packages_local.sh now emits release strings matching the RVS format:

Trigger Release string
push / schedule / dispatch / local r<libpatch>.<yyyymmdd> (e.g. r0711.20260504)
pull request r<libpatch>.<yyyymmdd>.<src-branch>.<commit>
release/* branch ${GITHUB_RUN_NUMBER} (fallback 1)

UTC date captured once at script start so long builds can't straddle midnight. The over-permissive bare rel* branch matcher is dropped (workflow only triggers release/**).

Gap 4 — User-facing TGZ install doc

TGZ install instructions move out of the CI-engineer-facing README_BUILD_PACKAGES.md into a new Sphinx doc at docs/install/INSTALL_TGZ.rst, covering pre-install ROCm, runtime deps (numactl, libnuma1, hsa-rocr — no PCI deps since TransferBench doesn't enumerate via PCI), extraction, PATH/LD_LIBRARY_PATH, and persistent /etc/profile.d setup. install.rst cross-links to it. README_BUILD_PACKAGES.md keeps a minimal CI-maintainer smoke test plus a link to the new doc.

Test plan

  • cmake -B build_verify -DBUILD_RELOCATABLE_PACKAGE=ON -DROCM_MAJOR_VERSION=7 … prints -- TransferBench version: 1.66.02 (correct fallback)
  • Bash release-tag logic smoke-tested in default / release / PR modes — produces r0711.20260504, 42, r0711.20260504.feature.foo.<commit> respectively
  • CI: push triggers build-relocatable-packages.yml; verify package filenames carry the new r<libpatch>.<yyyymmdd> release string
  • CI: confirm readelf -d on the built binary shows the extended RPATH list and no $HOME/rocm-sdk/install leak
  • Render Sphinx docs (python3 -m sphinx -W -b html …) and confirm INSTALL_TGZ.html builds without warnings

Brings TransferBench's TheRock-SDK-based package build into closer parity
with the ROCmValidationSuite reference flow:

- CMakeLists.txt: auto-compute patch version from `git describe --tags
  --match v<MAJOR>.<MINOR>.*` (commits since the last matching tag),
  falling back to the hardcoded patch when no tag is reachable. No
  behavior change today since no v1.66.* tag exists yet; once one is
  created, builds become 1.66.<commits-since>.
- CMakeLists.txt: move relocatable RPATH defaults out of
  build_packages_local.sh and into the BUILD_RELOCATABLE_PACKAGE block
  so plain `cmake -DBUILD_RELOCATABLE_PACKAGE=ON ..` produces the same
  RPATH as a packaged build. RPATH list now includes /opt/rocm/lib/llvm,
  /opt/rocm/core-<MAJOR>/lib, and the matching llvm path.
- build_packages_local.sh: package release tag now follows the RVS
  format (`r<libpatch>.<yyyymmdd>` for default builds,
  `r<libpatch>.<yyyymmdd>.<src-branch>.<commit>` for PRs,
  GITHUB_RUN_NUMBER for release/* branches). UTC date captured once so
  long builds can't straddle midnight. Drops the over-permissive `rel*`
  prefix matcher.
- build_packages_local.sh: drop the now-redundant -DCMAKE_INSTALL_RPATH
  / -DCMAKE_SKIP_RPATH / -DCMAKE_INSTALL_RPATH_USE_LINK_PATH flags
  (CMakeLists.txt is now the single source of truth).
- docs/install/INSTALL_TGZ.rst: new user-facing TGZ install guide
  covering ROCm pre-install, runtime deps, extraction, PATH /
  LD_LIBRARY_PATH setup, and persistent profile.d configuration.
- docs/install/install.rst: cross-link to the new TGZ doc.
- README_BUILD_PACKAGES.md: replace the inline TGZ install snippet with
  a link to the new doc plus a minimal smoke test for CI maintainers.
@thananon thananon requested review from a team as code owners May 4, 2026 22:06
Comment thread .github/workflows/README_BUILD_PACKAGES.md Outdated
Comment thread docs/install/INSTALL_TGZ.rst
@alex-breslow-amd
Copy link
Copy Markdown

I would try to automate the first-level troubleshooting you listed if possible. Then the doc can just refer the user to the output that's already part of the logs.

Address @gilbertlee-amd review on PR #287:

  --help doesn't exist. In candidate we already introduce the "help"
  preset [...]. For CI moving forward, we likely want to use
  ./TransferBench smoketest. However, that's also in candidate branch
  still.

On develop today, running TransferBench with no arguments prints
version, usage, the available preset list, and the detected topology
then exits 0 (Client.cpp:41 — `if (argc <= 1)` branch). That is a valid
load-time smoke test for the binary.

Replace `TransferBench --help` with `TransferBench` in:
- .github/workflows/README_BUILD_PACKAGES.md (CI maintainer smoke test)
- docs/install/INSTALL_TGZ.rst (end-user verify step)

Add a forward-looking note in the README that once the `help` and
`smoketest` presets graduate from candidate to develop, the smoke test
should switch to `TransferBench smoketest`.
@thananon
Copy link
Copy Markdown
Contributor Author

thananon commented May 4, 2026

@gilbertlee-amd good catch — --help was wishful thinking on my part. Pushed a fix that:

  • Replaces TransferBench --help with plain TransferBench in both the README smoke test and INSTALL_TGZ.rst. On develop today, argc <= 1 (Client.cpp:41) prints version + usage + preset list + detected topology then exits 0, which is the right load-time check.
  • Adds a forward-looking note in the README that once the help and smoketest presets graduate from candidate to develop, the smoke test should switch to TransferBench smoketest.

Left @alex-breslow-amd's two open suggestions (run a real workload like a2a; auto-run the troubleshooting ldd/readelf) for follow-up — both are good ideas but bigger than this PR's scope (the a2a one in particular wants the smoketest preset to land first).

@alex-breslow-amd
Copy link
Copy Markdown

@gilbertlee-amd good catch — --help was wishful thinking on my part. Pushed a fix that:

  • Replaces TransferBench --help with plain TransferBench in both the README smoke test and INSTALL_TGZ.rst. On develop today, argc <= 1 (Client.cpp:41) prints version + usage + preset list + detected topology then exits 0, which is the right load-time check.
  • Adds a forward-looking note in the README that once the help and smoketest presets graduate from candidate to develop, the smoke test should switch to TransferBench smoketest.

Left @alex-breslow-amd's two open suggestions (run a real workload like a2a; auto-run the troubleshooting ldd/readelf) for follow-up — both are good ideas but bigger than this PR's scope (the a2a one in particular wants the smoketest preset to land first).

That's fine by me.

@thananon
Copy link
Copy Markdown
Contributor Author

thananon commented May 5, 2026

Codeql broken unrelated to the PR. If we are ok lets merge this and create a test release branch to verify this workflow end to end.

@thananon thananon merged commit 1b56fdc into develop May 5, 2026
4 of 5 checks passed
@thananon thananon deleted the ci/relocatable-package-rvs-parity branch May 5, 2026 18:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants