ci: align relocatable package flow with RVS reference#287
Conversation
Brings TransferBench's TheRock-SDK-based package build into closer parity with the ROCmValidationSuite reference flow: - CMakeLists.txt: auto-compute patch version from `git describe --tags --match v<MAJOR>.<MINOR>.*` (commits since the last matching tag), falling back to the hardcoded patch when no tag is reachable. No behavior change today since no v1.66.* tag exists yet; once one is created, builds become 1.66.<commits-since>. - CMakeLists.txt: move relocatable RPATH defaults out of build_packages_local.sh and into the BUILD_RELOCATABLE_PACKAGE block so plain `cmake -DBUILD_RELOCATABLE_PACKAGE=ON ..` produces the same RPATH as a packaged build. RPATH list now includes /opt/rocm/lib/llvm, /opt/rocm/core-<MAJOR>/lib, and the matching llvm path. - build_packages_local.sh: package release tag now follows the RVS format (`r<libpatch>.<yyyymmdd>` for default builds, `r<libpatch>.<yyyymmdd>.<src-branch>.<commit>` for PRs, GITHUB_RUN_NUMBER for release/* branches). UTC date captured once so long builds can't straddle midnight. Drops the over-permissive `rel*` prefix matcher. - build_packages_local.sh: drop the now-redundant -DCMAKE_INSTALL_RPATH / -DCMAKE_SKIP_RPATH / -DCMAKE_INSTALL_RPATH_USE_LINK_PATH flags (CMakeLists.txt is now the single source of truth). - docs/install/INSTALL_TGZ.rst: new user-facing TGZ install guide covering ROCm pre-install, runtime deps, extraction, PATH / LD_LIBRARY_PATH setup, and persistent profile.d configuration. - docs/install/install.rst: cross-link to the new TGZ doc. - README_BUILD_PACKAGES.md: replace the inline TGZ install snippet with a link to the new doc plus a minimal smoke test for CI maintainers.
|
I would try to automate the first-level troubleshooting you listed if possible. Then the doc can just refer the user to the output that's already part of the logs. |
Address @gilbertlee-amd review on PR #287: --help doesn't exist. In candidate we already introduce the "help" preset [...]. For CI moving forward, we likely want to use ./TransferBench smoketest. However, that's also in candidate branch still. On develop today, running TransferBench with no arguments prints version, usage, the available preset list, and the detected topology then exits 0 (Client.cpp:41 — `if (argc <= 1)` branch). That is a valid load-time smoke test for the binary. Replace `TransferBench --help` with `TransferBench` in: - .github/workflows/README_BUILD_PACKAGES.md (CI maintainer smoke test) - docs/install/INSTALL_TGZ.rst (end-user verify step) Add a forward-looking note in the README that once the `help` and `smoketest` presets graduate from candidate to develop, the smoke test should switch to `TransferBench smoketest`.
|
@gilbertlee-amd good catch —
Left @alex-breslow-amd's two open suggestions (run a real workload like a2a; auto-run the troubleshooting |
That's fine by me. |
|
Codeql broken unrelated to the PR. If we are ok lets merge this and create a test release branch to verify this workflow end to end. |
Summary
Brings TransferBench's TheRock-SDK-based package build into closer parity with the ROCmValidationSuite reference flow. Closes the four gaps identified after #262 landed.
Gap 1 — RPATH defaults move into
CMakeLists.txtRPATH list previously lived only in
build_packages_local.sh(-DCMAKE_INSTALL_RPATH=…), so plaincmake -DBUILD_RELOCATABLE_PACKAGE=ON ..produced a non-relocatable binary. Defaults now live inside theif(BUILD_RELOCATABLE_PACKAGE)block inCMakeLists.txt. The script drops the three-DCMAKE_*_RPATHflags. The list is also extended to match RVS:Gap 2 — Auto-computed patch version
VERSION_STRINGpatch is now derived fromgit describe --tags --abbrev=0 --match v<MAJOR>.<MINOR>.*followed bygit rev-list --count <tag>..HEAD. Falls back to a hardcoded patch (02) when git is unavailable or no matching tag exists. No user-visible change today since nov1.66.*tag exists yet (latest v-tag isv1.62.00); once av1.66.00tag is created, builds will be1.66.<commits-since>.Gap 3 — RVS-style package release tag
build_packages_local.shnow emits release strings matching the RVS format:r<libpatch>.<yyyymmdd>(e.g.r0711.20260504)r<libpatch>.<yyyymmdd>.<src-branch>.<commit>release/*branch${GITHUB_RUN_NUMBER}(fallback1)UTC date captured once at script start so long builds can't straddle midnight. The over-permissive bare
rel*branch matcher is dropped (workflow only triggersrelease/**).Gap 4 — User-facing TGZ install doc
TGZ install instructions move out of the CI-engineer-facing
README_BUILD_PACKAGES.mdinto a new Sphinx doc atdocs/install/INSTALL_TGZ.rst, covering pre-install ROCm, runtime deps (numactl,libnuma1,hsa-rocr— no PCI deps since TransferBench doesn't enumerate via PCI), extraction,PATH/LD_LIBRARY_PATH, and persistent/etc/profile.dsetup.install.rstcross-links to it.README_BUILD_PACKAGES.mdkeeps a minimal CI-maintainer smoke test plus a link to the new doc.Test plan
cmake -B build_verify -DBUILD_RELOCATABLE_PACKAGE=ON -DROCM_MAJOR_VERSION=7 …prints-- TransferBench version: 1.66.02(correct fallback)r0711.20260504,42,r0711.20260504.feature.foo.<commit>respectivelybuild-relocatable-packages.yml; verify package filenames carry the newr<libpatch>.<yyyymmdd>release stringreadelf -don the built binary shows the extended RPATH list and no$HOME/rocm-sdk/installleakpython3 -m sphinx -W -b html …) and confirmINSTALL_TGZ.htmlbuilds without warnings