fix(ci): eliminate image-tag race between concurrent workflows #1413
mesutoezdil wants to merge 1 commit
Single-arch builds were collapsing onto the bare :SHA tag, letting Branch Kubernetes E2E, Branch E2E Checks, and GPU Test overwrite each other depending on which finished last. kind load then failed with "content digest not found" when the tag was a multi-platform manifest list but only one arch was present locally.

Always append the arch suffix to IMAGE_TAG so each workflow writes to its own slot (:SHA-amd64, :SHA-arm64). Run the merge step unconditionally when pushing so the bare :SHA tag is always a deterministic manifest list produced by the merge job, not whichever single-arch build lands last. Update branch-kubernetes-e2e to pull the amd64-suffixed tag directly.

Fixes NVIDIA#1343

Signed-off-by: mesutoezdil <mesudozdil@gmail.com>
Thanks for taking this on. I do not think this closes #1343 yet. Blocking issues:
What I would change:
Net: the release pipeline looks okay, but this PR should be revised before merge because the CI race is not actually removed and K8s E2E has a tag mismatch.
Summary
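- Always append the arch suffix to IMAGE_TAG in docker-build.yml so each workflow writes to its own registry slot
- Run the merge step unconditionally so the bare :SHA tag is always a deterministic manifest list
- Update branch-kubernetes-e2e.yml to pull the amd64-suffixed tag directly

Related Issue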
Fixes #1343
Changes
Three concurrent workflows (Branch Kubernetes E2E, Branch E2E Checks, GPU Test) were all writing to the same bare :SHA tag. The merge step was gated on platform_count != 1, so single-arch builds collapsed onto the bare tag and raced with each other. kind load then failed when it found a manifest list but only had one arch locally.

The fix removes the conditional collapse so every build writes to :SHA-<arch>, and the merge step always assembles the bare :SHA tag as a proper manifest list.

Testing
Checklist