ci(canary): add kind-based helm chart smoke test to Release Canary #1336

Draft
TaylorMutch wants to merge 2 commits into main from tmutch/helm-release-canary

@TaylorMutch
Collaborator

Summary

Adds a kubernetes job to the Release Canary workflow that installs the published 0.0.0-dev Helm chart into a kind cluster, port-forwards the gateway, registers it with the released CLI, and runs openshell status end-to-end. Pairs the workflow with a test-release-canary skill so future iteration on the canary is documented in one place.

Related Issue

N/A — no tracking issue. Originated from a request to extend the existing macOS / Ubuntu / Fedora canaries with Helm chart coverage.

Changes

Commit 1 — ci(canary): add kind-based helm chart smoke test to Release Canary

  • .github/workflows/release-canary.yml: new kubernetes job. Provisions kind via helm/kind-action@v1, helm install oci://ghcr.io/nvidia/openshell/helm-chart --version 0.0.0-dev with TLS/auth/pkiInitJob disabled, waits for the gateway pod to be Ready, port-forwards the service to 127.0.0.1:8080, installs the OpenShell CLI via install.sh, registers the in-cluster gateway as kind, and runs openshell status. Dumps helm/kubectl/port-forward diagnostics on failure.
  • .agents/skills/test-release-canary/SKILL.md: new skill covering manual dispatch (gh workflow run release-canary.yml --ref <branch>), the 0.0.0-dev vs 0.0.0-dev.<sha> chart pinning model, local kind reproduction, and per-job failure triage.
  • CI.md: new "Release workflows" section listing release-dev.yml, release-tag.yml, and release-canary.yml, with a pointer to the skill.
  • CONTRIBUTING.md: adds test-release-canary to the Reviewing skills row.
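The failure-diagnostics step described above could look roughly like the sketch below. It is not the workflow's actual step: `run_diag` and `dump_diagnostics` are hypothetical helper names, `port-forward.log` is an assumed file name for the captured port-forward output, and the namespace and release name are taken from the local reproduction commands in this PR. Each diagnostic tolerates its own failure so that one broken command does not hide the rest.

```shell
#!/usr/bin/env bash
# run_diag LABEL COMMAND [ARGS...]
# Print a section header, run the command, and never propagate its failure,
# so a single broken diagnostic does not hide the others.
run_diag() {
  local label="$1"
  shift
  echo "== ${label} =="
  "$@" 2>&1 || echo "(exit $?: $*)"
}

# Hypothetical failure handler. The namespace and release name mirror the
# helm install command in this PR; everything else is an assumption.
dump_diagnostics() {
  local ns="openshell" release="openshell"
  run_diag "helm status" helm status "$release" --namespace "$ns"
  run_diag "pods" kubectl get pods --namespace "$ns" --output wide
  run_diag "pod descriptions" kubectl describe pods --namespace "$ns"
  run_diag "recent events" kubectl get events --namespace "$ns" --sort-by=.lastTimestamp
  # Assumed file name for wherever the port-forward output was captured.
  run_diag "port-forward log" cat port-forward.log
}
```

In a workflow, a helper like this would typically be called from a step guarded by `if: failure()`.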

Commit 2 — chore(agents): sync skills table and architecture inventory

Pre-existing drift uncovered by sync-agent-infra, unrelated to the canary work but kept in the same PR to land the inventories cleanly:

  • CONTRIBUTING.md: adds fix-security-issue (Reviewing) and helm-dev-environment (Getting Started) rows.
  • AGENTS.md: adds openshell-driver-podman, openshell-prover, and openshell-vfio rows to the architecture overview.

Testing

  • mise run pre-commit passes
  • Unit tests added/updated
  • E2E tests added/updated (if applicable)

The canary itself is the test — it will fire automatically against the next Release Dev run after merge. To validate the workflow change before merge, dispatch it manually on this branch:

gh workflow run release-canary.yml --ref tmutch/helm-release-canary

Note: a branch dispatch tests the workflow definition from this branch against main's published 0.0.0-dev chart, which is what we want for iterating on the canary itself. See .agents/skills/test-release-canary/SKILL.md for the full playbook.
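For iterating on the branch, the dispatch can be paired with a wait on the resulting run. The sketch below is one way to do that with the GitHub CLI, not something taken from the skill: `dispatch_canary` and the `CANARY_LIST_DELAY` variable are hypothetical names, and picking the newest `workflow_dispatch` run on the branch assumes nobody else dispatched the same workflow on it in the meantime.

```shell
#!/usr/bin/env bash
# dispatch_canary BRANCH
# Hypothetical helper: dispatch release-canary.yml on BRANCH, then follow the
# newest workflow_dispatch run on that branch until it finishes.
dispatch_canary() {
  local branch="$1" workflow="release-canary.yml" run_id
  gh workflow run "$workflow" --ref "$branch" || return 1
  # A freshly dispatched run can take a few seconds to show up in the list.
  sleep "${CANARY_LIST_DELAY:-5}"
  run_id="$(gh run list --workflow "$workflow" --branch "$branch" \
    --event workflow_dispatch --limit 1 \
    --json databaseId --jq '.[0].databaseId')" || return 1
  [ -n "$run_id" ] || return 1
  # --exit-status makes the helper fail when the run fails.
  gh run watch "$run_id" --exit-status
}
```

Usage would be `dispatch_canary tmutch/helm-release-canary`; a non-zero exit means the dispatch, the lookup, or the run itself failed.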

Local reproduction of the kubernetes job:

kind create cluster --name release-canary-local
helm install openshell oci://ghcr.io/nvidia/openshell/helm-chart \
  --version 0.0.0-dev \
  --namespace openshell --create-namespace \
  --set server.disableTls=true \
  --set server.disableGatewayAuth=true \
  --set pkiInitJob.enabled=false \
  --wait --timeout 5m
kubectl wait --namespace openshell \
  --for=condition=Ready pod \
  --selector="app.kubernetes.io/name=openshell,app.kubernetes.io/instance=openshell" \
  --timeout=300s
kubectl port-forward --namespace openshell svc/openshell 8080:8080 &
openshell gateway add http://127.0.0.1:8080 --local --name kind
openshell status
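
Because the port-forward above is started in the background, the commands that follow it can run before the tunnel is accepting connections. A small retry wrapper is one way to absorb that race when reproducing locally. This is a sketch under that assumption: `retry` is a hypothetical helper, not part of the workflow or the OpenShell CLI.

```shell
#!/usr/bin/env bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Run COMMAND until it succeeds or ATTEMPTS tries have been made,
# sleeping DELAY_SECONDS between tries.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while :; do
    "$@" && return 0
    [ "$i" -ge "$attempts" ] && return 1
    i=$((i + 1))
    sleep "$delay"
  done
}
```

With this in place, the last step could be run as `retry 10 2 openshell status`. When finished, `kind delete cluster --name release-canary-local` removes the local cluster.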

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable) — N/A: AGENTS.md inventory updated; no subsystem boundary changes warrant architecture/*.md updates.

Add a kubernetes job to release-canary.yml that creates a kind
cluster, installs the published 0.0.0-dev helm chart, port-forwards
the gateway, registers it via openshell gateway add, and runs
openshell status against it. Pair the workflow with a
test-release-canary skill that documents manual dispatch, local
reproduction, and failure-mode triage; cross-reference it from
CI.md and the CONTRIBUTING.md skills table.

Signed-off-by: Taylor Mutch <taylormutch@gmail.com>

CONTRIBUTING.md was missing fix-security-issue and helm-dev-environment
from the skills table. AGENTS.md was missing openshell-driver-podman,
openshell-prover, and openshell-vfio from the architecture overview.
Add them so the documented inventories match what is on disk.

Signed-off-by: Taylor Mutch <taylormutch@gmail.com>
@copy-pr-bot

copy-pr-bot Bot commented May 12, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

