
feat(sandbox): add resource JSON passthrough#1340

Open
ryana wants to merge 2 commits into main from 1338-gpu-count/ra

Conversation

@ryana (Collaborator) commented May 12, 2026

Summary

Exposes a generic sandbox-template resource passthrough in openshell sandbox create via --resources-json <JSON>. The CLI parses a JSON object into SandboxTemplate.resources, so callers can set CPU, memory, and extended resources without adding a dedicated flag for each resource.

--gpu-count COUNT remains as a convenience flag. It now injects limits["nvidia.com/gpu"] = "<COUNT>" into the same resource struct while still marking the sandbox as GPU-intent.
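Here is a minimal Rust sketch of how --gpu-count could merge into the same resources struct that --resources-json populates. Several parts are assumptions made for illustration: the Resources struct, the apply_gpu_count helper, and the choice to let --gpu-count overwrite an nvidia.com/gpu value already present in the JSON. The PR does not say how that conflict is resolved. JSON parsing is left out so the example stays dependency-free.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for SandboxTemplate.resources: Kubernetes-style
/// limits and requests keyed by resource name, with quantities as strings.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Resources {
    pub limits: BTreeMap<String, String>,
    pub requests: BTreeMap<String, String>,
}

const GPU_RESOURCE: &str = "nvidia.com/gpu";

/// Applies an optional --gpu-count on top of resources that may already have
/// been filled from --resources-json. A count of zero is rejected, and
/// (by assumption) a valid count overwrites any existing nvidia.com/gpu limit.
pub fn apply_gpu_count(resources: &mut Resources, gpu_count: Option<u32>) -> Result<(), String> {
    match gpu_count {
        None => Ok(()),
        Some(0) => Err("--gpu-count must be at least 1".to_string()),
        Some(n) => {
            resources
                .limits
                .insert(GPU_RESOURCE.to_string(), n.to_string());
            Ok(())
        }
    }
}
```

Because the count lands in the generic limits map, downstream code only needs to handle one resource representation instead of a separate GPU field.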

Related Issue

Closes #1338

Changes

  • Added --resources-json JSON to openshell sandbox create.
  • Added explicit validation errors for bad CLI input: an empty --resources-json, invalid JSON, non-object JSON, and --gpu-count 0.
  • Reworked --gpu-count so it no longer adds public or driver protobuf fields; it writes nvidia.com/gpu into SandboxTemplate.resources.
  • Ensured resource-only sandbox creates still include a SandboxTemplate, even when no image is provided.
  • Updated Kubernetes pod rendering to preserve explicit GPU resource limits instead of overwriting them with the default applied for --gpu.
  • Updated the sandbox, Kubernetes driver, Kubernetes setup, and architecture docs.
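One way to express the pod-rendering precedence described above is with a hypothetical Rust helper. It copies explicit template limits first and fills in the GPU default only when the template did not set one. The name render_container_limits and the value DEFAULT_GPU_LIMIT = "1" are assumptions, not the driver's actual identifiers or default.

```rust
use std::collections::BTreeMap;

/// Assumed default GPU limit for a sandbox that only has GPU intent (--gpu)
/// and no explicit count; the real default lives in the Kubernetes driver.
const DEFAULT_GPU_LIMIT: &str = "1";
const GPU_RESOURCE: &str = "nvidia.com/gpu";

/// Hypothetical helper that builds a container's resource limits.
/// Explicit limits from the sandbox template are copied as-is; the GPU
/// default is only added when the template left nvidia.com/gpu unset.
pub fn render_container_limits(
    template_limits: &BTreeMap<String, String>,
    gpu_intent: bool,
) -> BTreeMap<String, String> {
    let mut limits = template_limits.clone();
    if gpu_intent {
        // or_insert_with leaves an existing entry untouched, so an explicit
        // count is never replaced by the default.
        limits
            .entry(GPU_RESOURCE.to_string())
            .or_insert_with(|| DEFAULT_GPU_LIMIT.to_string());
    }
    limits
}
```

Using the map's entry API keeps the "explicit wins, default fills gaps" rule in a single expression, rather than requiring a separate lookup followed by an insert.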

Testing

  • mise run pre-commit passes
  • Unit tests added/updated
  • E2E tests added/updated (not applicable; this is covered by CLI request-shape tests and Kubernetes pod-template rendering tests)

Additional validation:

  • cargo test -p openshell-cli parse_sandbox_resources_json -- --nocapture
  • cargo test -p openshell-cli sandbox_create_resources_json -- --nocapture
  • cargo test -p openshell-cli sandbox_create_gpu_count -- --nocapture
  • cargo test -p openshell-driver-kubernetes gpu_sandbox -- --nocapture
  • cargo test -p openshell-server build_platform_config -- --nocapture

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable)

Signed-off-by: Ryan Angilly <rangilly@nvidia.com>
@ryana ryana requested review from a team, derekwaynecarr, maxamillion and mrunalp as code owners May 12, 2026 21:20
@copy-pr-bot

copy-pr-bot Bot commented May 12, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions

github-actions Bot commented May 12, 2026

All contributors have signed the DCO ✍️ ✅
Posted by the DCO Assistant Lite bot.


@ryana (Collaborator, Author) commented May 12, 2026

I have read the DCO document and I hereby sign the DCO.

@ryana (Collaborator, Author) commented May 12, 2026

recheck

@drew drew requested a review from elezar May 12, 2026 23:36
Signed-off-by: Ryan Angilly <rangilly@nvidia.com>
@ryana ryana changed the title feat(sandbox): add GPU count support feat(sandbox): add resource JSON passthrough May 13, 2026
Comment on lines 1556 to 1559
gpu: bool,
gpu_count: Option<u32>,
resources_json: Option<&str>,
gpu_device: Option<&str>,
Member

My thinking was that we would add a ResourcesSpec type that covers all of these, instead of continuously extending the parameter list. This spec would include specific fields as well as opaque config to be interpreted by the driver (a key-value map). I have some local changes to this effect, but haven't found the "correct" shape yet. Pulling in your requirements may help here.
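For illustration only, here is a rough Rust sketch of the ResourcesSpec idea: a few typed fields plus an opaque key-value map reserved for the driver. The field names, the to_kubernetes_limits method, and the mapping of gpu_count onto nvidia.com/gpu are hypothetical. As noted, the final shape is still open.

```rust
use std::collections::BTreeMap;

/// Hypothetical ResourcesSpec: typed fields every driver understands,
/// plus an opaque map that only the selected driver interprets.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct ResourcesSpec {
    pub cpu: Option<String>,
    pub memory: Option<String>,
    pub gpu_count: Option<u32>,
    pub gpu_device: Option<String>,
    /// Driver-specific settings, passed through without CLI interpretation.
    pub driver_config: BTreeMap<String, String>,
}

impl ResourcesSpec {
    /// Example translation by one driver into Kubernetes-style limits.
    /// Only the typed fields are mapped here; gpu_device and driver_config
    /// are left for driver-specific handling.
    pub fn to_kubernetes_limits(&self) -> BTreeMap<String, String> {
        let mut limits = BTreeMap::new();
        if let Some(cpu) = &self.cpu {
            limits.insert("cpu".to_string(), cpu.clone());
        }
        if let Some(memory) = &self.memory {
            limits.insert("memory".to_string(), memory.clone());
        }
        if let Some(n) = self.gpu_count {
            limits.insert("nvidia.com/gpu".to_string(), n.to_string());
        }
        limits
    }
}
```

Passing a single &ResourcesSpec could then replace the growing gpu_count / resources_json / gpu_device parameter list quoted above.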


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: add GPU count support for Kubernetes sandboxes

2 participants