From 1aafb69f66b74cfb60a10dde253a7ad6c2932c68 Mon Sep 17 00:00:00 2001 From: Evan Lezar Date: Wed, 13 May 2026 16:18:15 +0200 Subject: [PATCH] docs(rfc): add sandbox resource requirements proposal Signed-off-by: Evan Lezar --- .../README.md | 779 ++++++++++++++++++ 1 file changed, 779 insertions(+) create mode 100644 rfc/0004-sandbox-resource-requirements/README.md diff --git a/rfc/0004-sandbox-resource-requirements/README.md b/rfc/0004-sandbox-resource-requirements/README.md new file mode 100644 index 000000000..f95c2fa7f --- /dev/null +++ b/rfc/0004-sandbox-resource-requirements/README.md @@ -0,0 +1,779 @@ +--- +authors: + - "@elezar" +state: draft +links: + - https://github.com/NVIDIA/OpenShell/issues/1338 + - https://github.com/NVIDIA/OpenShell/pull/1340 +--- + +# RFC 0004 - Sandbox Resource Requirements + +## Summary + +This RFC proposes replacing GPU-specific sandbox request fields with typed +resource requirements on `SandboxSpec`. Resource requirements describe portable +workload needs that influence driver selection and provisioning: + +- **compute** requirements for CPU and memory. +- **device** requirements for GPUs and other accelerator-like resources. +- future typed domains such as datasets when their semantics are defined. + +The gateway uses resource requirements to prefilter configured compute drivers, +then relies on the selected driver to validate and provision the request. +`SandboxTemplate.resources` remains a platform-native realization layer and +escape hatch. It is not the portable driver-selection interface. + +## Motivation + +OpenShell currently treats GPU placement as a special case. The public +`SandboxSpec` and internal `DriverSandboxSpec` both expose `gpu` and +`gpu_device`, while driver capability discovery reports only `supports_gpu` and +`gpu_count`. That is too narrow: + +- GPU identifiers are driver-specific. Docker and Podman use CDI device names, + while the VM driver supports device IDs by PCI BDF or index. 
+- Count-based placement and exact device selection are different allocation + modes and should not be overloaded into one field. +- CPU and memory are common portable requirements, but today callers must use + backend-shaped template resource passthrough for the public API path. +- The gateway needs a portable way to decide which configured driver can serve + a sandbox request. +- Future resources, such as datasets, should not require another ad hoc field + on `SandboxSpec`. + +Issue #1338 identified a real user need: Kubernetes users need to request more +than one GPU. PR #1340 solves that immediate need by passing resource JSON into +`SandboxTemplate.resources` and making `--gpu-count` inject an +`nvidia.com/gpu` limit. This RFC intentionally supersedes that as the long-term +API direction. Kubernetes resource limits are a valid driver realization, but +portable GPU count belongs in typed resource requirements. JSON passthrough, if +exposed by the CLI, should be named and documented as driver-specific +configuration rather than portable resources. + +The proposal is inspired by Kubernetes Dynamic Resource Allocation structured +parameters: scheduler-visible selection is structured, while driver-specific +configuration remains separate and is interpreted by the resource driver. + +## Non-goals + +- Defining dataset allocation, mount, caching, or access-control semantics. + Datasets are only a motivating future domain in this RFC. +- Building a gateway-level scheduler or reservation system. +- Exposing detailed per-device inventory from drivers. +- Exposing JSON-formatted portable resource requests in the CLI. +- Publishing allocated resource identities in sandbox status. +- Preserving long-term compatibility for `gpu`, `gpu_device`, or a + GPU-specific `gpu_count` request field. + +## Proposal + +### Public request model + +Add resource requirements to `SandboxSpec` and remove the GPU-specific scalar +fields from the desired request model. 
+
+```proto
+message SandboxSpec {
+  string log_level = 1;
+  map<string, string> environment = 5;
+  SandboxTemplate template = 6;
+  openshell.sandbox.v1.SandboxPolicy policy = 7;
+  repeated string providers = 8;
+
+  // Portable resource requirements used by the gateway for driver selection
+  // and by drivers for provisioning.
+  SandboxResourceRequirements resource_requirements = 11;
+
+  reserved 9, 10;
+  reserved "gpu", "gpu_device";
+}
+```
+
+`SandboxTemplate.resources` keeps its existing role as platform-native workload
+configuration. It may contain Kubernetes-style CPU, memory, and extended
+resource requests and limits, but it is not the portable resource contract.
+
+The CLI should not expose a JSON flag for `resource_requirements`. Common
+portable requests should use typed flags such as CPU, memory, and GPU-count
+flags, and SDK/API callers should use the typed protobuf messages directly.
+JSON-formatted CLI input is reserved for driver-specific configuration.
+
+The long-term public API should expose one opaque driver configuration field on
+the template instead of adding more JSON-shaped resource inputs:
+
+```proto
+message SandboxTemplate {
+  string image = 1;
+  string runtime_class_name = 2;
+  string agent_socket = 3;
+  map<string, string> labels = 4;
+  map<string, string> annotations = 5;
+  map<string, string> environment = 6;
+
+  // Driver/platform-native configuration. Top-level keys are DNS-style
+  // namespaces such as kubernetes.openshell.ai, docker.openshell.ai, or
+  // vm.openshell.ai. The gateway validates shape and size but does not
+  // interpret the values for portable resource matching.
+  map<string, google.protobuf.Struct> driver_config = 11;
+}
+```
+
+Existing JSON passthrough such as PR #1340's `--resources-json` should be
+renamed to `--driver-config-json` before it becomes a stable CLI contract, or
+treated as transitional if it has already shipped. The flag writes the provided
+object into `SandboxTemplate.driver_config`; it does not populate
+`SandboxSpec.resource_requirements`.
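
The shape validation the gateway performs on driver config can be sketched in Go. This is an illustrative sketch, not OpenShell code: `parseDriverConfig` and the DNS-name regular expression are hypothetical, and the normative validation rule would live in gateway request validation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// dnsNamespace loosely matches DNS-style namespace keys such as
// "kubernetes.openshell.ai". The pattern is an illustration, not the
// normative rule.
var dnsNamespace = regexp.MustCompile(`^[a-z0-9]([a-z0-9-]*[a-z0-9])?(\.[a-z0-9]([a-z0-9-]*[a-z0-9])?)+$`)

// parseDriverConfig validates the shape of a --driver-config-json value:
// a JSON object whose top-level keys are DNS-style namespaces. Values are
// kept opaque, matching the "validate shape, do not interpret" contract.
func parseDriverConfig(raw string) (map[string]json.RawMessage, error) {
	var cfg map[string]json.RawMessage
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		return nil, fmt.Errorf("driver config must be a JSON object: %w", err)
	}
	for ns := range cfg {
		if !dnsNamespace.MatchString(ns) {
			return nil, fmt.Errorf("driver config key %q is not a DNS-style namespace", ns)
		}
	}
	return cfg, nil
}

func main() {
	cfg, err := parseDriverConfig(`{"kubernetes.openshell.ai":{"nodeSelector":{"accelerator":"nvidia"}}}`)
	fmt.Println(len(cfg), err) // one valid namespace, no error

	_, err = parseDriverConfig(`{"not a namespace":{}}`)
	fmt.Println(err != nil) // a non-DNS key is rejected
}
```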
+
+### Resource requirements
+
+Use typed requirement domains for stable first-party resource concepts instead
+of making every request stringly typed through a `kind` field.
+
+```proto
+message SandboxResourceRequirements {
+  // Fungible scalar workload requirements.
+  ComputeResourceRequirements compute = 1;
+
+  // Accelerator-like resources such as GPUs and MIG slices.
+  repeated DeviceResourceRequirement devices = 2;
+
+  // Future typed domain. Semantics are intentionally not defined in this RFC.
+  repeated DatasetResourceRequirement datasets = 3;
+
+  // Escape hatch for third-party or experimental resource domains.
+  repeated GenericResourceRequirement extensions = 100;
+}
+
+message ComputeResourceRequirements {
+  // Values use Kubernetes-style quantity strings because they are familiar and
+  // already used by the driver resource model.
+  string cpu_request = 1;
+  string cpu_limit = 2;
+  string memory_request = 3;
+  string memory_limit = 4;
+}
+
+message DeviceResourceRequirement {
+  // Optional local name for error messages and future status correlation.
+  string name = 1;
+
+  // Portable device class requested by the workload, such as "gpu",
+  // "nvidia-gpu", or a future OpenShell-defined class name.
+  string class_name = 2;
+
+  // Number of devices in the class requested. Must be greater than zero.
+  uint32 count = 3;
+
+  // Portable labels or attributes the selected device must satisfy.
+  ResourceSelector selector = 4;
+
+  // Namespaced parameter blocks. The gateway may use namespace support for
+  // prefiltering, but only drivers interpret the parameter values.
+  repeated ResourceParameterBlock parameters = 5;
+}
+
+message ResourceSelector {
+  // Exact-match portable attributes such as vendor=nvidia.
+  map<string, string> match_attributes = 1;
+}
+
+message ResourceParameterBlock {
+  // DNS-style parameter namespace, such as cdi.openshell.ai.
+ string namespace = 1; + google.protobuf.Struct parameters = 2; +} + +message DatasetResourceRequirement { + string name = 1; + string class_name = 2; + ResourceSelector selector = 3; + repeated ResourceParameterBlock parameters = 4; +} + +message GenericResourceRequirement { + string kind = 1; + string name = 2; + uint32 count = 3; + ResourceSelector selector = 4; + repeated ResourceParameterBlock parameters = 5; +} +``` + +The gateway validates the portable envelope: + +- compute quantities must be syntactically valid quantity strings. +- device `class_name` must be non-empty. +- device `count` must be greater than zero. +- parameter namespace keys must be DNS-style names. +- parameter values must fit existing request-size limits. + +The gateway does not interpret parameter values. A driver must reject a request +that contains a parameter namespace it does not support, and the gateway may +prefilter candidates using the same namespace support. + +### Compute requirements + +Compute requirements are fungible CPU and memory requirements. They differ from +devices because they usually do not need exact identity or driver-specific +selection. + +Example request: + +```yaml +resourceRequirements: + compute: + cpuRequest: "2" + cpuLimit: "4" + memoryRequest: 4Gi + memoryLimit: 8Gi +``` + +Example realizations: + +| Driver | Realization | +|---|---| +| Kubernetes | Populate pod container `resources.requests.cpu`, `resources.limits.cpu`, `resources.requests.memory`, and `resources.limits.memory`. | +| Docker | Apply supported runtime limits such as CPU quota/NanoCPUs and memory limit. Requests are capacity checks when the driver can evaluate host capacity. | +| Podman | Apply supported runtime limits such as CPU quota and memory limit. Requests are capacity checks when the driver can evaluate host capacity. | +| VM | Map CPU and memory limits to VM vCPU count and guest memory allocation. 
The driver may require request and limit to be equal when it cannot represent separate request/limit semantics. | + +Drivers must reject compute requirements they cannot honor. They must not +silently accept a limit or request that has no effect. + +### Device requirements + +Device requirements cover GPUs and other accelerator-like resources. The first +standard device class is `gpu`. + +Portable GPU semantics are limited to: + +- `class_name` +- `count` +- exact-match attributes in `selector.match_attributes` + +Driver-native GPU details are expressed through namespaced parameters. Example +parameter namespaces: + +| Namespace | Intended drivers | Example fields | +|---|---|---| +| `cdi.openshell.ai` | Docker, Podman | `deviceId: "nvidia.com/gpu=all"` | +| `kubernetes.openshell.ai` | Kubernetes | `resourceName: "nvidia.com/gpu"`, `resourceClassName: "nvidia-gpu"` | +| `vm.openshell.ai` | VM | `deviceId: "0000:2d:00.0"`, `deviceIdType: "bdf"` | + +Example request for any NVIDIA GPU: + +```yaml +resourceRequirements: + devices: + - name: gpu + className: gpu + count: 1 + selector: + matchAttributes: + vendor: nvidia +``` + +Example request for four GPUs. 
A Kubernetes driver may realize this as +`limits["nvidia.com/gpu"] = "4"`, but the public request stays portable: + +```yaml +resourceRequirements: + devices: + - name: training-gpus + className: gpu + count: 4 +``` + +Example request for a CDI GPU supported by Docker or Podman: + +```yaml +resourceRequirements: + devices: + - name: gpu + className: gpu + count: 1 + parameters: + - namespace: cdi.openshell.ai + parameters: + deviceId: nvidia.com/gpu=all +``` + +Example request for a VM GPU by BDF: + +```yaml +resourceRequirements: + devices: + - name: gpu + className: gpu + count: 1 + parameters: + - namespace: vm.openshell.ai + parameters: + deviceId: "0000:2d:00.0" + deviceIdType: bdf +``` + +Example realizations: + +| Driver | Realization | +|---|---| +| Kubernetes | Convert `className=gpu,count=N` into a pod resource limit such as `limits["nvidia.com/gpu"] = "N"` unless Kubernetes-specific parameters select another resource name or class. | +| Docker | Convert CDI parameters into Docker CDI device injection. For a count-only request, select an available CDI GPU device when device inventory is available. | +| Podman | Convert CDI parameters into Podman CDI device injection. For a count-only request, select an available CDI GPU device when device inventory is available. | +| VM | Convert VM parameters into BDF or index-based device assignment. | + +Docker and Podman should not interpret VM BDF/index parameters. The VM driver +should not interpret CDI parameters. Gateway namespace prefiltering should avoid +sending clearly incompatible requests to those drivers. 
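
That namespace-based prefiltering can be sketched as a small Go predicate. This is an illustrative sketch, not the gateway implementation: `DeviceClassCapability` mirrors the capability summary proposed in this RFC, and `supportsRequest` is a hypothetical helper.

```go
package main

import "fmt"

// DeviceClassCapability is a pared-down stand-in for the per-class
// capability summary a driver advertises.
type DeviceClassCapability struct {
	ClassName           string
	ParameterNamespaces []string
}

// supportsRequest reports whether a driver advertising the given device
// classes could serve a request for className with the given parameter
// namespaces. Any namespace the driver does not advertise disqualifies it
// during gateway prefiltering.
func supportsRequest(caps []DeviceClassCapability, className string, requestNamespaces []string) bool {
	for _, c := range caps {
		if c.ClassName != className {
			continue
		}
		advertised := make(map[string]bool, len(c.ParameterNamespaces))
		for _, ns := range c.ParameterNamespaces {
			advertised[ns] = true
		}
		for _, ns := range requestNamespaces {
			if !advertised[ns] {
				return false
			}
		}
		return true
	}
	return false // the driver does not support the device class at all
}

func main() {
	docker := []DeviceClassCapability{{ClassName: "gpu", ParameterNamespaces: []string{"cdi.openshell.ai"}}}
	vm := []DeviceClassCapability{{ClassName: "gpu", ParameterNamespaces: []string{"vm.openshell.ai"}}}

	// A BDF-style request prefilters to the VM-like driver only.
	fmt.Println(supportsRequest(docker, "gpu", []string{"vm.openshell.ai"})) // false
	fmt.Println(supportsRequest(vm, "gpu", []string{"vm.openshell.ai"}))     // true
	// A count-only request with no parameter blocks passes both.
	fmt.Println(supportsRequest(docker, "gpu", nil)) // true
}
```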
+ +### Combined examples + +CPU, memory, and one GPU: + +```yaml +resourceRequirements: + compute: + cpuRequest: "4" + cpuLimit: "8" + memoryRequest: 16Gi + memoryLimit: 32Gi + devices: + - name: gpu + className: gpu + count: 1 + selector: + matchAttributes: + vendor: nvidia +``` + +Kubernetes realization: + +```yaml +resources: + requests: + cpu: "4" + memory: 16Gi + limits: + cpu: "8" + memory: 32Gi + nvidia.com/gpu: "1" +``` + +Docker or Podman realization: + +```text +runtime CPU/memory limits derived from compute limits +CDI device injection derived from the selected gpu device requirement +``` + +VM realization: + +```text +VM vCPU count and memory allocation derived from compute limits +GPU passthrough derived from vm.openshell.ai parameters when present +``` + +### Specific realizations + +These examples show how the same portable request is compiled after a driver is +selected. The exact serialized platform payload remains driver-owned; these are +the intended effects. + +#### Kubernetes CPU and memory + +Input: + +```yaml +resourceRequirements: + compute: + cpuRequest: "2" + cpuLimit: "4" + memoryRequest: 4Gi + memoryLimit: 8Gi +``` + +Kubernetes pod container resources: + +```yaml +resources: + requests: + cpu: "2" + memory: 4Gi + limits: + cpu: "4" + memory: 8Gi +``` + +#### Kubernetes multi-GPU + +Input: + +```yaml +resourceRequirements: + devices: + - name: training-gpus + className: gpu + count: 4 +``` + +Kubernetes pod container resources: + +```yaml +resources: + limits: + nvidia.com/gpu: "4" +``` + +If `kubernetes.openshell.ai.resourceName` is provided, the driver uses that +resource name instead of `nvidia.com/gpu`. 
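
The count-to-limit compilation can be sketched as follows. `gpuLimit` is a hypothetical helper for illustration; the `resourceName` override corresponds to the `kubernetes.openshell.ai` parameter described above.

```go
package main

import (
	"fmt"
	"strconv"
)

// gpuLimit compiles a portable device requirement (className=gpu, count=N)
// into a Kubernetes extended-resource limit entry. resourceName is taken
// from the kubernetes.openshell.ai parameter block when present; the
// nvidia.com/gpu default follows the RFC's example realization.
func gpuLimit(count uint32, params map[string]string) (string, string) {
	resource := "nvidia.com/gpu"
	if name, ok := params["resourceName"]; ok && name != "" {
		resource = name
	}
	return resource, strconv.FormatUint(uint64(count), 10)
}

func main() {
	// Count-only request.
	name, qty := gpuLimit(4, nil)
	fmt.Printf("limits[%q] = %q\n", name, qty)

	// Kubernetes-specific parameters select another resource name.
	name, qty = gpuLimit(2, map[string]string{"resourceName": "amd.com/gpu"})
	fmt.Printf("limits[%q] = %q\n", name, qty)
}
```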
+ +#### Docker or Podman CDI GPU + +Input: + +```yaml +resourceRequirements: + devices: + - name: gpu + className: gpu + count: 1 + parameters: + - namespace: cdi.openshell.ai + parameters: + deviceId: nvidia.com/gpu=0 +``` + +Docker or Podman runtime request: + +```text +--device nvidia.com/gpu=0 +``` + +The gateway can prefilter this request to drivers that advertise the +`cdi.openshell.ai` parameter namespace for the `gpu` device class. + +#### VM GPU by BDF + +Input: + +```yaml +resourceRequirements: + devices: + - name: gpu + className: gpu + count: 1 + parameters: + - namespace: vm.openshell.ai + parameters: + deviceId: "0000:2d:00.0" + deviceIdType: bdf +``` + +VM driver realization: + +```text +attach host PCI device 0000:2d:00.0 to the sandbox VM +``` + +The gateway can prefilter this request to VM-like drivers that advertise the +`vm.openshell.ai` parameter namespace for the `gpu` device class. + +#### Conflicting portable and template resources + +Input: + +```yaml +resourceRequirements: + devices: + - name: gpu + className: gpu + count: 4 +template: + resources: + limits: + nvidia.com/gpu: "1" +``` + +Result: + +```text +validation failure: portable GPU count conflicts with template GPU limit +``` + +The request must fail rather than letting either source silently override the +other. + +### Driver-specific JSON configuration + +The only JSON-formatted CLI escape hatch proposed by this RFC is +`--driver-config-json`. It is for driver/platform-specific configuration, not +portable resource requests. + +Example: + +```shell +openshell sandbox create \ + --driver-config-json '{"kubernetes.openshell.ai":{"nodeSelector":{"accelerator":"nvidia"}}}' +``` + +Request shape: + +```yaml +template: + driverConfig: + kubernetes.openshell.ai: + nodeSelector: + accelerator: nvidia +``` + +The gateway may use the presence of a driver-config namespace as a compatibility +prefilter, but it must not inspect driver-config values as portable resource +requirements. 
CPU, memory, GPU count, and exact GPU selection should use typed +resource fields or typed CLI flags. + +This RFC does not introduce `--resources-json`, `--resource-requirements-json`, +or `--template-resources-json`. If a driver needs backend-native resource +settings that are not modeled by `resource_requirements`, they belong under +that driver's `--driver-config-json` namespace and remain driver-specific. + +### Template realization and conflicts + +Drivers compile resource requirements into their native realization model: +template resources, runtime device injection, VM device assignment, or platform +config. + +`SandboxTemplate.resources` remains available for platform-native workload +settings. Those settings are applied after driver selection and must not be +used as the portable matching signal. + +If resource requirements and template resources express incompatible demands +for the same resource, validation must fail loudly. For example, a sandbox that +requests `className=gpu,count=4` while also setting +`template.resources.limits["nvidia.com/gpu"] = "1"` is invalid. Drivers must +not silently override portable resource intent with template passthrough values, +or template passthrough values with portable resource intent. + +Requests with only `SandboxTemplate.resources` are valid platform-native +passthrough, but they do not participate in portable driver matching. + +In the breaking API proposed by this RFC, new driver-native JSON should use +`SandboxTemplate.driver_config`. Existing `SandboxTemplate.resources` behavior +can be preserved during migration, but should not gain a stable CLI flag named +`--resources-json` because that name conflicts with portable resource +requirements. + +### Driver request model + +The internal compute-driver API mirrors the public resource request shape +without importing the public API types. `DriverSandboxSpec` receives translated +driver-owned resource requirements and drops `gpu` and `gpu_device`. 
+
+```proto
+message DriverSandboxSpec {
+  string log_level = 1;
+  map<string, string> environment = 5;
+  DriverSandboxTemplate template = 6;
+  DriverSandboxResourceRequirements resource_requirements = 11;
+
+  reserved 9, 10;
+  reserved "gpu", "gpu_device";
+}
+```
+
+Driver-owned resource requirement messages should have the same semantics as
+the public messages, but live in `compute_driver.proto` to keep the public and
+internal contracts separated.
+
+### Driver capabilities
+
+Replace GPU-specific capability fields with coarse resource capability
+summaries:
+
+```proto
+message GetCapabilitiesResponse {
+  string driver_name = 1;
+  string driver_version = 2;
+  string default_image = 3;
+  DriverResourceCapabilities resource_capabilities = 6;
+
+  reserved 4, 5;
+  reserved "supports_gpu", "gpu_count";
+}
+
+message DriverResourceCapabilities {
+  ComputeResourceCapability compute = 1;
+  repeated DeviceClassCapability device_classes = 2;
+  repeated GenericResourceCapability extensions = 100;
+}
+
+message ComputeResourceCapability {
+  bool supports_cpu_request = 1;
+  bool supports_cpu_limit = 2;
+  bool supports_memory_request = 3;
+  bool supports_memory_limit = 4;
+}
+
+message DeviceClassCapability {
+  string class_name = 1;
+
+  // Omitted when the driver cannot cheaply or accurately report availability.
+  optional uint32 allocatable_count = 2;
+
+  // Portable attributes this driver may use for prefiltering. This is a
+  // summary, not a per-device inventory.
+  map<string, string> attributes = 3;
+
+  // Parameter namespaces the driver understands for this device class.
+  repeated string parameter_namespaces = 4;
+}
+```
+
+Capabilities are advisory. They allow the gateway to reject clearly impossible
+requests and choose a likely driver, but they are not a reservation.
+
+### Gateway matching
+
+The gateway should evaluate configured compute drivers in a deterministic
+order. The default order is the order in gateway configuration.
+
+For a sandbox create request:
+
+1. 
Load or refresh driver capabilities. +2. Keep candidates that support the requested compute fields. +3. Keep candidates that support every requested device class. +4. Reject candidates whose known `allocatable_count` is lower than the + requested device count. +5. Reject candidates that do not advertise every parameter namespace present in + the request for that device class. +6. Apply portable selector prefiltering only when the driver advertises matching + attributes. Absence of an advertised attribute should not be treated as a + match. +7. Call `ValidateSandboxCreate` on remaining candidates in deterministic order. +8. Select the first driver that validates the request. +9. Return a user-facing error containing summarized validation failures if no + driver can serve the request. + +The selected driver's `CreateSandbox` call remains the final authority. A +request that passes gateway prefiltering can still fail if resources disappear +or if driver-specific validation rejects parameter values. + +When no resource requirements are present, the gateway should preserve today's +default behavior and use the configured default driver. + +## Implementation plan + +1. Update public protobufs to add `SandboxResourceRequirements` and remove the + long-term use of `gpu` and `gpu_device`. +2. Update compute-driver protobufs with mirrored driver-owned resource + requirements and coarse resource capability summaries. +3. Update gateway validation and public-to-driver translation. +4. Add validation that rejects conflicts between portable resource requirements + and template resource passthrough. +5. Allow the gateway to consider multiple configured compute drivers for a + create request, using capability prefiltering plus `ValidateSandboxCreate`. +6. Update Kubernetes, Docker, Podman, and VM drivers to advertise compute and + GPU device capability summaries and interpret their supported parameter + namespaces. +7. 
Update CLI/API request construction so CPU, memory, GPU count, and exact GPU + selection use resource requirements instead of GPU-specific request fields. +8. Rename JSON passthrough to `--driver-config-json` and map it to + `SandboxTemplate.driver_config`; do not expose JSON-formatted portable + resource request flags. +9. Update user-facing docs and driver README files once behavior is + implemented. + +Because this is a breaking request-spec change, the implementation must either +land in a breaking API version or be explicitly called out as a breaking change +for the current API. Removed protobuf tags should be reserved rather than +reused. + +## Tests + +The implementation should include: + +- protobuf translation tests for public resource requirements into driver + resource requirements. +- gateway matching tests for compute capability support, device class, count, + selector, and parameter namespace filtering. +- gateway tests showing that the selected driver is the first validating + candidate in configured order. +- validation tests for conflicts between resource requirements and template + resource passthrough. +- validation tests that unsupported parameter namespaces are rejected. +- Kubernetes tests that map compute requirements to pod CPU/memory resources + and GPU count to `nvidia.com/gpu` limits. +- Docker and Podman GPU e2e tests that request a CDI GPU with + `cdi.openshell.ai`. +- VM tests that map CPU/memory to VM allocation and request a GPU by BDF or + index with `vm.openshell.ai`. +- tests showing that template-only resources are treated as platform-native + passthrough and are not used for portable driver matching. +- CLI request-shape tests showing that `--driver-config-json` populates + `SandboxTemplate.driver_config` and that there is no JSON-formatted portable + resource request flag. +- error-message tests for no matching driver and validation failure across all + candidates. 
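
As one concrete illustration of the conflict-validation tests listed above, the check can be sketched in Go. `validateGPUConflict` is a hypothetical name, not the real OpenShell validation API, and it only covers the `nvidia.com/gpu` template-limit case from the RFC's example.

```go
package main

import "fmt"

// validateGPUConflict rejects requests in which a portable GPU count and a
// template nvidia.com/gpu limit express incompatible demands. When only one
// source expresses a GPU demand there is nothing to reconcile.
func validateGPUConflict(portableCount uint32, templateLimits map[string]string) error {
	limit, ok := templateLimits["nvidia.com/gpu"]
	if !ok || portableCount == 0 {
		return nil
	}
	if limit != fmt.Sprint(portableCount) {
		return fmt.Errorf("portable GPU count %d conflicts with template GPU limit %s", portableCount, limit)
	}
	return nil
}

func main() {
	// The RFC's invalid example: count=4 portable, limit=1 in the template.
	fmt.Println(validateGPUConflict(4, map[string]string{"nvidia.com/gpu": "1"}))
	// Template-free portable request passes.
	fmt.Println(validateGPUConflict(2, nil))
}
```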
+ +## Risks + +- The typed model may still need adjustment when dataset semantics are fully + designed. +- Coarse capabilities can be stale, so users may still see create-time failures + after gateway prefiltering succeeds. +- A breaking API change affects CLI users, SDK users, and any direct gRPC + clients. +- Namespaced parameters can fragment if drivers define overlapping ways to + express the same concept. +- Supporting multiple configured compute drivers changes gateway assumptions + that currently require exactly one driver. +- Existing template resource passthrough creates a second way to express some + platform-native requirements, so conflict validation and documentation need + to be clear. + +## Alternatives + +- Use `SandboxTemplate.resources` as the only resource request interface. This + works for Kubernetes-style CPU, memory, and extended resources, but it makes + portable driver selection depend on backend-shaped data. +- Expose `--resources-json` as a CLI shortcut for `resource_requirements`. This + would avoid adding one flag per typed resource, but it weakens the CLI + contract and makes the portable resource model feel like another opaque + passthrough surface. +- Expose `--resources-json` as a CLI shortcut for `SandboxTemplate.resources`. + This matches PR #1340's immediate implementation direction, but the name + implies portable resource semantics. `--driver-config-json` is more explicit + about the backend-native nature of the data. +- Use a repeated `kind`-based requirement for all resources. This keeps gateway + matching generic, but makes common resources such as CPU, memory, and GPU more + stringly typed than necessary. +- Keep `gpu`, `gpu_device`, and add `gpu_count`. This is simple for GPUs but + does not help CPU, memory, datasets, or other future resource kinds. +- Make all resource metadata opaque to the gateway. This gives drivers maximum + flexibility but prevents meaningful gateway prefiltering. 
+- Expose detailed per-device inventory from drivers. This would improve + matching precision but pushes the gateway toward scheduler and reservation + responsibilities that this RFC intentionally avoids. +- Preserve GPU-specific fields and flags as compatibility shims. This reduces + migration friction but keeps two request paths for the same concept. + +## Prior art + +- Kubernetes Dynamic Resource Allocation separates scheduler-visible selection + from driver-owned resource parameters and allocation behavior. +- Kubernetes extended resources provide a count-based model for devices such as + GPUs, but do not handle driver-specific parameterization by themselves. +- Container Device Interface gives container runtimes a common way to name and + inject devices, but CDI names are still a container-runtime concern rather + than a portable OpenShell resource identifier. + +## Open questions + +- Should OpenShell define a registry of standard device classes and portable + selector attributes, or should that evolve informally as drivers add support? +- Should allocated resource identities be exposed in sandbox status in a later + RFC? +- Should parameter namespaces have published schemas, or should drivers own + validation and documentation independently? +- Should gateway capability summaries be refreshed on every create request, on + a timer, or only when a driver reports a watch/event signal?