diff --git a/specs/api/ambient-model.spec.md b/specs/api/ambient-model.spec.md index e6c9d86c1..26ff7a428 100644 --- a/specs/api/ambient-model.spec.md +++ b/specs/api/ambient-model.spec.md @@ -2,8 +2,9 @@ **Date:** 2026-03-20 **Status:** Active -**Last Updated:** 2026-06-03 — added Application (GitOps continuous sync for agent fleets); addressed review feedback: credential_id FK for remote auth, RoleBinding escalation rules, prune safety, health status semantics, gitops role grantability, sync engine kind filtering -**Previous:** 2026-05-12 — migrate Credentials from project-scoped to global routes (`/credentials`); remove `project_id` from model, OpenAPI, and SDK; add drop-column migration; update coverage matrix +**Last Updated:** 2026-06-10 — added Project `session_admission` for declarative session scheduling/admission intent; Applications sync the Project intent but not scheduler infrastructure +**Previous:** 2026-06-03 — added Application (GitOps continuous sync for agent fleets); addressed review feedback: credential_id FK for remote auth, RoleBinding escalation rules, prune safety, health status semantics, gitops role grantability, sync engine kind filtering +**Earlier:** 2026-05-12 — migrate Credentials from project-scoped to global routes (`/credentials`); remove `project_id` from model, OpenAPI, and SDK; add drop-column migration; update coverage matrix **Workflow:** `../../workflows/sessions/ambient-model.workflow.md` — implementation waves, gap table, build commands, run log **Design:** `credentials-session.md` — full Credential Kind design spec and rationale @@ -49,6 +50,7 @@ erDiagram string name string description string prompt "workspace-level context injected into every agent start" + jsonb session_admission "nullable — project-level session admission intent" jsonb labels jsonb annotations string status @@ -132,6 +134,8 @@ erDiagram string labels "JSON map; queryable tags" string annotations "JSON map; freeform metadata" string phase + string 
admission_profile "nullable — resolved session admission profile" + string admission_queue "nullable — resolved local admission queue" time start_time time completion_time string kube_cr_name "Kubernetes CR / pod name (set to session ID on create)" @@ -139,7 +143,7 @@ erDiagram string kube_namespace string sdk_session_id int32 sdk_restart_count - string conditions + jsonb conditions "condition array with type, status, reason, message, last_transition_time" string reconciled_repos string reconciled_workflow time created_at @@ -231,6 +235,8 @@ erDiagram Application { string ID PK "KSUID" string name "unique; human-readable" + string created_by_user_id FK "creator captured for local sync authorization" + string sync_actor_user_id FK "nullable — override actor for local sync authorization" string source_repo_url "git repository URL" string source_target_revision "branch, tag, or commit SHA" string source_path "path within repo to kustomize overlay" @@ -314,7 +320,7 @@ An Application syncs **project-scoped fleet definitions** — a subset of resour | Kind | Sync Behavior | |---|---| -| `Project` | Created if `CreateProject=true` in `sync_options`; patched (description, prompt, labels, annotations) on subsequent syncs | +| `Project` | Created if `CreateProject=true` in `sync_options`; patched (description, prompt, session_admission, labels, annotations) on subsequent syncs | | `Agent` | Created or patched within the destination project; prompt, labels, annotations updated | | `Credential` | Created if not present; idempotent by name | | `RoleBinding` | Created if not present; idempotent by user+role+scope key. **Escalation-bound:** the sync engine can only create RoleBindings at or below the level of the service credential it uses (see Design Decisions). | @@ -329,16 +335,19 @@ An Application syncs **project-scoped fleet definitions** — a subset of resour | `ScheduledSession` | Project-scoped trigger config; future sync candidate. | | `User` | Identity record. 
| | `Role` | RBAC definition (platform-scoped, not project-scoped). | +| Kueue `ClusterQueue`, `LocalQueue`, `ResourceFlavor`, `Workload` | Scheduler infrastructure. Project manifests declare Ambient `session_admission` intent instead. | ### Field Reference | Field | Notes | |---|---| | `name` | Unique, human-readable. The stable address of this sync binding. | +| `created_by_user_id` | User who created the Application. For local Applications, this user is the default effective sync actor for authorization-sensitive writes. | +| `sync_actor_user_id` | Nullable FK to User. When set on a local Application, this user is the effective sync actor. The value can only be set to a subject the caller is allowed to delegate. | | `source_repo_url` | Git repository URL. HTTPS or SSH. | | `source_target_revision` | Branch name, tag, or commit SHA. Default: `main`. | | `source_path` | Relative path within the repo to a kustomize directory (must contain `kustomization.yaml`). | -| `credential_id` | Nullable FK → Credential. The stored credential providing authentication for the destination Ambient's REST API. Required when `destination_ambient_url` is set. Uses the same write-only encrypted storage as all Credentials. The credential's token is resolved at sync time via `GET /credentials/{cred_id}/token` (gated by `credential:token-reader`). Null when targeting the local Ambient (controller uses its own service identity). | +| `credential_id` | Nullable FK → Credential. The stored credential providing authentication for the destination Ambient's REST API. Required when `destination_ambient_url` is set. Uses the same write-only encrypted storage as all Credentials. The credential's token is resolved at sync time via `GET /credentials/{cred_id}/token` (gated by `credential:token-reader`). Null when targeting the local Ambient, where the sync controller executes locally but must still enforce authorization against the Application's effective sync actor. 
| | `destination_ambient_url` | Nullable. The Ambient API server URL to sync to. Null = local Ambient (this API server). When set, `credential_id` must also be set — async polling controllers have no request context to forward a token from. | | `destination_project` | Target project name. The project is created on first sync if `CreateProject=true` is in `sync_options`. | | `auto_sync` | If true, the controller polls the git repo and syncs automatically when changes are detected. If false, sync is manual via `POST /sync`. | @@ -372,7 +381,7 @@ For automated sync (`auto_sync=true`), this lifecycle runs on a configurable pol ``` Application.destination_ambient_url set? |── null ──> local Ambient (this API server's own service layer) - | ──> controller uses its own service identity + | ──> controller executes locally; authorization uses the effective sync actor |── set ──> remote Ambient (SDK client pointed at the URL) ──> credential_id MUST be set (FK → Credential) ──> token resolved at sync time via GET /credentials/{id}/token @@ -380,6 +389,8 @@ Application.destination_ambient_url set? When targeting a remote Ambient, the sync engine acts as an API client to the remote Ambient's REST API, authenticated via the stored Credential. The credential is resolved at sync time — the controller never caches tokens beyond a single sync cycle. This is different from how Sessions use kubeconfig for direct K8s provisioning — the Application works entirely at the Ambient API layer. +For local Applications, the controller's internal service identity is only an execution mechanism. Authorization-sensitive writes, including Project `session_admission`, are evaluated against the Application's effective sync actor and must not gain extra authority from the controller service identity. 
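The local and remote authorization rule above can be sketched as a small resolution function. This is a minimal illustration, not the controller's implementation: the `Application` dataclass and the `effective_sync_actor` helper are hypothetical names, and only the four fields taken from the Field Reference are modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Application:
    # Field names follow the Field Reference; the class itself is illustrative.
    created_by_user_id: str
    sync_actor_user_id: Optional[str] = None
    destination_ambient_url: Optional[str] = None
    credential_id: Optional[str] = None


def effective_sync_actor(app: Application) -> Optional[str]:
    """Return the user ID that local sync writes are authorized against.

    Returns None for remote Applications, where the destination Ambient API
    authorizes the stored Credential instead.
    """
    if app.destination_ambient_url is not None:
        # Remote target: a stored Credential is mandatory.
        if app.credential_id is None:
            raise ValueError(
                "credential_id is required when destination_ambient_url is set"
            )
        return None
    # Local target: the override wins, otherwise the creator is the actor.
    # The controller's own service identity is never returned here.
    return app.sync_actor_user_id or app.created_by_user_id
```

The function never falls back to a service identity, which mirrors the rule that the controller's identity is only an execution mechanism.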
+ ### Unsupported Kinds in Sync The kustomize rendering engine (`acpctl apply -k`) supports additional resource kinds beyond what Application syncs (e.g., `Cluster`, `Ambient` — infrastructure inventory kinds). When a rendered kustomize tree contains documents of unsupported kinds, the sync engine **silently skips** them. Each skipped document is recorded in `resource_status` with a `Skipped` status: @@ -430,6 +441,26 @@ Promotion is a git operation: merge the dev overlay changes into the release bra --- +## Project — Workspace And Session Admission Intent + +Project is the workspace boundary for Agents, Inbox messages, Sessions, and default runner admission policy. The stable address is `project_name`. + +| Field | Notes | +|-------|-------| +| `name` | Human-readable DNS-1123 project name. Also used as the Project ID. | +| `description` | Nullable. Free-text project description. | +| `prompt` | Workspace-level context injected into every Agent start in the Project. | +| `session_admission` | Nullable JSON object declaring Ambient session admission intent. Initial field: `profile`, a platform-defined profile name such as `standard`. Stored `null`, `{}`, or `{"profile": null}` inherits the platform default. | +| `labels` | JSON map for queryable project tags. | +| `annotations` | JSON map for freeform project metadata. | +| `status` | Nullable platform status string. | + +`session_admission` is an Ambient API abstraction. It does not embed Kueue `ClusterQueue`, `LocalQueue`, `ResourceFlavor`, `Workload`, or other scheduler resource definitions. The control plane resolves the Project's admission profile into platform-owned scheduler infrastructure. See [Kueue Session Admission](../control-plane/kueue-session-admission.spec.md). + +Declarative apply treats omitted `session_admission` as unmanaged. Applying `session_admission: null` or `session_admission: {}` clears the stored policy back to default inheritance. 
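The three declarative-apply cases (field omitted, field cleared, field set) can be sketched as follows. This is an illustrative sketch under assumptions: `UNMANAGED`, `desired_session_admission`, and `plan_project_patch` are hypothetical names, manifests are modeled as already-parsed dictionaries, and the returned dictionary stands in for a `PATCH` body.

```python
from typing import Any, Optional

# Hypothetical sentinel: the manifest did not mention the field at all.
UNMANAGED = object()


def desired_session_admission(manifest: dict) -> Any:
    """Map a parsed Project manifest to the desired stored value."""
    if "session_admission" not in manifest:
        return UNMANAGED  # omitted: leave the live field untouched
    value = manifest["session_admission"]
    if value is None or value == {}:
        return None  # explicit null or {}: clear back to default inheritance
    if not isinstance(value, dict):
        raise TypeError("session_admission must be an object or null")
    return value


def plan_project_patch(manifest: dict, live: Optional[dict]) -> dict:
    """Return the session_admission part of a PATCH body, or {} if no change."""
    desired = desired_session_admission(manifest)
    if desired is UNMANAGED:
        return {}
    # Treat a stored {} the same as a stored null when comparing for drift.
    live_normalized = None if live in (None, {}) else live
    if desired == live_normalized:
        return {}
    return {"session_admission": desired}
```

Keeping "omitted" distinct from "null" is the design point: a plain `dict.get` would collapse the two and make existing manifests clear a policy they never managed.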
+ +--- + ## Agent — Project-Scoped Mutable Definition Agent is scoped to a Project. The stable address is `{project_name}/{agent_name}`. @@ -450,7 +481,7 @@ Agent is scoped to a Project. The stable address is `{project_name}/{agent_name} | `bot_account_name` | Nullable. Service account name for git operations inside sessions. Copied to `Session.bot_account_name` on ignite. | | `resource_overrides` | Nullable. JSON-encoded pod resource requests/limits override for sessions spawned by this agent. Copied to `Session.resource_overrides` on ignite. | | `environment_variables` | Nullable. JSON-encoded extra environment variables injected into session pods. Copied to `Session.environment_variables` on ignite. | -| `current_session_id` | Denormalized FK to the active Session. Null when no session is running. Used by Project Home for fast reads. | +| `current_session_id` | Denormalized FK to the active Session. Null when the Agent has no non-terminal Session. Used by Project Home for fast reads. | **Agent is mutable.** PATCH updates in place. There is no versioning. If you need to track prompt history, use `labels`/`annotations` or an external audit log. @@ -476,7 +507,7 @@ Inbox messages are addressed to an Agent (`agent_id`). They are distinct from Se | Scope | Agent (persists across sessions) | Session (ephemeral) | | Created by | Human or another Agent | LLM turn / runner gRPC push | | Drained | At session start | Never — append-only stream | -| Purpose | Queued intent waiting for next run | Real LLM event stream | +| Purpose | Persistent intent waiting for next run | Real LLM event stream | At session start, all unread Inbox messages are drained: marked `read=true` and injected as context into the Session prompt before the first SessionMessage turn. @@ -488,6 +519,10 @@ Sessions are **not directly creatable**. They are run artifacts created exclusiv `Session.prompt` scopes the task for this specific run — separate from `Agent.prompt` which defines who the agent is. 
+Session phases are title-case values: `Pending`, `Creating`, `Queued`, `Running`, `Stopping`, `Stopped`, `Completed`, and `Failed`. Active phases are `Pending`, `Creating`, `Queued`, and `Running`; active Sessions block duplicate Agent starts and remain referenced by `Agent.current_session_id`. + +`Queued` means the Session has already been created by Agent start and is waiting for runner admission, scheduling, readiness, or reachability. Inbox messages are drained at Agent start time. Messages created while a Session is `Queued` remain unread for a future Agent start. + ``` Project.prompt → "This workspace builds the Ambient platform API server in Go." Agent.prompt → "You are a backend engineer specializing in Go APIs..." @@ -553,8 +588,8 @@ The `acpctl` CLI mirrors the API 1-for-1. Every REST operation has a correspondi |---|---|---| | `GET /projects` | `acpctl get projects` | ✅ implemented | | `GET /projects/{id}` | `acpctl get project ` | ✅ implemented | -| `POST /projects` | `acpctl create project --name [--description ]` | ✅ implemented | -| `PATCH /projects/{id}` | `acpctl project update [--name ] [--description ] [--prompt
]` | ✅ implemented | +| `POST /projects` | `acpctl create project --name [--description ] [--session-admission-profile ]` | ✅ existing command; admission flag 🔲 planned | +| `PATCH /projects/{id}` | `acpctl project update [--name ] [--description ] [--prompt ] [--session-admission-profile
]` | ✅ existing command; admission flag 🔲 planned | | `DELETE /projects/{id}` | `acpctl delete project ` | ✅ implemented | | _(context switch)_ | `acpctl project ` | ✅ implemented | | _(context view)_ | `acpctl project current` | ✅ implemented | @@ -680,15 +715,16 @@ The `acpctl` CLI mirrors the API 1-for-1. Every REST operation has a correspondi ### `acpctl apply` — Declarative Fleet Management -`acpctl apply` reconciles Projects and Agents from declarative YAML files, mirroring `kubectl apply` semantics. It is the primary way to provision and update entire agent fleets from the `.ambient/teams/` directory tree. +`acpctl apply` reconciles Projects, Agents, Credentials, and RoleBindings from declarative YAML files, mirroring `kubectl apply` semantics. It is the primary way to provision and update entire agent fleets from the `.ambient/teams/` directory tree. #### Supported Kinds | Kind | Fields applied | |---|---| -| `Project` | `name`, `description`, `prompt`, `labels`, `annotations` | +| `Project` | `name`, `description`, `prompt`, `session_admission`, `labels`, `annotations` | | `Agent` | `name`, `prompt`, `labels`, `annotations`, `inbox` (seed messages) | | `Credential` | `name`, `description`, `provider`, `token` (env var reference), `url`, `email`, `labels`, `annotations` — global resource; use `credential bind` to grant project access | +| `RoleBinding` | `role`, `scope`, `project_id`, `agent_id`, `session_id`, `credential_id`, `user_id` | `Agent` resources in `.ambient/teams/` files also carry an `inbox` list of seed messages. On apply, any message in the list is posted to the agent's inbox if an identical message (same `from_name` + `body`) does not already exist there. @@ -703,7 +739,7 @@ acpctl apply -f - # read from stdin Each file may contain one or more YAML documents separated by `---`. Documents with unrecognised `kind` values are skipped with a warning. 
Apply behaviour per resource: -- **Project**: if a project with `name` already exists, `PATCH` it (description, prompt, labels, annotations). If it does not exist, `POST` to create it. +- **Project**: if a project with `name` already exists, `PATCH` it (description, prompt, session_admission, labels, annotations). If it does not exist, `POST` to create it. - **Agent**: resolved within the current project context. If an agent with `name` already exists in the project, `PATCH` it (prompt, labels, annotations). If it does not exist, `POST` to create it. After upsert, post any inbox seed messages not already present. Output (default — one line per resource): @@ -866,7 +902,7 @@ GET /api/ambient/v1/projects/{id}/agents/{agent_id}/role_bindings RBAC bin "session": { "id": "2abc...", "agent_id": "1def...", - "phase": "pending", + "phase": "Pending", "created_by_user_id": "...", "created_at": "2026-03-20T00:00:00Z" }, @@ -1443,6 +1479,8 @@ This structure means you can define and compose bespoke agent suites — entire | `labels` / `annotations` are JSONB, not strings | Enables GIN-indexed key/value queries (`@>` operator) without joins; every row carries its own metadata without a separate EAV table. `labels` = queryable tags; `annotations` = freeform notes. Applied to first-class Kinds: User, Project, Agent, Session. Not applied to Inbox, SessionMessage, Role/RoleBinding. | | Credential is global, not project-scoped | Eliminates duplication when the same PAT is used across multiple Projects. Access controlled via RoleBindings with `credential` scope. A single Credential can be shared across Projects without creating copies. | | Application syncs fleet definitions, not infrastructure | Application syncs Projects, Agents, Credentials, RoleBindings, and Inbox seeds. Sessions, Users, and Roles are not synced. | +| Project `session_admission` syncs intent, not scheduler objects | Project manifests can declare the desired session admission profile. 
Applications sync that Project field through the Ambient API. Kueue queues, flavors, and workloads remain platform-owned scheduler infrastructure and are skipped as unsupported Application sync kinds. | +| `Queued` is a Session phase, not Inbox state | Inbox stores persistent intent before Agent start. `Queued` is a runtime Session phase after Agent start while the runner waits for admission or scheduling. | | Application targets Ambient API, not K8s API | Unlike Sessions (which use kubeconfig for direct K8s provisioning), Application works at the Ambient REST API layer. Remote sync uses the SDK client pointed at `destination_ambient_url`. | | Promotion via multiple Applications | Each environment gets its own Application pointing to a different git overlay and destination Ambient URL. Promotion = merge changes between overlay branches. | | Kustomize engine shared between CLI and API server | The sync engine reuses the same kustomize rendering logic as `acpctl apply -k`. | @@ -1523,12 +1561,14 @@ design rationale (storage, rotation, provider serialization, migration). ## Implementation Coverage Matrix -_Last updated: 2026-04-28. Use this as the authoritative index — click into component source to verify._ +_Last updated: 2026-06-10. 
Use this as the authoritative index — click into component source to verify._ | Area | API Server | Go SDK | CLI (`acpctl`) | Notes | |---|---|---|---|---| | **Sessions — CRUD** | ✅ | ✅ `SessionAPI.{Get,List,Create,Update,Delete}` | ✅ `get/create/delete session` | | | **Sessions — start/stop** | ✅ `/start` `/stop` | ✅ `SessionAPI.{Start,Stop}` | ✅ `start`/`stop` commands | | +| **Sessions — `Queued` phase** | 🔲 add validator, OpenAPI, gRPC, active queries, stop support | 🔲 Go/Python/TypeScript SDK phase helpers and generated models | 🔲 CLI/frontend/ambient-ui phase parsing, active checks, action gates | Required by Kueue Session Admission | +| **Sessions — admission status** | 🔲 add `admission_profile`, `admission_queue`, structured `conditions` migration | 🔲 Go/Python/TypeScript SDK fields and condition types | 🔲 describe/watch output for admission profile, queue, and conditions | Required by Kueue Session Admission | | **Sessions — messages (list/push/watch)** | ✅ `/messages` | ✅ `PushMessage`, `ListMessages`, `WatchSessionMessages` (gRPC) | ✅ `session messages`, `session send` | gRPC watch via `session_watch.go` | | **Session messages (top-level)** | ✅ `GET /session_messages` | ✅ `SessionMessages().List()` | n/a | SDK/CP-internal; used by CP to resolve max seq on restart | | **Sessions — live events (SSE proxy)** | ✅ `/events` → runner pod | ✅ `SessionAPI.StreamEvents` → `io.ReadCloser` | ✅ `session events` | Runner must be Running; 502 if unreachable | @@ -1546,6 +1586,7 @@ _Last updated: 2026-04-28. 
Use this as the authoritative index — click into co | **Inbox — list/send** | ✅ GET/POST `/inbox` | ✅ `InboxMessageAPI.{ListByAgent,Send}` + `ProjectAgentAPI.{ListInboxInProject,SendInboxInProject}` | ✅ `inbox list`, `inbox send` | | | **Inbox — mark-read/delete** | ✅ PATCH/DELETE `/inbox/{id}` | ✅ `InboxMessageAPI.{MarkRead,DeleteMessage}` | ✅ `inbox mark-read`, `inbox delete` | | | **Projects — CRUD** | ✅ | ✅ `ProjectAPI.{Get,List,Create,Update,Delete}` | ✅ `get/create/delete project`, `project set/current`, `project update` | | +| **Projects — session_admission** | 🔲 add DB field, REST, OpenAPI, gRPC, validation | 🔲 Go/Python/TypeScript SDK models/builders | 🔲 create/update/apply flags and drift semantics | Project-level admission intent; Applications sync this field | | **Projects — labels/annotations** | ✅ PATCH accepts `labels`/`annotations` | ✅ fields on `Project` type; `ProjectAPI.Update(patch map[string]any)` | ⚠️ no dedicated subcommand | | | **RBAC — roles** | ✅ full CRUD | ✅ `RoleAPI` | ✅ `create role`, `get roles`, `get roles `, `delete role` | | | **RBAC — role bindings** | ✅ full CRUD | ✅ `RoleBindingAPI` | ✅ `create role-binding`, `get role-bindings`, `get role-bindings `, `delete role-binding` | | @@ -1554,6 +1595,7 @@ _Last updated: 2026-04-28. 
Use this as the authoritative index — click into co | **Credentials — token fetch** | ✅ `GET /credentials/{cred_id}/token` | ✅ `GetToken()` in `credential_extensions.go` | ✅ `credential token ` | Gated by `credential:token-reader`; granted to runner SA by operator | | **ScheduledSessions — CRUD** | ✅ scheduledSessions plugin | ✅ `ScheduledSessionAPI.{List,Get,Create,Update,Delete,GetByName}` | ✅ `scheduled-session list/get/create/update/delete` | | | **ScheduledSessions — lifecycle** | ✅ suspend/resume/trigger/runs handlers | ✅ `ScheduledSessionAPI.{Suspend,Resume,Trigger,Runs}` | ✅ `scheduled-session suspend/resume/trigger/runs` | | +| **Applications — sync actor** | 🔲 add `created_by_user_id`, `sync_actor_user_id`, local sync authorization | 🔲 SDK fields when Applications are generated | 🔲 Application create/update/display commands | Required before local Application sync can apply `session_admission` safely | | **Generic proxy — project config** | ✅ proxy plugin (`plugins/proxy`); forwards non-`/api/ambient/` paths to `BACKEND_URL` | n/a | 🔲 raw HTTP fallback | Permissions, keys, MCP servers, secrets, feature flags | | **Generic proxy — repo operations** | ✅ proxy plugin | n/a | 🔲 raw HTTP fallback | Tree, blob, branches, seed, forks | | **Generic proxy — auth integrations** | ✅ proxy plugin | n/a | n/a | GitHub/GitLab/Google/Jira/Gerrit/CodeRabbit/MCP OAuth flows | diff --git a/specs/control-plane/kueue-session-admission.spec.md b/specs/control-plane/kueue-session-admission.spec.md new file mode 100644 index 000000000..85dc1f946 --- /dev/null +++ b/specs/control-plane/kueue-session-admission.spec.md @@ -0,0 +1,407 @@ +# Kueue Session Admission Specification + +## Purpose + +This spec defines how Ambient represents and controls session admission when runner capacity is mediated by Kubernetes scheduling and optional Kueue quotas. 
The user-facing contract is Ambient-native: Projects declare session admission intent, Sessions expose a first-class queued lifecycle phase, and Applications sync project intent through the Ambient API. Kueue objects remain platform scheduler infrastructure. + +Scheduler admission queueing is distinct from Inbox queueing. Inbox messages are persistent Agent intent waiting for the next run. Session admission queueing is a runtime state for an already-started Session waiting for runner capacity. + +## Requirements + +### Requirement: First-Class Queued Session Phase +The system SHALL support `Queued` as a persistent, title-case Session phase representing a Session that has been accepted by Ambient and reconciled to a runner admission Pod, but whose runner has not yet been admitted, scheduled, and made reachable. + +#### Scenario: Session Waits For Admission +- GIVEN a Session in `Pending` +- AND the control plane creates or updates the runner admission Pod +- WHEN the Pod is waiting for Kueue admission or Kubernetes scheduling capacity +- THEN the Session phase SHALL be `Queued` +- AND the Session SHALL include a condition explaining the queue or scheduling wait reason when the platform can observe one + +#### Scenario: Session Starts After Admission +- GIVEN a Session in `Queued` +- WHEN the runner admission Pod is admitted, scheduled, ready, and the runner is reachable +- THEN the Session phase SHALL become `Running` +- AND `start_time` SHALL be set if it was not already set + +#### Scenario: Queued Sessions Are Active +- GIVEN an Agent has a Session in `Queued` +- WHEN a user starts the same Agent again +- THEN the API SHALL return the existing active Session instead of creating a second Session + +#### Scenario: Stop Queued Session +- GIVEN a Session in `Queued` +- WHEN a user stops the Session +- THEN the Session SHALL transition through `Stopping` +- AND the control plane SHALL remove the runner admission Pod +- AND the Session SHALL become `Stopped` after the 
Pod is removed or no longer observable + +### Requirement: Session Phase Compatibility +The system SHALL preserve the existing Session phase contract while adding `Queued` to every phase validator, active-phase query, OpenAPI and gRPC schema, SDK phase helper or constant surface, CLI view, frontend view, behavioral phase allowlist, and watch/event consumer. + +#### Scenario: Existing Clients See Existing Phases +- GIVEN a client that starts an Agent in an environment without scheduler admission +- WHEN the Session runs normally +- THEN the client MAY observe the existing `Pending`, `Creating`, and `Running` phases +- AND the client SHALL NOT be required to configure Kueue + +#### Scenario: New Clients See Queued +- GIVEN a client lists or watches Sessions +- WHEN a Session is waiting for admission +- THEN the API, gRPC watch stream, SDKs, CLI, and frontend SHALL expose `Queued` exactly as the stored Session phase + +#### Scenario: Phase Casing +- GIVEN a Session phase appears in an API response, gRPC message, SDK type, CLI output, frontend view, or example +- WHEN the phase is one of the standard phases +- THEN the value SHALL use title-case spelling: `Pending`, `Creating`, `Queued`, `Running`, `Stopping`, `Stopped`, `Completed`, or `Failed` + +#### Scenario: Unknown Phase Rejected +- GIVEN a status update request sets `phase` +- WHEN the value is not one of the supported phases, including `Queued` +- THEN the API SHALL reject the request with a validation error + +### Requirement: Runner Admission Pod +The system SHALL use one plain Kubernetes Pod as the v1 runner admission workload for each Session. The control plane MAY create supporting resources such as Services, Secrets, ServiceAccounts, RoleBindings, and namespaces, but admission queueing applies to the runner Pod. 
+ +#### Scenario: Runner Pod Is The Admission Unit +- GIVEN the control plane provisions a Session +- WHEN it creates the runtime Pod +- THEN it SHALL create exactly one runner admission Pod for that Session +- AND any Kueue admission labels SHALL be placed on that Pod + +#### Scenario: Kueue Queue Label +- GIVEN the resolved admission profile has `kueue_enabled: true` +- WHEN the control plane creates the runner admission Pod +- THEN the Pod metadata labels SHALL include `kueue.x-k8s.io/queue-name` with value equal to the Session's resolved `admission_queue` +- AND the `admission_queue` value SHALL be the LocalQueue name resolved from the admission profile + +#### Scenario: Kueue Queue Label Omitted +- GIVEN the resolved admission profile has `kueue_enabled: false` +- WHEN the control plane creates the runner admission Pod +- THEN the Pod SHALL NOT include the `kueue.x-k8s.io/queue-name` label + +#### Scenario: No Raw Workload API For Tenants +- GIVEN a user or Application manifest declares desired Ambient state +- WHEN it includes raw Kueue `Workload` resources +- THEN Ambient SHALL treat those resources as unsupported scheduler infrastructure +- AND the Session admission contract SHALL remain expressed through Project and Session fields + +#### Scenario: Jobs Are Out Of Scope +- GIVEN the platform later wants runner admission to use Kubernetes Jobs or another controller +- WHEN that change alters the admission unit, retry behavior, or preemption behavior +- THEN this spec SHALL be amended before implementation + +### Requirement: Project Session Admission Policy +The system SHALL extend Project create, patch, read, and declarative apply surfaces with an optional `session_admission` policy that declares project-level intent for runner admission. 
+ +`session_admission` SHALL contain Ambient-owned fields, not raw scheduler object definitions: + +```yaml +session_admission: + profile: standard +``` + +Omitting `session_admission` or omitting `profile` SHALL inherit the platform default admission profile. + +The stored value `null`, an empty object, or a `profile` value of `null` SHALL mean "inherit the platform default." In declarative apply, an omitted `session_admission` field SHALL leave the live field unmanaged and unchanged; `session_admission: null` or `session_admission: {}` SHALL clear the live value back to default inheritance. + +#### Scenario: Project Uses Default Profile +- GIVEN a Project manifest does not set `session_admission` +- WHEN the Project is created or applied +- THEN the Project SHALL use the platform default admission profile +- AND existing Project manifests SHALL continue to apply without changes + +#### Scenario: Project Selects Admission Profile +- GIVEN a Project manifest sets `session_admission.profile` +- WHEN the Project is created, patched, or applied +- THEN the API SHALL validate that the profile exists and is allowed for the caller +- AND new Sessions in that Project SHALL use the selected profile unless a more specific allowed override is introduced by another spec + +#### Scenario: Declarative Apply Clears Admission Profile +- GIVEN an existing Project has a stored `session_admission.profile` +- WHEN `acpctl apply` or Application sync applies a Project manifest with `session_admission: null` or `session_admission: {}` +- THEN the Project SHALL clear the stored policy +- AND future Sessions SHALL inherit the platform default profile + +#### Scenario: Invalid Admission Profile +- GIVEN a Project update requests an unknown `session_admission.profile` +- WHEN the API validates the update +- THEN the API SHALL reject the update +- AND no scheduler resources SHALL be changed for that Project + +### Requirement: Admission Profile Catalog +The system SHALL have a platform-owned 
admission profile catalog used consistently by the API server, control plane, CLI, SDKs, and Application sync validation. + +Each profile SHALL define: + +```yaml +name: standard +default: true +tenant_selectable: true +kueue_enabled: true +cluster_queue: ambient-standard +local_queue: ambient-sessions-standard +runner_start_timeout: 10m +resource_limits: + cpu: "2" + memory: 4Gi +``` + +Exactly one profile SHALL be marked as the platform default. Profile names SHALL be stable API values suitable for Project manifests. + +#### Scenario: Default Profile Resolution +- GIVEN a Project inherits the default profile +- WHEN a Session starts in that Project +- THEN the API and control plane SHALL resolve the same default profile from the platform-owned catalog + +#### Scenario: Profile Catalog Changes +- GIVEN the platform profile catalog changes +- WHEN new Sessions are started +- THEN new Sessions SHALL use the current profile mapping +- AND already-created Sessions SHALL retain their resolved profile and queue snapshot + +#### Scenario: Unknown Profile Rejected +- GIVEN a Project create, patch, declarative apply, or Application sync requests a profile not present in the catalog +- WHEN the API validates the request +- THEN the request SHALL be rejected before scheduler resources are changed + +### Requirement: Application Sync Boundary +The system SHALL allow Applications to sync Project `session_admission` policy as part of Project resources, and SHALL NOT allow Applications to sync raw scheduler infrastructure such as Kueue `ClusterQueue`, `LocalQueue`, `ResourceFlavor`, or `Workload` resources. 
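The sync boundary stated above can be sketched as a kind filter over rendered documents. This is a sketch under assumptions, not the sync engine's code: `partition_documents` is a hypothetical helper, the synced-kind set lists only the kinds named in the sync table, and the shape of each `resource_status` entry beyond the `Skipped` status is invented for illustration.

```python
# Kinds the spec names as scheduler infrastructure that Applications never sync.
SCHEDULER_KINDS = frozenset({"ClusterQueue", "LocalQueue", "ResourceFlavor", "Workload"})
# Assumption: the kinds listed in the Application sync table.
SYNCED_KINDS = frozenset({"Project", "Agent", "Credential", "RoleBinding"})


def partition_documents(docs: list) -> tuple:
    """Split rendered documents into (documents to sync, skipped status entries)."""
    to_sync, skipped = [], []
    for doc in docs:
        kind = doc.get("kind", "")
        if kind in SYNCED_KINDS:
            to_sync.append(doc)
            continue
        # Kubernetes-style documents keep their name under metadata.
        name = doc.get("name") or doc.get("metadata", {}).get("name", "")
        reason = "scheduler infrastructure" if kind in SCHEDULER_KINDS else "unsupported kind"
        # The entry shape is hypothetical; only the Skipped status comes from the spec.
        skipped.append({"kind": kind, "name": name, "status": "Skipped", "reason": reason})
    return to_sync, skipped
```

Skipped documents are recorded rather than raised as errors, so the sync continues for the supported Ambient resources.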
+ +#### Scenario: Application Syncs Project Admission Intent +- GIVEN an Application renders a Project manifest with `session_admission.profile` +- WHEN the Application syncs +- THEN the Project SHALL be created or patched with that admission policy only if the effective sync actor is authorized for that Project and profile +- AND Application diff, sync status, and resource status SHALL include drift in `session_admission` + +#### Scenario: Local Application Does Not Bypass Admission Authorization +- GIVEN an Application targets the local Ambient instance +- WHEN the Application sync controller applies `session_admission` +- THEN authorization SHALL be evaluated against the Application's effective sync actor, not the controller's internal service bypass +- AND the sync SHALL fail for any profile the effective sync actor could not select through the normal Project API + +#### Scenario: Application Renders Raw Kueue Resource +- GIVEN an Application renders a Kueue `ClusterQueue`, `LocalQueue`, `ResourceFlavor`, or `Workload` +- WHEN the Application syncs +- THEN the sync engine SHALL skip that document as unsupported infrastructure +- AND the Application `resource_status` SHALL record the skipped resource +- AND the sync operation SHALL continue for supported Ambient resources + +#### Scenario: Remote Application Uses Same Contract +- GIVEN an Application targets a remote Ambient instance +- WHEN it syncs Project `session_admission` +- THEN authorization and validation SHALL be enforced by the destination Ambient API +- AND the source instance SHALL NOT infer or create scheduler resources on behalf of the destination instance + +### Requirement: Kueue Admission Mode +The control plane SHALL support an optional Kueue-backed session admission mode that maps Project admission profiles to platform-managed Kueue queues. 
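One way to picture the profile-to-queue mapping is as a label decision made when the runner Pod is built. The sketch below is illustrative only: `runner_pod_labels` and the `ambient.example/session-id` label are hypothetical names, and the profile dictionary reuses the `kueue_enabled` and `local_queue` fields from the catalog example. `kueue.x-k8s.io/queue-name` is the label upstream Kueue uses to assign a workload to a LocalQueue.

```python
# Label that upstream Kueue reads to assign a Pod to a LocalQueue.
KUEUE_QUEUE_LABEL = "kueue.x-k8s.io/queue-name"


def runner_pod_labels(session_id, profile, kueue_mode_enabled):
    """Return labels for a runner Pod (hypothetical helper).

    With Kueue mode disabled, no Kueue label is emitted, so the Pod follows
    the default Kubernetes scheduling path.
    """
    # Hypothetical Ambient-owned label, used here only for illustration.
    labels = {"ambient.example/session-id": session_id}
    if kueue_mode_enabled and profile.get("kueue_enabled", False):
        labels[KUEUE_QUEUE_LABEL] = profile["local_queue"]
    return labels


standard = {
    "name": "standard",
    "kueue_enabled": True,
    "local_queue": "ambient-sessions-standard",
}
with_kueue = runner_pod_labels("sess-1", standard, kueue_mode_enabled=True)
without_kueue = runner_pod_labels("sess-1", standard, kueue_mode_enabled=False)
```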
+ +#### Scenario: Kueue Disabled +- GIVEN Kueue admission mode is disabled +- WHEN the control plane provisions a Session +- THEN it SHALL NOT require Kueue APIs or labels +- AND it SHALL continue to provision runner Pods through the default Kubernetes scheduling path +- AND the Session SHALL remain `Queued` until the runner Pod is scheduled, ready, and reachable + +#### Scenario: Kueue Enabled +- GIVEN Kueue admission mode is enabled +- AND a Project resolves to an admission profile +- WHEN a Session runner Pod is created +- THEN the Pod SHALL target the platform-managed LocalQueue for that Project and profile +- AND the Session SHALL remain `Queued` until the Pod is admitted, scheduled, ready, and the runner is reachable + +#### Scenario: Runner Reachability +- GIVEN a runner admission Pod exists for a Session +- WHEN the Pod has Kubernetes `Ready=True` +- AND an HTTP `GET /health` probe to the runner on the configured runner service or Pod endpoint returns a 2xx response +- THEN the control plane SHALL treat the runner as reachable +- AND the Session MAY transition from `Queued` to `Running` + +#### Scenario: Runner Reachability Timeout +- GIVEN a runner admission Pod exists for a Session +- WHEN the Pod does not become scheduled, ready, and reachable before the resolved profile's `runner_start_timeout` +- THEN the Session SHALL become `Failed` +- AND the Session SHALL include a condition with type `RunnerReachable` and a timeout reason + +#### Scenario: Admission Infrastructure Missing +- GIVEN Kueue admission mode is enabled +- AND the Project's resolved queue cannot be found or created +- WHEN the control plane reconciles a pending Session +- THEN the Session SHALL become `Failed` +- AND the Session SHALL include a condition that identifies admission configuration as the failure category + +### Requirement: Platform-Owned Queue Materialization +The system SHALL treat Kueue queue resources as platform-owned scheduler infrastructure materialized from platform 
configuration and Project admission policy. + +#### Scenario: Project Namespace Has Local Queue +- GIVEN Kueue admission mode is enabled +- AND a managed Project namespace exists +- WHEN the Project resolves to an admission profile +- THEN the control plane SHALL ensure the namespace has a LocalQueue named by the resolved profile's `local_queue` +- AND the LocalQueue SHALL point to the platform-configured ClusterQueue for that profile +- AND the LocalQueue SHALL carry Ambient managed labels for project, profile, and control-plane ownership + +#### Scenario: Local Queue Reconciles Drift +- GIVEN a managed LocalQueue exists in a Project namespace +- WHEN its ClusterQueue reference or managed labels drift from the resolved admission profile +- THEN the control plane SHALL update the LocalQueue rather than create a duplicate + +#### Scenario: Local Queue Deletion +- GIVEN a Project is deleted +- WHEN no non-terminal Sessions reference a managed LocalQueue for that Project +- THEN the control plane SHALL delete that managed LocalQueue + +#### Scenario: Admin Owns Cluster Queues +- GIVEN a platform operator configures Kueue `ClusterQueue` and `ResourceFlavor` resources +- WHEN Project owners change Project manifests +- THEN Project owners SHALL NOT be able to create, update, or delete those cluster-scoped scheduler resources through Project or Application APIs + +#### Scenario: Project Profile Changes +- GIVEN a Project's `session_admission.profile` changes +- WHEN the control plane reconciles Project scheduler infrastructure +- THEN new Sessions SHALL use the new profile +- AND already-created Session runner Pods SHALL retain their originally resolved queue unless explicitly restarted + +### Requirement: Resource Accounting Inputs +The control plane SHALL provide explicit resource requests for every container in a runner admission Pod, and SHALL validate any user-configurable resource overrides before they affect admission. 
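A minimal sketch of override validation follows, assuming overrides are checked against the resolved profile's `resource_limits` (the field shown in the catalog example). The function names are hypothetical, and the quantity parsers deliberately accept only a small subset of Kubernetes quantity syntax (whole numbers, `m` for CPU, binary suffixes for memory); a real implementation would more likely reuse an existing quantity parser.

```python
import re

_MEMORY_UNITS = {"": 1, "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}


def parse_cpu_millis(value):
    """Parse a CPU quantity such as "2" or "500m" into millicores."""
    match = re.fullmatch(r"(\d+)(m?)", value) if isinstance(value, str) else None
    if not match:
        raise ValueError(f"malformed cpu quantity: {value!r}")
    number = int(match.group(1))
    return number if match.group(2) else number * 1000


def parse_memory_bytes(value):
    """Parse a memory quantity such as "4Gi" or "512Mi" into bytes."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi|Ti)?", value) if isinstance(value, str) else None
    if not match:
        raise ValueError(f"malformed memory quantity: {value!r}")
    return int(match.group(1)) * _MEMORY_UNITS[match.group(2) or ""]


_PARSERS = {"cpu": parse_cpu_millis, "memory": parse_memory_bytes}


def effective_requests(defaults, overrides, limits):
    """Merge overrides onto platform defaults, rejecting invalid values.

    Raises ValueError instead of silently falling back to the defaults.
    """
    requests = dict(defaults)
    for name, value in overrides.items():
        if name not in _PARSERS:
            raise ValueError(f"disallowed resource override: {name!r}")
        amount = _PARSERS[name](value)
        if amount <= 0:
            raise ValueError(f"{name} request must be positive")
        if amount > _PARSERS[name](limits[name]):
            raise ValueError(f"{name} override {value!r} exceeds profile limit {limits[name]!r}")
        requests[name] = value
    return requests


platform_defaults = {"cpu": "500m", "memory": "1Gi"}  # hypothetical defaults
profile_limits = {"cpu": "2", "memory": "4Gi"}        # from the catalog example
resolved = effective_requests(platform_defaults, {"cpu": "1"}, profile_limits)
```

Raising on the first invalid override mirrors the intent that a Session fails before admission rather than running with different resources than requested.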
+ +#### Scenario: Default Requests Applied +- GIVEN a Session has no resource overrides +- WHEN the control plane creates the runner Pod +- THEN every Pod container SHALL include platform default CPU and memory requests sufficient for scheduler accounting +- AND the Pod SHALL be eligible for pod-count quota accounting when the scheduler profile uses pod quotas + +#### Scenario: Agent Resource Overrides Applied +- GIVEN an Agent defines valid `resource_overrides` +- WHEN that Agent is started +- THEN the copied Session overrides MAY affect runner Pod resource requests +- AND the effective request SHALL remain within the Project's allowed admission profile + +#### Scenario: Invalid Override Rejected +- GIVEN a Session would use malformed or disallowed `resource_overrides` +- WHEN the API or control plane validates the Session +- THEN the Session SHALL fail before admission +- AND the failure SHALL be visible as a validation or status condition rather than silently falling back to different resources + +### Requirement: Admission Status Observability +The system SHALL expose queue/admission state through Session phase and conditions without requiring users to read Kubernetes objects. + +Sessions SHALL snapshot resolved admission state in API fields: + +```yaml +admission_profile: standard +admission_queue: ambient-sessions-standard +``` + +Session `conditions` SHALL be a JSON array of condition objects with `type`, `status`, `reason`, `message`, and `last_transition_time`. Standard admission-related condition types SHALL include `AdmissionProfileResolved`, `AdmissionQueued`, `AdmissionConfiguration`, `PodScheduled`, `RunnerReachable`, and `Preempted`. 
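The condition shape described above could be maintained with an upsert helper like the sketch below. It is not taken from the implementation: `set_condition` is a hypothetical name, the string statuses `"True"` and `"False"` and the rule that `last_transition_time` changes only when `status` changes are borrowed from Kubernetes condition conventions as an assumption, and the `WaitingForQuota` reason is invented for the example.

```python
from datetime import datetime, timezone

# Standard admission-related condition types listed in this spec.
ADMISSION_CONDITION_TYPES = (
    "AdmissionProfileResolved",
    "AdmissionQueued",
    "AdmissionConfiguration",
    "PodScheduled",
    "RunnerReachable",
    "Preempted",
)


def set_condition(conditions, type_, status, reason, message, now=None):
    """Return a new condition list with one entry per type (upsert by type)."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    updated, replaced = [], False
    for existing in conditions:
        if existing["type"] != type_:
            updated.append(existing)
            continue
        # Assumption: keep the old transition time if the status did not change.
        same_status = existing["status"] == status
        updated.append({
            "type": type_,
            "status": status,
            "reason": reason,
            "message": message,
            "last_transition_time": existing["last_transition_time"] if same_status else timestamp,
        })
        replaced = True
    if not replaced:
        updated.append({
            "type": type_,
            "status": status,
            "reason": reason,
            "message": message,
            "last_transition_time": timestamp,
        })
    return updated


t0 = datetime(2026, 6, 10, 12, 0, tzinfo=timezone.utc)
t1 = datetime(2026, 6, 10, 12, 5, tzinfo=timezone.utc)
queued = set_condition([], "AdmissionQueued", "True", "WaitingForQuota", "waiting for quota", t0)
still_queued = set_condition(queued, "AdmissionQueued", "True", "WaitingForQuota", "still waiting", t1)
admitted = set_condition(queued, "AdmissionQueued", "False", "Admitted", "admitted", t1)
```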
+ +#### Scenario: Queue Wait Is Visible +- GIVEN a Session is `Queued` +- WHEN a user reads, lists, watches, or describes the Session +- THEN the response SHALL include the queue/admission phase +- AND SHALL include the resolved admission profile and queue identifier when resolved +- AND SHOULD include the latest wait reason when available + +#### Scenario: Scheduler Objects Are For Operators +- GIVEN a platform operator has Kubernetes access +- WHEN they inspect Kueue objects directly +- THEN Kueue `Workload`, `LocalQueue`, and `ClusterQueue` status MAY be used for operational diagnosis +- AND that Kubernetes access SHALL NOT be required for normal Ambient user workflows + +### Requirement: Preemption And Replacement +The control plane SHALL handle scheduler preemption explicitly and SHALL NOT silently lose a running Session's work. + +#### Scenario: Queued Pod Preempted Before Runner Start +- GIVEN a Session is `Queued` +- WHEN its runner Pod is preempted or deleted before the runner starts +- AND the Session is not `Stopping` +- THEN the control plane SHALL recreate the runner Pod for the same resolved profile and queue +- AND the Session SHALL remain `Queued` with a condition recording the preemption + +#### Scenario: Running Pod Preempted +- GIVEN a Session is `Running` +- WHEN its runner Pod is preempted or deleted by the scheduler or disappears without a user stop request +- THEN the Session SHALL become `Failed` unless durable resume for preempted sessions is explicitly supported +- AND the Session SHALL include a `Preempted` condition + +### Requirement: RBAC +The system SHALL enforce Project and Application RBAC on admission policy changes and SHALL keep scheduler infrastructure permissions separate from tenant permissions. 
+ +#### Scenario: Project Owner Updates Admission Profile +- GIVEN a user has permission to update a Project +- AND the requested admission profile is tenant-selectable +- WHEN the user patches `session_admission.profile` +- THEN the API SHALL allow the update + +#### Scenario: Privileged Profile Requires Privilege +- GIVEN an admission profile is not tenant-selectable +- WHEN a non-privileged Project user attempts to select that profile +- THEN the API SHALL reject the update + +#### Scenario: Platform Admin Selects Privileged Profile +- GIVEN an admission profile is not tenant-selectable +- AND the caller has platform-admin authority through `*:*` +- WHEN the caller creates, patches, applies, or syncs a Project selecting that profile +- THEN the API SHALL allow the update + +#### Scenario: Bootstrap Project Create Uses Default Admission +- GIVEN Project creation has no established Project-specific role binding yet +- WHEN the request has no platform-admin authority +- THEN the request SHALL omit `session_admission` or select only the platform default profile +- AND non-default profile selection SHALL be rejected at create time + +#### Scenario: Application Sync Actor Applies Policy +- GIVEN an Application effective sync actor lacks permission to update the destination Project or select the requested profile +- WHEN the Application syncs a Project manifest changing `session_admission` +- THEN the sync SHALL fail for that Project resource +- AND the Application SHALL report the authorization failure in resource status + +#### Scenario: Local Application Effective Sync Actor +- GIVEN an Application targets the local Ambient instance +- WHEN the Application is created +- THEN the API SHALL record the authenticated creator as the default effective sync actor +- AND later local syncs SHALL authorize `session_admission` changes as that actor unless the Application explicitly stores another authorized sync actor + +#### Scenario: Control Plane Kueue Permissions +- GIVEN 
Kueue admission mode is enabled +- WHEN the control plane reconciles Project queues and Session runner Pods +- THEN the control plane SHALL have Kubernetes RBAC to get, list, watch, create, patch, update, and delete namespaced Kueue LocalQueues +- AND SHALL have Kubernetes RBAC to get, list, and watch Kueue Workloads for observed Session Pods +- AND tenant users SHALL NOT receive Kubernetes RBAC to manage Kueue ClusterQueues, ResourceFlavors, LocalQueues, or Workloads through Ambient Project or Application permissions + +### Requirement: Consumer Migration +The system SHALL migrate all existing consumers of Session phase and Project declarative manifests to the new admission contract. + +#### Scenario: Generated API Clients +- GIVEN OpenAPI, gRPC, Go SDK, Python SDK, TypeScript SDK, CLI, and frontend clients represent Project or Session data +- WHEN `session_admission` and `Queued` are added to the API contract +- THEN generated and hand-written clients SHALL be updated together +- AND Project create, patch, read, CLI create/update/apply, and Application sync clients SHALL carry `session_admission` +- AND Session read, list, watch, SDK, CLI, and frontend clients SHALL carry `admission_profile`, `admission_queue`, and structured `conditions` +- AND Session phase parsers, active-session checks, stop/delete gates, polling, action availability, and display helpers SHALL treat `Queued` as an active non-terminal phase + +#### Scenario: Active Session Queries +- GIVEN code checks whether an Agent has an active Session +- WHEN the existing Session is `Queued` +- THEN the check SHALL treat it as active + +#### Scenario: Declarative Apply +- GIVEN `acpctl apply` processes a Project manifest with `session_admission` +- WHEN the Project exists +- THEN apply SHALL patch drift in `session_admission` +- AND unchanged policies SHALL report `unchanged` + +#### Scenario: Inbox Drain Timing +- GIVEN an Agent start request creates a Session that later waits in `Queued` +- WHEN unread 
Inbox messages existed at Agent start time +- THEN those messages SHALL be drained into that Session's start context at Agent start time +- AND Inbox messages created after Agent start while the Session is `Queued` SHALL remain unread for a future Agent start + +#### Scenario: Existing Database Rows +- GIVEN existing Project rows have no admission policy +- WHEN the migration is applied +- THEN those Projects SHALL behave as if they use the platform default profile +- AND no existing Session row SHALL require phase rewriting +- AND existing Session rows SHALL receive null `admission_profile` and `admission_queue` values until reconciled +- AND existing empty Session `conditions` values SHALL migrate to an empty JSON array +- AND existing non-empty Session `conditions` values SHALL be preserved if they are valid JSON arrays, or wrapped in a `LegacyCondition` entry if they are not valid condition arrays
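
The `conditions` migration rules in the last scenario can be sketched as a pure function over the stored string value. This is an illustration under stated assumptions: `migrate_conditions` is a hypothetical name, the `status` and `reason` given to the wrapping entry are invented, and the sketch takes the stricter reading that a value is preserved only when it is a JSON array whose items all carry the five condition fields.

```python
import json

CONDITION_FIELDS = {"type", "status", "reason", "message", "last_transition_time"}


def is_condition_array(value):
    """True for a list whose items are objects carrying every condition field."""
    return isinstance(value, list) and all(
        isinstance(item, dict) and CONDITION_FIELDS <= item.keys() for item in value
    )


def migrate_conditions(raw, migrated_at):
    """Convert a legacy string `conditions` value into a condition array."""
    if raw is None or raw.strip() == "":
        return []  # empty legacy values become an empty JSON array
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        parsed = None
    if is_condition_array(parsed):
        return parsed  # already valid, preserved as-is
    # Anything else is kept verbatim inside a single wrapping entry.
    return [{
        "type": "LegacyCondition",
        "status": "Unknown",              # assumed value
        "reason": "MigratedLegacyValue",  # assumed value
        "message": raw,
        "last_transition_time": migrated_at,
    }]


stamp = "2026-06-10T00:00:00+00:00"
valid = json.dumps([{
    "type": "PodScheduled", "status": "True", "reason": "Scheduled",
    "message": "", "last_transition_time": stamp,
}])
wrapped = migrate_conditions("Ready", stamp)
```

Keeping the original string in `message` means no legacy information is lost, at the cost of an opaque entry that consumers must tolerate.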