
Add OpenStackAssistant CRD for AI-powered cluster health checks and upgrades #1914

Draft
dprince wants to merge 7 commits into openstack-k8s-operators:main from dprince:assistant

dprince commented May 5, 2026

Add OpenStackAssistant CRD for AI-powered cluster management

Introduce a new OpenStackAssistant custom resource that deploys an AI
agent (Goose) as a Kubernetes pod with read-only access to the OpenStack
control plane. The assistant connects to a Lightspeed Stack AI backend
and is configured with operator credentials, recipes, and hints for
cluster diagnostics and management tasks.

Key components:

  • assistant.openstack.org/v1beta1 API with OpenStackAssistant CRD
  • Controller that reconciles pods with openstackclient capabilities,
    Goose configuration, CA bundles, and provider secrets
  • Validating webhook for spec validation
  • Support for configurable model, recipes (slash commands), and hints
  • Unit tests for controller and helper functions
  • CRD bindata, RBAC roles, and sample manifests

openshift-ci Bot requested review from abays and fultonj on May 5, 2026 12:03

openshift-ci Bot commented May 5, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dprince

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


github-actions Bot commented May 5, 2026

OpenStackControlPlane CRD Size Report

| Metric | Value |
|---|---|
| CRD JSON size | 322613 bytes (315KB) |
| Base branch size | 322464 bytes |
| Change | +0.05% |
| Status | yellow (growing) |

Threshold reference

| Color | Range | Meaning |
|---|---|---|
| 🟢 green | < 300KB | Comfortable |
| 🟡 yellow | 300–400KB | Growing |
| 🟠 orange | 400–750KB | Concerning |
| 🔴 red | > 750KB | Approaching 1.5MB etcd limit (cut in half to allow space for update) |

@centosinfra-prod-github-app

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://gateway-cloud-softwarefactory.apps.ocp.cloud.ci.centos.org/zuul/t/rdoproject.org/buildset/8043f111931d4762b34dec5062d331be

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 05m 10s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 27m 31s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 38m 25s
adoption-standalone-to-crc-ceph-provider RETRY_LIMIT in 27s
✔️ openstack-operator-tempest-multinode SUCCESS in 1h 52m 54s
openstack-operator-edpm-baremetal-minor-update RETRY_LIMIT in 22m 57s

dprince added 5 commits May 6, 2026 09:37

Implements the OpenStackAssistant API (assistant.openstack.org/v1beta1)
which deploys a managed Goose AI agent pod with read-only RBAC for
cluster diagnostics via Lightspeed Stack.

Add a dedicated Model field to GooseConfig so the Goose AI model can be
set declaratively in the OpenStackAssistant CR spec rather than requiring
it to be passed as a raw env var. When set, the controller injects the
GOOSE_MODEL environment variable into the pod.
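
The conditional injection described above could look roughly like the following Go sketch. `EnvVar` is a local stand-in for the Kubernetes `corev1.EnvVar` type so that the example runs without dependencies, and `gooseEnv` is a hypothetical helper name; only the `Model` field and the `GOOSE_MODEL` variable come from the commit message.

```go
package main

import "fmt"

// EnvVar is a local stand-in for the Kubernetes corev1.EnvVar type.
type EnvVar struct {
	Name  string
	Value string
}

// GooseConfig carries only the Model field described in the commit message.
type GooseConfig struct {
	Model string
}

// gooseEnv returns base plus GOOSE_MODEL when a model is configured.
// An empty Model leaves the environment unchanged.
func gooseEnv(cfg GooseConfig, base []EnvVar) []EnvVar {
	env := append([]EnvVar{}, base...) // copy so the caller's slice is not mutated
	if cfg.Model != "" {
		env = append(env, EnvVar{Name: "GOOSE_MODEL", Value: cfg.Model})
	}
	return env
}

func main() {
	// "example-model" is a placeholder value, not a real model name.
	fmt.Println(gooseEnv(GooseConfig{Model: "example-model"}, nil))
	fmt.Println(len(gooseEnv(GooseConfig{}, nil)))
}
```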

Update the entrypoint script to use $HOME/.config/goose/ instead of
~/.goose/ for Goose configuration paths, aligning with the XDG base
directory convention used by newer Goose versions.

@centosinfra-prod-github-app

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://gateway-cloud-softwarefactory.apps.ocp.cloud.ci.centos.org/zuul/t/rdoproject.org/buildset/6afe669e29ff472fbe553b29737c4f54

✔️ openstack-k8s-operators-content-provider SUCCESS in 3h 24m 25s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 27m 05s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 30m 16s
adoption-standalone-to-crc-ceph-provider POST_FAILURE in 3h 10m 26s
✔️ openstack-operator-tempest-multinode SUCCESS in 1h 42m 44s
openstack-operator-edpm-baremetal-minor-update FAILURE in 29m 27s

@centosinfra-prod-github-app

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://gateway-cloud-softwarefactory.apps.ocp.cloud.ci.centos.org/zuul/t/rdoproject.org/buildset/4020cbd601b540caae1281f39eec52ab

✔️ openstack-k8s-operators-content-provider SUCCESS in 3h 26m 30s
podified-multinode-edpm-deployment-crc RETRY_LIMIT Ansible setup timeout in 1m 19s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 38m 25s
adoption-standalone-to-crc-ceph-provider POST_FAILURE in 3h 12m 43s
openstack-operator-tempest-multinode RETRY_LIMIT Ansible setup timeout in 1m 21s
✔️ openstack-operator-edpm-baremetal-minor-update SUCCESS in 1h 58m 48s

Adds the ability to run the rhos-mcps MCP server as a sidecar container
inside the openstackclient pod, exposed via a k8s Service. This allows
the OpenStackAssistant (Goose) to execute read-only OpenStack CLI
commands through the MCP protocol.

OpenStackClient changes:
- New MCPConfig struct (enabled, containerImage) on the CR spec
- When enabled, adds an mcp-server sidecar container sharing the same
  clouds.yaml/secure.yaml credential mounts
- Controller creates a ConfigMap with rhos-mcps config (openstack
  enabled, openshift disabled, allow_write: false)
- Controller creates a Service on port 8080 for the MCP endpoint
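
A minimal Go sketch of the sidecar decision described in this list follows. The `Container` and `VolumeMount` types are local stand-ins for the Kubernetes core types, `podContainers` is a hypothetical helper, and the mount paths and image names are placeholders; only the `MCPConfig` fields (`enabled`, `containerImage`), the `mcp-server` container name, and the shared credential mounts come from the commit message.

```go
package main

import "fmt"

// VolumeMount and Container are local stand-ins for the Kubernetes core types.
type VolumeMount struct {
	Name      string
	MountPath string
}

type Container struct {
	Name   string
	Image  string
	Mounts []VolumeMount
}

// MCPConfig mirrors the two fields named in the commit message.
type MCPConfig struct {
	Enabled        bool
	ContainerImage string
}

// podContainers returns the client container and, when MCP is enabled,
// an mcp-server sidecar that reuses the client's credential mounts.
func podContainers(client Container, mcp MCPConfig) []Container {
	containers := []Container{client}
	if mcp.Enabled {
		containers = append(containers, Container{
			Name:   "mcp-server",
			Image:  mcp.ContainerImage,
			Mounts: append([]VolumeMount{}, client.Mounts...), // same mounts, separate slice
		})
	}
	return containers
}

func main() {
	client := Container{
		Name:  "openstackclient",
		Image: "example.invalid/openstackclient:latest", // placeholder image
		Mounts: []VolumeMount{
			{Name: "openstack-config", MountPath: "/example/clouds.yaml"},        // placeholder path
			{Name: "openstack-config-secret", MountPath: "/example/secure.yaml"}, // placeholder path
		},
	}
	mcp := MCPConfig{Enabled: true, ContainerImage: "example.invalid/rhos-mcps:latest"}
	for _, c := range podContainers(client, mcp) {
		fmt.Println(c.Name, len(c.Mounts))
	}
}
```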

OpenStackAssistant changes:
- New MCPServerRef (name, url) and mcpServers field on GooseConfig
- Each MCP server is passed as MCP_SERVER_<name> env var to the pod
- Entrypoint script generates Goose streamable_http extension entries
  from these env vars
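
The mapping from `MCPServerRef` entries to `MCP_SERVER_<name>` variables might be sketched in Go as below. The commit message does not say how names are normalized, so the upper-casing and the replacement of non-alphanumeric characters with underscores are assumptions, as are the helper names, the `EnvVar` stand-in type, and the example URL.

```go
package main

import (
	"fmt"
	"strings"
)

// MCPServerRef carries the two fields named in the commit message.
type MCPServerRef struct {
	Name string
	URL  string
}

// EnvVar is a local stand-in for the Kubernetes corev1.EnvVar type.
type EnvVar struct {
	Name  string
	Value string
}

// mcpEnvName builds MCP_SERVER_<name>. Normalizing the name to upper-case
// letters, digits, and underscores is an assumption made for this sketch.
func mcpEnvName(name string) string {
	var b strings.Builder
	b.WriteString("MCP_SERVER_")
	for _, r := range strings.ToUpper(name) {
		if (r >= 'A' && r <= 'Z') || (r >= '0' && r <= '9') {
			b.WriteRune(r)
		} else {
			b.WriteRune('_')
		}
	}
	return b.String()
}

// mcpServerEnv emits one environment variable per configured MCP server.
func mcpServerEnv(servers []MCPServerRef) []EnvVar {
	env := make([]EnvVar, 0, len(servers))
	for _, s := range servers {
		env = append(env, EnvVar{Name: mcpEnvName(s.Name), Value: s.URL})
	}
	return env
}

func main() {
	servers := []MCPServerRef{
		{Name: "rhos-mcps", URL: "http://example.invalid:8080"}, // placeholder URL
	}
	for _, e := range mcpServerEnv(servers) {
		fmt.Printf("%s=%s\n", e.Name, e.Value)
	}
}
```

Keeping the server name in the variable name lets the entrypoint script discover servers by prefix without a separate index variable.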

Users can create a second OpenStackClient instance with reader-only
credentials for credential-level read-only enforcement in addition to
the rhos-mcps allow_write: false guardrail.

Enable TLS on the MCP sidecar service using cert-manager with the
internal CA issuer. When CaBundleSecretName is set (indicating TLS is
active on the control plane), the controller provisions a TLS certificate
for the MCP service DNS names via certmanager.EnsureCert(), mounts the
cert secret into the MCP sidecar container at /etc/pki/tls/mcp/, and
switches the service port from 8080 to 8443. The rhos-mcps config is
updated to include TLS cert/key paths and use https for allowed origins.
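
The port and scheme switch described above can be sketched in Go as follows. The ports (8080, 8443), the mount directory `/etc/pki/tls/mcp/`, and the use of `CaBundleSecretName` as the TLS signal come from the commit message; the `mcpEndpoint` type, the helper name, and the `tls.crt`/`tls.key` file names (the keys cert-manager normally writes into a TLS secret) are assumptions.

```go
package main

import (
	"fmt"
	"path"
)

// mcpTLSDir is where the commit message says the cert secret is mounted.
const mcpTLSDir = "/etc/pki/tls/mcp"

// mcpEndpoint is a hypothetical summary of the MCP service settings.
type mcpEndpoint struct {
	Port     int
	Scheme   string
	CertFile string
	KeyFile  string
}

// mcpEndpointFor picks plain HTTP on 8080 unless a CA bundle secret is set,
// in which case it switches to HTTPS on 8443 with cert and key paths.
func mcpEndpointFor(caBundleSecretName string) mcpEndpoint {
	if caBundleSecretName == "" {
		return mcpEndpoint{Port: 8080, Scheme: "http"}
	}
	return mcpEndpoint{
		Port:     8443,
		Scheme:   "https",
		CertFile: path.Join(mcpTLSDir, "tls.crt"), // assumed file name
		KeyFile:  path.Join(mcpTLSDir, "tls.key"), // assumed file name
	}
}

func main() {
	for _, secret := range []string{"", "example-ca-bundle"} { // placeholder secret name
		ep := mcpEndpointFor(secret)
		fmt.Printf("%s port=%d cert=%q\n", ep.Scheme, ep.Port, ep.CertFile)
	}
}
```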

Changes:
- internal/openstackclient/funcs.go: Add mcpTLSSecretName param to
  ClientPodSpec() for TLS secret volume mount; add tlsEnabled param to
  MCPConfigYAML() for TLS cert/key config and port selection
- config/rbac/role.yaml, bindata/: Regenerated via make manifests and
  make bindata

@centosinfra-prod-github-app

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://gateway-cloud-softwarefactory.apps.ocp.cloud.ci.centos.org/zuul/t/rdoproject.org/buildset/d4496519660c4682a87c645bbdeedd38

openstack-k8s-operators-content-provider FAILURE in 6m 42s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ adoption-standalone-to-crc-ceph-provider SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ openstack-operator-tempest-multinode SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ openstack-operator-edpm-baremetal-minor-update SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
