[b/r] Add OpenStackBackupConfig controller and backup/restore labeling#1868

Open
stuggi wants to merge 2 commits into openstack-k8s-operators:main from stuggi:backup_restore_controller

@stuggi
Contributor

@stuggi stuggi commented Mar 31, 2026

  • Add OpenStackBackupConfig CRD and controller that watches CRD instances across operators and labels namespace resources (Secrets, ConfigMaps, NADs) with backup.openstack.org labels for backup/restore integration
  • Wire backup/restore labeling into the ControlPlane controller: CA cert secrets get backup labels via SecretTemplate, internal service cert requests get restore=false (via lib-common default), and ReconcileBackupConfig creates/updates the BackupConfig CR with spec defaults

Commit 1: [b/r] Add OpenStackBackupConfig controller

Introduces the backup.openstack.org/v1beta1 API group with the OpenStackBackupConfig CRD. The controller:

  • Discovers CRD instances by reading the backup.openstack.org/restore and backup.openstack.org/restore-order labels from the CRD schemas and applying them to all instances. The CRDs are read only once at startup and cached, not on each reconcile. This keeps the approach dynamic: a new CRD only needs the labels, and the controller only needs the matching RBAC permissions.
  • Labels Secrets, ConfigMaps, and NADs in the namespace with configurable restore ordering
  • Supports per-resource annotation overrides (backup.openstack.org/restore, backup.openstack.org/restore-order) to customize or exclude individual resources
  • Includes envtest coverage
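
The override behaviour described in the list above can be sketched in plain Go. This is only an illustration under assumptions, not the controller's code: the helper name `resolveBackupLabels`, the default values `"true"` and `"10"`, and the rule that an annotation simply replaces the default for the same key are all hypothetical. Only the two `backup.openstack.org` keys come from the description.

```go
package main

import "fmt"

// Label and annotation keys named in the PR description.
const (
	restoreKey      = "backup.openstack.org/restore"
	restoreOrderKey = "backup.openstack.org/restore-order"
)

// resolveBackupLabels is a hypothetical helper: it starts from the
// controller-wide defaults and lets per-resource annotations win per key.
func resolveBackupLabels(defaults, annotations map[string]string) map[string]string {
	out := make(map[string]string, 2)
	for _, key := range []string{restoreKey, restoreOrderKey} {
		if v, ok := annotations[key]; ok {
			out[key] = v // annotation override on the individual resource
		} else if v, ok := defaults[key]; ok {
			out[key] = v // fall back to the configured default
		}
	}
	return out
}

func main() {
	// Assumed defaults for illustration only.
	defaults := map[string]string{restoreKey: "true", restoreOrderKey: "10"}

	// A resource annotated to be excluded from restore.
	excluded := map[string]string{restoreKey: "false"}
	fmt.Println(resolveBackupLabels(defaults, excluded))

	// A resource without annotations gets the defaults unchanged.
	fmt.Println(resolveBackupLabels(defaults, nil))
}
```

In this sketch an excluded resource still keeps its restore-order value. Whether the real controller drops the order label in that case is not stated in the description.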

Commit 2: [b/r] Add backup/restore labels to ControlPlane controller

Integrates backup/restore into the existing ControlPlane reconciliation:

  • ReconcileBackupConfig in internal/openstack/backup.go creates the OpenStackBackupConfig CR with spec defaults via CreateOrPatch
  • CA cert secrets labeled at creation time in ca.go via SecretTemplate
  • Labels custom cert-manager Issuers
  • Internal service cert requests labeled with restore=false via lib-common (regenerated by cert-manager on restore)
  • CRD label additions for backup.openstack.org/restore and backup.openstack.org/restore-order on ControlPlane, Version, DataPlaneNodeSet, and DataPlaneService types
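
As a rough illustration of setting spec defaults inside a mutate function (the kind of callback passed to controller-runtime's `controllerutil.CreateOrPatch`), the sketch below uses only the standard library. The struct `backupConfigSpec`, its field names, the default values, and the choice to fill only empty fields are assumptions made for this example; the real spec is not shown in this description.

```go
package main

import "fmt"

// backupConfigSpec is a hypothetical stand-in for the OpenStackBackupConfig
// spec; the real field names are not shown in the PR description.
type backupConfigSpec struct {
	SecretRestoreOrder    string
	ConfigMapRestoreOrder string
	NADRestoreOrder       string
}

// applySpecDefaults mimics what a mutate function could do: fill only the
// fields that are still empty, so values already set on the CR are kept.
func applySpecDefaults(spec *backupConfigSpec) {
	setIfEmpty := func(field *string, def string) {
		if *field == "" {
			*field = def
		}
	}
	// Default values are invented for illustration.
	setIfEmpty(&spec.SecretRestoreOrder, "10")
	setIfEmpty(&spec.ConfigMapRestoreOrder, "10")
	setIfEmpty(&spec.NADRestoreOrder, "10")
}

func main() {
	// Simulates an existing CR where one field was customized.
	spec := backupConfigSpec{SecretRestoreOrder: "05"}
	applySpecDefaults(&spec)
	fmt.Printf("%+v\n", spec)
}
```

Because the function only touches empty fields, calling it on every reconcile is idempotent, which is the property a mutate function generally needs.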

Jira: OSPRH-22912
Jira: OSPRH-22913
Jira: OSPRH-26645

Depends-On: openstack-k8s-operators/lib-common#680
Depends-On: openstack-k8s-operators/lib-common#684
Depends-On: openstack-k8s-operators/lib-common#685

@openshift-ci
Contributor

openshift-ci bot commented Mar 31, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: stuggi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@github-actions

github-actions bot commented Mar 31, 2026

OpenStackControlPlane CRD Size Report

| Metric | Value |
|---|---|
| CRD JSON size | 322464 bytes (315KB) |
| Base branch size | 322326 bytes |
| Change | +0.04% |
| Status | yellow, growing |

Threshold reference

| Color | Range | Meaning |
|---|---|---|
| 🟢 green | < 300KB | Comfortable |
| 🟡 yellow | 300–400KB | Growing |
| 🟠 orange | 400–750KB | Concerning |
| 🔴 red | > 750KB | Approaching 1.5MB etcd limit (cut in half to allow space for update) |

@stuggi stuggi requested review from abays and dprince and removed request for rabi and rebtoor March 31, 2026 17:01
@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/dd57c92b72a04ef0929c08fe0728effe

openstack-k8s-operators-content-provider FAILURE in 9m 19s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ adoption-standalone-to-crc-ceph-provider SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ openstack-operator-tempest-multinode SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ openstack-operator-edpm-baremetal-minor-update SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/fe53444e267b46bfab002d52d844e719

openstack-k8s-operators-content-provider FAILURE in 7m 42s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ adoption-standalone-to-crc-ceph-provider SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ openstack-operator-tempest-multinode SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ openstack-operator-edpm-baremetal-minor-update SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider

@stuggi stuggi force-pushed the backup_restore_controller branch from 2e3227a to ddaf0cb Compare April 7, 2026 13:43
@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/b224d61474934085ac7a999c8bb3f7a1

openstack-k8s-operators-content-provider FAILURE in 8m 02s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ adoption-standalone-to-crc-ceph-provider SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ openstack-operator-tempest-multinode SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
openstack-operator-docs-preview POST_FAILURE in 2m 37s
⚠️ openstack-operator-edpm-baremetal-minor-update SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider

@stuggi stuggi force-pushed the backup_restore_controller branch 2 times, most recently from dccad21 to 3c0c72e Compare April 8, 2026 06:24
@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/baec74827fef49899c7ef7d9c71c34a7

openstack-k8s-operators-content-provider FAILURE in 7m 42s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ adoption-standalone-to-crc-ceph-provider SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ openstack-operator-tempest-multinode SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ openstack-operator-edpm-baremetal-minor-update SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider

@stuggi stuggi force-pushed the backup_restore_controller branch from 3c0c72e to 378a2cb Compare April 10, 2026 12:49
@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/d108ae22d2ac4105b454c6997c44c2ad

openstack-k8s-operators-content-provider FAILURE in 7m 49s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ adoption-standalone-to-crc-ceph-provider SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ openstack-operator-tempest-multinode SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ openstack-operator-edpm-baremetal-minor-update SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider

@stuggi
Contributor Author

stuggi commented Apr 12, 2026

/retest

@stuggi stuggi force-pushed the backup_restore_controller branch from 378a2cb to a0ec96e Compare April 13, 2026 05:31
@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/ed602d0559ba42829787fa1571d0dbc3

openstack-k8s-operators-content-provider FAILURE in 7m 55s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ adoption-standalone-to-crc-ceph-provider SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ openstack-operator-tempest-multinode SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ openstack-operator-edpm-baremetal-minor-update SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider

@stuggi stuggi force-pushed the backup_restore_controller branch from a0ec96e to d101e85 Compare April 16, 2026 05:50
@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/c733c0866b64471cb1201d4ae9f38243

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 20m 31s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 24m 19s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 31m 09s
adoption-standalone-to-crc-ceph-provider FAILURE in 1h 10m 38s
✔️ openstack-operator-tempest-multinode SUCCESS in 1h 39m 07s
✔️ openstack-operator-edpm-baremetal-minor-update SUCCESS in 2h 06m 54s

@stuggi stuggi force-pushed the backup_restore_controller branch 4 times, most recently from 747ffba to 7ae2e1f Compare April 17, 2026 11:57
Comment thread api/backup/v1beta1/groupversion_info.go Outdated
Comment thread internal/openstack/backup.go Outdated
Comment on lines +237 to +238
// Note: watches for CRD instance types are only registered at setup time,
// so CR instance changes won't trigger reconciliation in this case.
Contributor

And we don't need to watch for CRD changes during runtime, because new CRDs would only be added during an update, in which case a new OpenStack operator is created (which re-inits the cache). Is that true?

Contributor Author

Yes, correct. With an update we get a new openstack-operator image that ships the new CRDs, and the operator and its controllers are restarted with the new version.
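
The startup-only discovery discussed in this thread can be pictured with a small standard-library sketch. Everything here is illustrative: `crdMeta`, `restoreConfig`, `buildCache`, and the `example.org` CRD names are hypothetical, and the real controller reads CRDs from the API server rather than from a slice. Only the two label keys come from the PR description.

```go
package main

import "fmt"

const (
	restoreKey      = "backup.openstack.org/restore"
	restoreOrderKey = "backup.openstack.org/restore-order"
)

// crdMeta is a hypothetical, trimmed-down view of a CRD: name and labels.
type crdMeta struct {
	Name   string
	Labels map[string]string
}

// restoreConfig holds the label values copied onto every instance of a CRD.
type restoreConfig struct {
	Restore string
	Order   string
}

// buildCache is meant to run once at startup. It keeps only the CRDs that
// carry the restore label, keyed by CRD name.
func buildCache(crds []crdMeta) map[string]restoreConfig {
	cache := make(map[string]restoreConfig)
	for _, c := range crds {
		restore, ok := c.Labels[restoreKey]
		if !ok {
			continue // CRD is not opted in to backup/restore labeling
		}
		cache[c.Name] = restoreConfig{Restore: restore, Order: c.Labels[restoreOrderKey]}
	}
	return cache
}

func main() {
	installed := []crdMeta{
		{Name: "widgets.example.org", Labels: map[string]string{restoreKey: "true", restoreOrderKey: "30"}},
		{Name: "gadgets.example.org"}, // no backup labels: ignored
	}
	cache := buildCache(installed)

	// A CRD that appears after startup stays unknown until the cache is
	// rebuilt, for example when the operator restarts during an update.
	installed = append(installed, crdMeta{Name: "late.example.org", Labels: map[string]string{restoreKey: "true"}})
	_, known := cache["late.example.org"]
	fmt.Println(len(cache), known) // prints "1 false"
}
```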

Add the BackupConfig CRD, API types, controller, RBAC, samples, and
envtests for the backup/restore labeling feature. The controller watches
CRD instances across operators and labels resources (secrets, configmaps,
NADs) with backup.openstack.org labels for backup/restore integration.
Supports annotation overrides on individual resources to customize
restore ordering or exclude from backup.

Custom Issuer labeling is handled by the ControlPlane controller in
ca.go, not by the BackupConfig controller.

Jira: OSPRH-22912
Jira: OSPRH-22913

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Martin Schuppert <mschuppert@redhat.com>
@stuggi stuggi force-pushed the backup_restore_controller branch from 7ae2e1f to d0ed142 Compare April 17, 2026 14:37
Wire the BackupConfig reconciliation into the ControlPlane controller
with proper condition handling (OpenStackControlPlaneBackupConfigReady).
Add backup/restore labels to CA cert secrets via SecretTemplate, and
restore=false labels to internal service cert requests. Add the
ReconcileBackupConfig call, secret watch with annotation change
predicate, and RBAC for openstackbackupconfigs. Set BackupConfig spec
defaults in the CreateOrPatch mutate function.

Label custom Issuers for backup/restore in addIssuerLabelAnnotation
after removeIssuerLabel so the MatchingLabels query only uses CA
selector labels. Remove getCertSecretBackupLabels wrapper, call
backup.GetCertSecretBackupLabels directly. Return error from
GetCertSecretBackupLabels for non-NotFound errors. Rename GetConfig
parameter from gvk to crdName.

Jira: OSPRH-22912
Jira: OSPRH-22913

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Martin Schuppert <mschuppert@redhat.com>
@stuggi stuggi force-pushed the backup_restore_controller branch from d0ed142 to 4aa477c Compare April 17, 2026 14:44
Comment on lines +9 to +10
# Target namespace to watch for resources
targetNamespace: openstack
Contributor

Suggested change
# Target namespace to watch for resources
targetNamespace: openstack

Contributor

It seems this field doesn't exist. I'm guessing you just use the metadata.Namespace instead.

Contributor

Is there only supposed to be one OpenStackBackupConfig per namespace? If so, should we add a validation webhook to block creating one if one already exists? Also if so, maybe in the case that the user creates the OpenStackBackupConfig first, the OpenStackControlPlane webhook should block creating an OpenStackControlPlane that doesn't have the same name as OpenStackBackupConfig (and I guess we would need to take OpenStackVersion into consideration too, since its name also has to be the same)?
