Skip to content

Use shared singleton vhost/user CRs with per-consumer finalizers#577

Open
lmiccini wants to merge 1 commit intoopenstack-k8s-operators:mainfrom
lmiccini:shared-singleton-vhost-user
Open

Use shared singleton vhost/user CRs with per-consumer finalizers#577
lmiccini wants to merge 1 commit intoopenstack-k8s-operators:mainfrom
lmiccini:shared-singleton-vhost-user

Conversation

@lmiccini
Copy link
Copy Markdown
Contributor

@lmiccini lmiccini commented May 5, 2026

Multiple TransportURL CRs pointing to the same RabbitMQ vhost/user
(e.g., cinder and heat both using vhost "openstack") would each
create separate RabbitMQVhost/RabbitMQUser CRs. Deleting any one
of them destroyed the RabbitMQ resource for all other consumers.

This replaces per-TransportURL CRs with shared singletons named
deterministically via CanonicalVhostName/CanonicalUserName. Each
consumer adds a per-TransportURL finalizer (turl.openstack.org/t-)
so the RabbitMQ resource is only deleted when all consumers are gone.
The RabbitMQ cluster CR is set as owner for automatic GC on cluster
deletion.

Shared CR lifecycle:

  • Orphaned user CRs are labeled (rabbitmq.openstack.org/orphaned)
    instead of deleted, keeping them reclaimable by new consumers.
    Auto-deletion only proceeds after admin removes the
    cleanup-blocked finalizer.
  • Vhost CRs show "Waiting for external finalizers" status when
    deletion is blocked by user finalizer cleanup.
  • User controller cascades vhost CR deletion after its own cleanup.

Migration from legacy per-TransportURL CRs:

  • Legacy CRs (with TransportURL owner refs) are cleaned up after
    canonical CRs are created.
  • Credentials are pre-copied to the canonical secret to avoid
    password regeneration that would break existing connections.
  • Webhook skips legacy CRs and deleting CRs in uniqueness checks.

Webhook hardening:

  • Reject permission changes on shared users with active consumers.
  • Enforce RabbitmqClusterName immutability on user, vhost, and policy.
  • Validate RabbitMQPolicy Pattern as valid regex at admission.
  • Remove duplicate +kubebuilder:webhook markers from apis/ package.
  • Propagate admission context and add timeouts for k8s API calls.

API client hardening:

  • Add context.Context to all RabbitMQ API client methods.
  • Bound error response reads to 1MB and drain bodies on success.
  • Track actual vhost/policy name in RabbitMQPolicyStatus so
    reconcileDelete targets the correct resource.

Controller fixes:

  • Move ObservedGeneration to end of reconcileNormal (user, vhost,
    policy) so it is only set after all work succeeds.
  • Propagate PatchInstance errors via named return instead of
    swallowing them.
  • Guard per-consumer finalizer names against the Kubernetes
    63-char limit with SHA-256 hash truncation.
  • Add periodic 5-minute requeue for long TransportURL names
    where watch-based reverse mapping fails.

Backup/restore:

  • Add backup.openstack.org labels to auto-generated credential
    secrets so Velero includes them.
  • Set restore-order to 10 so the secret is restored before the
    RabbitMQUser CR (order 40).

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Depends-On: openstack-k8s-operators/openstack-operator#1917

@openshift-ci openshift-ci Bot requested review from dprince and stuggi May 5, 2026 09:42
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 5, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lmiccini

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved label May 5, 2026
@openshift-ci openshift-ci Bot added the approved label May 5, 2026
@lmiccini lmiccini force-pushed the shared-singleton-vhost-user branch from 41b15fa to 36fb717 Compare May 7, 2026 06:46
Multiple TransportURL CRs pointing to the same RabbitMQ vhost/user
(e.g., cinder and heat both using vhost "openstack") would each
create separate RabbitMQVhost/RabbitMQUser CRs. Deleting any one
of them destroyed the RabbitMQ resource for all other consumers.

This replaces per-TransportURL CRs with shared singletons named
deterministically via CanonicalVhostName/CanonicalUserName. Each
consumer adds a per-TransportURL finalizer (turl.openstack.org/t-<name>)
so the RabbitMQ resource is only deleted when all consumers are gone.
The RabbitMQ cluster CR is set as owner for automatic GC on cluster
deletion.

Shared CR lifecycle:
- Orphaned user CRs are labeled (rabbitmq.openstack.org/orphaned)
  instead of deleted, keeping them reclaimable by new consumers.
  Auto-deletion only proceeds after admin removes the
  cleanup-blocked finalizer.
- Vhost CRs show "Waiting for external finalizers" status when
  deletion is blocked by user finalizer cleanup.
- User controller cascades vhost CR deletion after its own cleanup.

Migration from legacy per-TransportURL CRs:
- Legacy CRs (with TransportURL owner refs) are cleaned up after
  canonical CRs are created.
- Credentials are pre-copied to the canonical secret to avoid
  password regeneration that would break existing connections.
- Webhook skips legacy CRs and deleting CRs in uniqueness checks.

Webhook hardening:
- Reject permission changes on shared users with active consumers.
- Enforce RabbitmqClusterName immutability on user, vhost, and policy.
- Validate RabbitMQPolicy Pattern as valid regex at admission.
- Remove duplicate +kubebuilder:webhook markers from apis/ package.
- Propagate admission context and add timeouts for k8s API calls.

API client hardening:
- Add context.Context to all RabbitMQ API client methods.
- Bound error response reads to 1MB and drain bodies on success.
- Track actual vhost/policy name in RabbitMQPolicyStatus so
  reconcileDelete targets the correct resource.

Controller fixes:
- Move ObservedGeneration to end of reconcileNormal (user, vhost,
  policy) so it is only set after all work succeeds.
- Propagate PatchInstance errors via named return instead of
  swallowing them.
- Guard per-consumer finalizer names against the Kubernetes
  63-char limit with SHA-256 hash truncation.
- Add periodic 5-minute requeue for long TransportURL names
  where watch-based reverse mapping fails.

Backup/restore:
- Add backup.openstack.org labels to auto-generated credential
  secrets so Velero includes them.
- Set restore-order to 10 so the secret is restored before the
  RabbitMQUser CR (order 40).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@lmiccini lmiccini force-pushed the shared-singleton-vhost-user branch from 431fc58 to eebffe6 Compare May 8, 2026 05:55
@lmiccini lmiccini changed the title Shared singleton vhost user Use shared singleton vhost/user CRs with per-consumer finalizers May 8, 2026
@centosinfra-prod-github-app
Copy link
Copy Markdown

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://gateway-cloud-softwarefactory.apps.ocp.cloud.ci.centos.org/zuul/t/rdoproject.org/buildset/6bb04dac1e48482482250097b177b249

openstack-k8s-operators-content-provider FAILURE in 11m 24s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider

@lmiccini
Copy link
Copy Markdown
Contributor Author

lmiccini commented May 8, 2026

recheck

@lmiccini
Copy link
Copy Markdown
Contributor Author

lmiccini commented May 8, 2026

/test infra-operator-build-deploy-kuttl

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant