feat(CC-0093): Parameterize OpenBao RemoteKey per Keystone CR#280
feat(CC-0093): Parameterize OpenBao RemoteKey per Keystone CR#280
Conversation
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- The new fernet and credential RemoteKey integration tests in
integration_test.goare nearly identical; consider extracting a shared helper or table-driven test to reduce duplication and keep the per-CR assertions in one place. - The new
bao kvpolling script inconcurrent-cr-conflicts/chainsaw-test.yamlrepeats a lot of setup and error-handling logic that is very similar to the deletion-cleanup chainsaw step; consider factoring the common BAO_ADDR/BAO_TOKEN/bootstrap and polling pattern into a reusable snippet or at least aligning variable names and structure to make future changes less error-prone. - In
credentialKeysPushSecretandfernetKeysPushSecret,fmt.Sprintf("openstack/keystone/%s/...", keystone.Name)could be replaced with simple string concatenation to match the pattern used in the unit tests and avoid an unnecessary dependency onfmtin these hot paths.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The new fernet and credential RemoteKey integration tests in `integration_test.go` are nearly identical; consider extracting a shared helper or table-driven test to reduce duplication and keep the per-CR assertions in one place.
- The new `bao kv` polling script in `concurrent-cr-conflicts/chainsaw-test.yaml` repeats a lot of setup and error-handling logic that is very similar to the deletion-cleanup chainsaw step; consider factoring the common BAO_ADDR/BAO_TOKEN/bootstrap and polling pattern into a reusable snippet or at least aligning variable names and structure to make future changes less error-prone.
- In `credentialKeysPushSecret` and `fernetKeysPushSecret`, `fmt.Sprintf("openstack/keystone/%s/...", keystone.Name)` could be replaced with simple string concatenation to match the pattern used in the unit tests and avoid an unnecessary dependency on `fmt` in these hot paths.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
WARNING (2)W-001: Migration note overstates operator intervention requirementFile: Problem: The migration note claims the new per-CR path is applied automatically on the next reconcile after upgrade with no operator intervention required. This only holds for fresh deployments via hack/deploy-infra.sh, which runs openbao_bootstrap() → setup-policies.sh. For operators upgrading an existing cluster who do not re-run the bootstrap, the OpenBao ACL policy (deploy/openbao/policies/push-keystone-keys.hcl) is a filesystem artefact applied only via setup-policies.sh and is not reconciled by Flux — there is no Kubernetes object mapping the HCL file into OpenBao at runtime. Until the policy is re-applied, ESO will receive 403 on every push to the new per-CR paths, and FernetKeysReady / CredentialKeysReady will fail silently on the backup step. Code: Action Required: Replace the first paragraph of the migration note to explicitly name the policy re-apply step, or tighten the claim to 'next reconcile after upgrading both the operator image and the OpenBao policy'. Suggested Fix: Related: W-002 W-002: Policy comment inaccurately describes /db subtree exclusionFile: Problem: The comment claims 'db/anything' is not matched by the glob patterns, but this is not literally true: Code: Action Required: Rewrite the comment to accurately describe the invariant: that the Suggested Fix: Related: W-001 INFO (1)I-001: Architecture submodule reference still carries the flat pathFile: Problem: The architecture submodule reference at architecture/docs/09-implementation/04-keystone-reconciler.md:279 still contains the pre-CC-0093 flat KV-v2 path. The commit log (7e8430d) correctly defers this to a separate upstream PR against github.com/C5C3/C5C3 per the repository's NO_SUBMODULE_MODIFICATIONS rule and flags the submodule-pin bump as a follow-up worktree. This is the right call — a submodule content change in this PR would violate repo policy. Code: Action Required: Confirm that a separate follow-up ticket against C5C3/C5C3 exists to update the upstream architecture doc and subsequently bump the submodule pin. Open one if it does not already exist. No change required in this PR. Suggested Fix: SummaryThe PR correctly fixes a concurrency bug by parameterizing the OpenBao RemoteKey with keystone.Name across six surfaces (PushSecret builders, OpenBao HCL policy, unit/envtest/chainsaw tests, and reference docs), with commit staging and scope kept clean. Test coverage is unusually strong: unit tests assert the legacy flat path is explicitly not produced, chainsaw asserts legacy paths remain absent, and the E2E scripts share a consistent bao-polling shape between concurrent-cr-conflicts and deletion-cleanup. Two documentation accuracy issues need fixing before merge — the migration note under-specifies the OpenBao policy re-apply step for brownfield upgrades, and the HCL policy comment mischaracterizes which paths the
Important Recommendation: Merge after fixing WARNING findings 1 and 2; both are documentation-only corrections with drop-in replacements provided. Finding 3 is an acknowledged follow-up against the C5C3/C5C3 submodule and does not block this PR. |
|
Regarding review finding I-001 (architecture submodule reference at Per the repository's I searched the C5C3/C5C3 issue tracker (all states, ≤30 most recent) and did not find an existing ticket that tracks this change. Attempting to open one programmatically from this session returned Requested action: please open a tracking issue against The deferred follow-up is recorded in |
- Scope fernet-keys backup PushSecret RemoteKey to
`openstack/keystone/{keystone.Name}/fernet-keys` so two Keystone CRs in
the same namespace cannot collide on a single OpenBao object
- Scope credential-keys backup PushSecret RemoteKey to
`openstack/keystone/{keystone.Name}/credential-keys` with the same
per-CR path layout
- Switch the `push-keystone-keys` OpenBao ACL from two literal leaf
paths to
`kv-v2/{data,metadata}/openstack/keystone/+/{fernet,credential}-keys`,
keeping `create/update/read/delete` on both data and metadata endpoints
for ESO PushSecret write + DeletionPolicy=Delete purges
- Preserve the existing lockout defence: the policy still enforces the
literal `/fernet-keys` or `/credential-keys` leaf suffix, so the
read-only MariaDB credentials at `kv-v2/data/openstack/keystone/db`
remain unwritable and a compromised ESO controller cannot overwrite them
- Extend envtest coverage in `reconcile_fernet_test.go` and
`reconcile_credential_test.go` to assert the `{name}` path segment and
verify two CRs in one namespace produce disjoint RemoteKeys
- Add integration-test cases that reconcile multiple Keystone CRs side
by side and assert per-CR RemoteKey isolation end to end
- Expand the `concurrent-cr-conflicts` chainsaw e2e to verify isolated
OpenBao paths under real reconciliation and tighten the
`deletion-cleanup` chainsaw to match the new per-CR path shape
- Document the per-CR path layout in
`docs/reference/keystone-reconciler.md`, including a migration note that
flags legacy flat paths as orphaned-but-harmless after upgrade and gives
the canonical `bao kv metadata delete` cleanup commands, plus an
explicit call-out that the updated policy HCL must be re-applied so ESO
retains write access
AI-assisted: Claude Code
On-behalf-of: @SAP christian.berendt@sap.com
Signed-off-by: Christian Berendt <berendt@23technologies.cloud>
9efb822 to
1573e02
Compare
GitHub Issue #279: Parameterize OpenBao RemoteKey per Keystone CR to eliminate cross-CR KV path collisions
Category: bug | Scope: Medium
Description: The Fernet and credential
PushSecretbuilders hardcode the OpenBaoRemoteKeyto the same two literal paths for every Keystone CR in a cluster:operators/keystone/internal/controller/reconcile_fernet.go:423—RemoteKey: "openstack/keystone/fernet-keys"operators/keystone/internal/controller/reconcile_credential.go:429—RemoteKey: "openstack/keystone/credential-keys"Every
PushSecretCR —{keystone.Name}-fernet-keys-backup/{keystone.Name}-credential-keys-backup— is itself name-scoped, and the localSecretit selects is name-scoped too (reconcile_fernet.go:417,reconcile_credential.go:423). Only the remote KV-v2 path is flat-namespaced across CRs. Two Keystone CRs in the same Kubernetes namespace therefore share exactly two backing OpenBao objects — whichever CR's ESO push lands last wins, and either CR's deletion can race with the other CR's next push.In production this is dormant (the conventional deployment runs a single Keystone per namespace), but it is load-bearing in the E2E suite, where
tests/e2e/chainsaw-config.yaml:40setsparallel: 4and everytests/e2e/keystone/*/chainsaw-test.yamlfile lives in the sameopenstacknamespace. Step 6 oftests/e2e/keystone/deletion-cleanup/chainsaw-test.yaml:255-256asserts that both KV-v2 paths arenot-foundafter thekeystone-cleanupCR is deleted — an assertion that cannot hold whenever a concurrent test's CR is mid-reconcile, because that CR'sPushSecretis writing back to the shared path the cleanup test just purged.The operator's own OpenBao policy (
deploy/openbao/policies/push-keystone-keys.hcl:14-20) already rejects a wildcard over the Keystone subtree — for the orthogonal reason of protecting the read-onlyopenstack/keystone/dbcredentials consumed by Keystone — so the flat two-path scheme is not even a policy simplification we are preserving.Motivation: Reproduced on
feature/CC-0088in CI run 24903037791/job 72927386373 at2026-04-24T17:53:33Z. Thekeystone-deletion-cleanupstep-6 retry loop polled for 30 s and observed both KV-v2 paths continuously present withcreated_time=2026-04-24T17:53:11.602847833Z(Fernet) — created 10 s after the cleanup test'skubectl deleteat17:53:01and exactly matching thekeystone-brownfield-databasestep-2 apply at17:53:11. The adjacentkeystone-credential-rotationandkeystone-concurrent-cr-conflictsCRs running in parallel contribute additional overlapping writers. Step-4'sPushSecret NotFoundassertion passes cleanly on the cleanup test's own backup objects — the leak is purely the shared remote path being re-created by a neighbour.Previously observed on
mainin CI run 24839732116 at14:31:30(recorded in auto-memoryproject_shared_openbao_kv_path.md), 16 s afterkeystone-cleanupdeletion, same failure mode. Both occurrences match the same timestamp-evidenced race; this is a reliably reproducible flake rather than a one-off. PR #271 (CC-0091) closed the independent adoption-race leak in the operator itself and added the bounded-retry loop step-6 depends on, but the loop cannot tolerate an external, legitimate writer regenerating the path — only this issue can.Spin-off from analysing the red
feature/CC-0088E2E run; CC-0088 itself is unrelated (Envoy Gateway / HTTPRoute assertions).Proposed approach:
Switch both builders to a per-CR remote key layout, e.g.
openstack/keystone/{keystone.Name}/fernet-keysandopenstack/keystone/{keystone.Name}/credential-keys. The CR name is already in scope in both functions (fernetKeysPushSecret(keystone *keystonev1alpha1.Keystone)atreconcile_fernet.go:403, symmetric atreconcile_credential.go:409), so the change is a one-linefmt.Sprintfin each. A nested subpath keeps the convention of the siblingopenstack/keystone/dbpath used bydeploy/eso/externalsecrets/keystone-db.yaml:27and preserves the existing wildcard reads granted bydeploy/openbao/policies/eso-management.hcl:34(kv-v2/data/openstack/keystone/*).Broaden the
push-keystone-keys.hclpolicy from two literal paths to two per-name patterns while still excluding/db:kv-v2/{data,metadata}/openstack/keystone/+/fernet-keyskv-v2/{data,metadata}/openstack/keystone/+/credential-keysThe
+single-segment glob (OpenBao/Vault capability template syntax) keeps the policy tight enough that a compromised ESO still cannot reachopenstack/keystone/db, matching thepush-keystone-keys.hcl:14-20rationale. Verify theeso-managementwildcard (kv-v2/data/openstack/keystone/*) still covers the new read paths so the redundant-readcapability remains redundant rather than load-bearing.Update the step-6 chainsaw script in
tests/e2e/keystone/deletion-cleanup/chainsaw-test.yaml:255-256to target the CR-specific paths (kv-v2/openstack/keystone/keystone-cleanup/fernet-keysetc.), so the retry loop asserts exactly the path the cleanup CR owned and cannot be perturbed by parallel tests.Update both builder unit tests (
reconcile_fernet_test.go:412,reconcile_credential_test.go:416) to the new expectedRemoteKeystring.Update reconciler docs:
docs/reference/keystone-reconciler.md:763-764,:803,:1341,:1376,:1510,:1550, andarchitecture/docs/09-implementation/04-keystone-reconciler.md:279to reference{keystone.Name}in the path.Acceptance Criteria:
fernetKeysPushSecretandcredentialKeysPushSecretemit per-CRRemoteKeyvalues that embedkeystone.Name.push-keystone-keys.hclgrantscreate/update/read/deleteon the per-name patterns only, still denies access toopenstack/keystone/db, and does so under a policy comment that captures the rationale for including the name segment.keystone-aandkeystone-bin the same namespace produce four distinct populated KV-v2 paths (.../keystone-a/{fernet,credential}-keys,.../keystone-b/{fernet,credential}-keys) — verifiable by hand withbao kv list kv-v2/openstack/keystone/or by extendingtests/e2e/keystone/concurrent-cr-conflictswith a positive KV assertion.chainsaw/keystone-deletion-cleanupstep-6 asserts thekeystone-cleanup-scoped paths and passes while any ofkeystone-autoscaling,keystone-credential-rotation,keystone-brownfield-database, orkeystone-concurrent-cr-conflictsrun in parallel (parallel: 4).chainsaw/keystone-fernet-rotationorchainsaw/keystone-credential-rotation— PushSecret still reachesReady=Trueon the new path.gh workflow run e2e.yamltrigger) to confirm the race is closed, matching the validation bar used for Race between operator finalizer Delete and ESO PushSecret adoption leaks Fernet and credential keys in OpenBao #271.{keystone.Name}segment appears in every reference-doc location that previously listed the flat path, including the PushSecret table atdocs/reference/keystone-reconciler.md:763-764and the rotation sections at:1341/:1376/:1510/:1550.docs/reference/keystone-reconciler.md: pre-existing clusters carrying data atkv-v2/openstack/keystone/{fernet,credential}-keyswill see the operator write a new path on next reconcile; the old paths are orphaned but harmless (read-only from Keystone's perspective — the live Secret is driven from the local KubernetesSecret, not the OpenBao backup) and can be hard-deleted manually. Call this out explicitly so upgraders don't interpret the stale path as a leak.Refs: CC-0079 (openbao-finalizer), CC-0083 (PushSecret write policy), CC-0091 (adoption race), #271, #273. Auto-memory:
project_shared_openbao_kv_path.md.Labels: bug
Source: #279