CM-966: Implement centralized TLS profile fetching and application without istio-csr changes#416
CM-966: Implement centralized TLS profile fetching and application without istio-csr changes#416arun717 wants to merge 14 commits into
Conversation
|
@arun717: This pull request references CM-966 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds cluster TLS profile support for cert-manager: new tlsprofile utilities and client TLS builder, a deployment hook that applies TLS CLI args from the APIServer resource, wiring in the deployment controller and manager, RBAC to read APIServer, context-threading updates, and related unit/e2e/test and dependency bumps. ChangesCluster TLS profile integration
Sequence Diagram(s)sequenceDiagram
participant Reconciler as Cert-Manager Reconciler
participant DepCtrl as Generic Deployment Controller
participant Hook as TLS Profile Hook
participant Lister as APIServer Informer Lister
participant TLSProf as TLSProfile Utilities
Reconciler->>DepCtrl: Reconcile deployment
alt infra informers available
DepCtrl->>Hook: Invoke WithClusterTLSProfileFromAPIServer
Hook->>Lister: Lister.Get("cluster")
Lister-->>Hook: APIServer resource
Hook->>TLSProf: EffectiveSpec(TLSSecurityProfile)
TLSProf-->>Hook: TLSProfileSpec
Hook->>TLSProf: CertManagerWebhookTLSArgs / OperandMetricsTLSArgs
TLSProf-->>Hook: TLS CLI args
Hook->>Hook: mergeContainerArgs(existingArgs, extraArgs)
Hook-->>DepCtrl: Updated Deployment (container args)
end
DepCtrl-->>Reconciler: Reconciliation complete
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
pkg/controller/istiocsr/install_istiocsr.go (1)
55-58:⚠️ Potential issue | 🟠 MajorThread the request context through the final update.
The
ctxparameter is accepted at line 12, but the finalUpdateWithRetryat line 56 still usesr.ctx. This drops the controller-runtime deadline and cancellation for the last write, negating the benefit of accepting a request-scoped context.Proposed fix
- if err := r.UpdateWithRetry(r.ctx, istiocsr); err != nil { + if err := r.UpdateWithRetry(ctx, istiocsr); err != nil {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/controller/istiocsr/install_istiocsr.go` around lines 55 - 58, The final UpdateWithRetry call is using the controller's background context (r.ctx) instead of the request-scoped ctx parameter, losing deadline/cancellation; change the call to use the passed-in ctx (i.e., call r.UpdateWithRetry(ctx, istiocsr) when addProcessedAnnotation(istiocsr) triggers) so the update honors the request context while keeping the same error wrapping and message.test/e2e/utils_test.go (1)
1567-1574:⚠️ Potential issue | 🟡 MinorPropagate the context into the HTTP request too.
Right now
ctxis only used for the APIServer lookup. The actual GET can keep running past the polling context, which makes cancellation less predictable.🔧 Proposed fix
- req, err := http.NewRequest("GET", url, nil) + req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/utils_test.go` around lines 1567 - 1574, The HTTP request needs to carry the test context so cancellation propagates: replace http.NewRequest("GET", url, nil) with creating the request with ctx (use http.NewRequestWithContext or attach ctx via req.WithContext(ctx)) before calling client.Do(req) so the polling context used for APIServer lookup also cancels the actual GET; ensure the same error handling around request creation remains and pass that context-aware req into client.Do.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/tlsprofile/tlsprofile.go`:
- Around line 38-41: The current default branch silently returns the
Intermediate profile (using
configv1.TLSProfiles[configv1.TLSProfileIntermediateType]) for unknown
TLSSecurityProfile.Type values, which can mask misconfiguration; instead, in the
function handling TLSSecurityProfile.Type (referencing TLSSecurityProfile.Type,
configv1.TLSProfiles and configv1.TLSProfileIntermediateType), validate the
incoming Type and return a clear error (or an explicit validation error) when
the Type is unrecognized rather than falling back silently—include the unknown
Type value in the error message so callers can detect and handle
invalid/misconfigured TLS profile types.
---
Outside diff comments:
In `@pkg/controller/istiocsr/install_istiocsr.go`:
- Around line 55-58: The final UpdateWithRetry call is using the controller's
background context (r.ctx) instead of the request-scoped ctx parameter, losing
deadline/cancellation; change the call to use the passed-in ctx (i.e., call
r.UpdateWithRetry(ctx, istiocsr) when addProcessedAnnotation(istiocsr) triggers)
so the update honors the request context while keeping the same error wrapping
and message.
In `@test/e2e/utils_test.go`:
- Around line 1567-1574: The HTTP request needs to carry the test context so
cancellation propagates: replace http.NewRequest("GET", url, nil) with creating
the request with ctx (use http.NewRequestWithContext or attach ctx via
req.WithContext(ctx)) before calling client.Do(req) so the polling context used
for APIServer lookup also cancels the actual GET; ensure the same error handling
around request creation remains and pass that context-aware req into client.Do.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: e340c8bc-5fba-4d0d-8a12-df126917250b
⛔ Files ignored due to path filters (1)
test/go.sumis excluded by!**/*.sum
📒 Files selected for processing (22)
api/operator/v1alpha1/certmanager_types.gobundle/manifests/cert-manager-operator.clusterserviceversion.yamlbundle/manifests/operator.openshift.io_certmanagers.yamlconfig/crd/bases/operator.openshift.io_certmanagers.yamlconfig/rbac/role.yamlpkg/controller/certmanager/certmanager_controller.gopkg/controller/certmanager/generic_deployment_controller.gopkg/controller/deployment/tls_helpers.gopkg/controller/deployment/tls_profile_hook.gopkg/controller/deployment/tls_profile_hook_test.gopkg/controller/istiocsr/controller.gopkg/controller/istiocsr/controller_test.gopkg/controller/istiocsr/install_instiocsr_test.gopkg/controller/istiocsr/install_istiocsr.gopkg/operator/setup_manager.gopkg/tlsprofile/client_config.gopkg/tlsprofile/client_config_test.gopkg/tlsprofile/tlsprofile.gopkg/tlsprofile/tlsprofile_test.gotest/e2e/issuer_selfsigned_ca_test.gotest/e2e/utils_test.gotest/go.mod
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/tlsprofile/tlsprofile.go`:
- Around line 17-31: The built-in-profile branches currently return shallow
copies (spec := *configv1.TLSProfiles[...]) that still share the Ciphers backing
array; change those branches in EffectiveSpec so you deep-copy the spec before
returning — e.g., copy the struct into a new variable and replace spec.Ciphers
with a newly allocated slice containing copied values (same approach used in the
custom branch). Apply this to the branches referencing
configv1.TLSProfileOldType, configv1.TLSProfileIntermediateType, and
configv1.TLSProfileModernType so callers can safely mutate spec.Ciphers without
altering the global configv1.TLSProfiles.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 1e16aa2d-ea24-4a65-b766-ee05e57c82c9
📒 Files selected for processing (2)
pkg/tlsprofile/tlsprofile.gopkg/tlsprofile/tlsprofile_test.go
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
pkg/tlsprofile/tlsprofile_test.go (1)
80-97: ⚡ Quick winAdd a direct unit test for
CertManagerOperandMetricsTLSArgs.This file validates webhook TLS args, but the separate operand-metrics helper is still untested. A small test here would prevent drift between the two flag builders.
Suggested test addition
+func TestCertManagerOperandMetricsTLSArgs_joinsCiphers(t *testing.T) { + spec := &configv1.TLSProfileSpec{ + Ciphers: []string{"ECDHE-RSA-AES128-GCM-SHA256"}, + MinTLSVersion: configv1.VersionTLS12, + } + args := CertManagerOperandMetricsTLSArgs(spec) + argMap := map[string]string{} + for _, a := range args { + parts := strings.SplitN(a, "=", 2) + if len(parts) != 2 { + t.Fatalf("bad arg %q", a) + } + argMap[parts[0]] = parts[1] + } + if argMap["--metrics-tls-min-version"] != string(configv1.VersionTLS12) { + t.Fatalf("unexpected metrics min version: %q", argMap["--metrics-tls-min-version"]) + } + if !strings.Contains(argMap["--metrics-tls-cipher-suites"], "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256") { + t.Fatalf("unexpected metrics tls ciphers: %q", argMap["--metrics-tls-cipher-suites"]) + } +}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/tlsprofile/tlsprofile_test.go` around lines 80 - 97, Add a new unit test in tlsprofile_test.go analogous to TestCertManagerWebhookTLSArgs that directly exercises CertManagerOperandMetricsTLSArgs: construct a configv1.TLSProfileSpec with Ciphers and MinTLSVersion, call CertManagerOperandMetricsTLSArgs(spec), parse the returned args into a map like the existing test, and assert that the generated --tls-cipher-suites (or the operand-metrics-specific flag produced by CertManagerOperandMetricsTLSArgs) contains the expected normalized cipher name (e.g. TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256); ensure the test name clearly references CertManagerOperandMetricsTLSArgs and mirrors the parsing/assertion style used in TestCertManagerWebhookTLSArgs.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/tlsprofile/tlsprofile.go`:
- Around line 47-66: Both exported helpers CertManagerWebhookTLSArgs and
CertManagerOperandMetricsTLSArgs currently dereference the incoming spec and
will panic on a nil input; add a nil guard at the top of each function (e.g., if
spec == nil) and return an empty []string{} immediately to avoid a runtime panic
before calling joinIANACiphers(spec.Ciphers) or converting spec.MinTLSVersion to
string. Ensure this nil-check is placed before any use of spec so both functions
safely handle nil callers.
---
Nitpick comments:
In `@pkg/tlsprofile/tlsprofile_test.go`:
- Around line 80-97: Add a new unit test in tlsprofile_test.go analogous to
TestCertManagerWebhookTLSArgs that directly exercises
CertManagerOperandMetricsTLSArgs: construct a configv1.TLSProfileSpec with
Ciphers and MinTLSVersion, call CertManagerOperandMetricsTLSArgs(spec), parse
the returned args into a map like the existing test, and assert that the
generated --tls-cipher-suites (or the operand-metrics-specific flag produced by
CertManagerOperandMetricsTLSArgs) contains the expected normalized cipher name
(e.g. TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256); ensure the test name clearly
references CertManagerOperandMetricsTLSArgs and mirrors the parsing/assertion
style used in TestCertManagerWebhookTLSArgs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 54a08f14-de28-4f16-b6fe-502a900f8a11
📒 Files selected for processing (2)
pkg/tlsprofile/tlsprofile.gopkg/tlsprofile/tlsprofile_test.go
|
@arun717 i wonder if we should take tlsAdherence into consideration after re-reading the EP. |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
pkg/tlsprofile/tlsprofile_test.go (2)
75-77: ⚡ Quick winStrengthen custom-profile assertions beyond count only.
len(spec.Ciphers) == 2can pass even if cipher content/order or min TLS version is wrong. Assert exact values to catch regressions earlier.Proposed change
if len(spec.Ciphers) != 2 { t.Fatalf("cipher count: %d", len(spec.Ciphers)) } + if spec.Ciphers[0] != "TLS_AES_128_GCM_SHA256" || spec.Ciphers[1] != "ECDHE-RSA-AES128-GCM-SHA256" { + t.Fatalf("unexpected ciphers: %#v", spec.Ciphers) + } + if spec.MinTLSVersion != configv1.VersionTLS12 { + t.Fatalf("unexpected min TLS version: %q", spec.MinTLSVersion) + } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/tlsprofile/tlsprofile_test.go` around lines 75 - 77, The test currently only checks len(spec.Ciphers) which misses content/order and other fields; update the test that inspects spec (e.g., spec.Ciphers and spec.MinTLSVersion) to assert exact expected values and order (compare the cipher slice to the expected slice and compare spec.MinTLSVersion to the expected constant) so the test fails on wrong ciphers or TLS version—locate the assertions around spec.Ciphers in tlsprofile_test.go and replace the length-only check with explicit equality checks for the slice contents and the MinTLSVersion field.
15-32: ⚡ Quick winIsolate potential global mutation to keep failures contained.
At Line 21, if deep-copy behavior regresses, this test can mutate shared
TLSProfilesstate and affect later tests. Add cleanup to restore the original value.Proposed change
func TestEffectiveSpec_builtinDeepCopiesCiphers(t *testing.T) { ref := configv1.TLSProfiles[configv1.TLSProfileIntermediateType] if len(ref.Ciphers) == 0 { t.Fatal("expected non-empty intermediate cipher list") } origFirst := ref.Ciphers[0] + t.Cleanup(func() { + ref.Ciphers[0] = origFirst + }) spec, err := EffectiveSpec(&configv1.TLSSecurityProfile{Type: configv1.TLSProfileIntermediateType}) if err != nil { t.Fatal(err) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/tlsprofile/tlsprofile_test.go` around lines 15 - 32, The test mutates a cipher in spec which may leak into global TLSProfiles (ref), so capture the original value (origFirst) then immediately schedule a cleanup that restores ref.Ciphers[0] = origFirst using defer so the global state is reset even on failures; locate the setup that captures origFirst and add a defer to reassign ref.Ciphers[0] back to origFirst, keeping the rest of the assertions (EffectiveSpec, spec/spec2) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@pkg/tlsprofile/tlsprofile_test.go`:
- Around line 75-77: The test currently only checks len(spec.Ciphers) which
misses content/order and other fields; update the test that inspects spec (e.g.,
spec.Ciphers and spec.MinTLSVersion) to assert exact expected values and order
(compare the cipher slice to the expected slice and compare spec.MinTLSVersion
to the expected constant) so the test fails on wrong ciphers or TLS
version—locate the assertions around spec.Ciphers in tlsprofile_test.go and
replace the length-only check with explicit equality checks for the slice
contents and the MinTLSVersion field.
- Around line 15-32: The test mutates a cipher in spec which may leak into
global TLSProfiles (ref), so capture the original value (origFirst) then
immediately schedule a cleanup that restores ref.Ciphers[0] = origFirst using
defer so the global state is reset even on failures; locate the setup that
captures origFirst and add a defer to reassign ref.Ciphers[0] back to origFirst,
keeping the rest of the assertions (EffectiveSpec, spec/spec2) unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: e6b9e61b-0839-459e-ab45-50a5f6c9970d
📒 Files selected for processing (2)
pkg/tlsprofile/tlsprofile.gopkg/tlsprofile/tlsprofile_test.go
✅ Files skipped from review due to trivial changes (1)
- pkg/tlsprofile/tlsprofile.go
180bb2c to
04d6f88
Compare
|
/retest |
|
/retest-required |
|
/test e2e-operator-tech-preview |
|
Hey @arun717 thanks for your patience :-) finally did a first pass, and left few comments.
That way it will be easier to do the subsequent review (vs 26 commits and 2k+ files ;) |
Add tlsprofile mapping, a deployment hook that reads apiserver/cluster TLSSecurityProfile and tlsAdherence, shared container arg helpers, RBAC to watch apiservers, and wire the hook into cert-manager deployments. Co-authored-by: Cursor <cursoragent@cursor.com>
Update openshift/api, client-go, and library-go, bump Kubernetes modules to v1.35.1, refresh vendor and bundle manifests, and adjust validation for upstream API changes. Co-authored-by: Cursor <cursoragent@cursor.com>
Add unit tests for tlsprofile, the deployment TLS hook, and shared container args helpers; update istiocsr and e2e tests for Kubernetes validation changes. Co-authored-by: Cursor <cursoragent@cursor.com>
fcaf2f1 to
8b80dfa
Compare
…ong with some PR comments addressed.
|
/retest-required |
|
/test e2e-operator-tech-preview |
|
/retest-required |
Vault BeforeEach setup chains TLS cert, Helm, and pod init waits that can exceed highTimeout on slow CI. Use a separate 30m context for setupVaultServer and configureVaultPKI while keeping highTimeout for per-test issuer work. Co-authored-by: Cursor <cursoragent@cursor.com>
5a173e7 to
2bd2f32
Compare
|
/test e2e-operator |
| // (OpenShift clusters). | ||
| func WithClusterTLSProfileFromAPIServer(apiServerInformer configinformersv1.APIServerInformer) func(*operatorv1.OperatorSpec, *appsv1.Deployment) error { | ||
| return func(_ *operatorv1.OperatorSpec, deployment *appsv1.Deployment) error { | ||
| if len(deployment.Spec.Template.Spec.Containers) != 1 { |
There was a problem hiding this comment.
Should we be enforcing this? One scenario I could think of where this would break is ServiceMesh configured clusters, where it could inject a sidecar and this would fail. Should we instead rely on the container names?
for _, name := range deployment.Spec.Template.Spec.Containers {
switch name {
case "cert-manager-controller", "cert-manager-cainjector", "cert-manager-webhook":
MergeContainerArgs()
}
}
There was a problem hiding this comment.
Thanks for raising this. It is a fair thing to think about for mesh-enabled clusters. This does not look like a Service Mesh issue in practice.
Deployment hooks run on the embedded manifest before apply, not on the live cluster object.
Mesh sidecars are injected at admission after apply, so reconcile still sees one container here.
The hook always patches Containers[0], so the len != 1 check guards that assumption. It fails fast if the manifest is empty or has multiple containers, rather than patching the wrong one or panicking.
That matches how the other cert-manager deployment hooks work (log level, proxy, overrides, credentials), and our bindata Deployments always ship a single operand container per Deployment.
Please do let me know if I have missed something.
There was a problem hiding this comment.
Yeah, this was miss from my end, didn't apply the static manifests logic we have.
But I still think it would be good to check against container names instead of hardcoding to keep it generic, as this would add to the maintenance if container count changes in future. But that's fine, it can be taken up when any such usecase comes up
There was a problem hiding this comment.
Sure, not making any change for now.
Defer EffectiveSpec until after the cluster TLS adherence gate so invalid profile settings do not fail reconciliation when cert-manager is exempt. Co-authored-by: Cursor <cursoragent@cursor.com>
When a managed ClusterRole or ClusterRoleBinding is deleted externally, reuse the name recorded in IstioCSR status instead of issuing a new GenerateName suffix so e2e recreation checks and clients stay consistent. Co-authored-by: Cursor <cursoragent@cursor.com>
23e0f06 to
ce89822
Compare
E2E TLS profile helpers import library-go directly; go mod tidy promotes it from indirect to direct so verify-deps stays clean in CI. Co-authored-by: Cursor <cursoragent@cursor.com>
Run withUnsupportedArgsOverrideHook after WithClusterTLSProfileFromAPIServer so break-glass TLS settings in spec.unsupportedConfigOverrides win over cluster-wide profile flags during TLSAdherence Tech Preview. Co-authored-by: Cursor <cursoragent@cursor.com>
|
/test e2e-operator-tech-preview |
The vault-installer pod was rejected on clusters enforcing Pod Security restricted:latest because it ran with privileged=true. Helm install only needs API access via the service account, so use a restricted-compliant security context instead. Co-authored-by: Cursor <cursoragent@cursor.com>
TLS profile e2e now patches apiserver.config.openshift.io/cluster to StrictAllComponents with a Modern profile, verifies operand TLS args, and restores the original settings. Add TechPreview label so the test runs on TechPreview CI jobs where TLSAdherence is enabled. Co-authored-by: Cursor <cursoragent@cursor.com>
Also tidy test/go.mod after removing direct library-go import and gofmt deployment_tls_override_test.go for verify. Co-authored-by: Cursor <cursoragent@cursor.com>
|
@arun717: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/lgtm Please check the comment for pre-merge validations. /label px-approved cc: @snarayan-redhat for docs-approved. Please help with the release notes. /hold for @mytreya-rh review. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: arun717, bharath-b-rh The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Same description as [https://github.com//pull/409]
Removed the changes dependent on istio-csr upstream changes.
Summary by CodeRabbit
New Features
Improvements
Documentation
Tests