feat(CC-0090): Add NetworkPolicy for operator pod egress restriction#268
feat(CC-0090): Add NetworkPolicy for operator pod egress restriction#268
Conversation
WARNING (2)W-001: Chart.yaml artifacthub.io/changes not updated for CC-0090 and chart version not bumpedFile: Problem: CC-0090 introduces a significant new chart feature (opt-in NetworkPolicy for operator pod hardening) and a new top-level values key ( Code: Action Required: Add a new changelog entry for CC-0090 to Suggested Fix: W-002: E2E test cannot exercise NetworkPolicy enforcement on kindnet CI clusterFile: Problem: The CI kind cluster uses default kindnet, which does not enforce NetworkPolicy. The E2E test only asserts that (a) the NetworkPolicy resource is created and (b) a Keystone CR reaches Ready=True. On kindnet these pass vacuously with respect to enforcement — the policy is rendered but has no effect, so Code: Action Required: Choose between: (A) add an explicit assertion in Step 4 that the keystone-operator-netpol pod is the one that transitioned the CR to Ready (e.g. scrape operator logs for the CR UID); (B) accept the limitation and add an inline comment at the top of chainsaw-test.yaml stating the test validates chart rendering/Helm install, not NetworkPolicy enforcement (requires Calico/Cilium); or (C) file a follow-up ticket to provision a Calico-enabled kind cluster in a dedicated CI job for enforcement tests. Suggested Fix: INFO (2)I-001: PR description 'Affected Areas' references unmodified reconciler fileFile: Problem: The PR description lists Code: Action Required: Update the PR description's 'Affected Areas' section to reflect the actual chart-level scope: operators/keystone/helm/keystone-operator/templates/networkpolicy.yaml, tests/, values.{yaml,schema.json}, and tests/e2e/keystone-operator/network-policy-egress/. Suggested Fix: I-002: IPv6 CIDR schema pattern is structurally laxFile: Problem: The IPv4 half of the CIDR regex is rigorous (octets 0-255, prefix 0-32), but the IPv6 half is Code: Action Required: Consider tightening the IPv6 half of the pattern in a future change to reject malformed forms (too many groups, lone colon, etc.). Not a defect; noted for future tightening. Suggested Fix: SummaryHigh-quality PR with a well-reasoned chart-level design (instead of per-CR reconciler), thorough schema and helm-unittest coverage with comprehensive fail-closed defense-in-depth guards, and exemplary documentation (reference doc, how-to, cross-references, sidebar updates). The most substantive gap is that the Chainsaw E2E test cannot exercise actual NetworkPolicy enforcement on kindnet, making it closer to a smoke test — this matches CC-0039 precedent and is acknowledged in the how-to, but the test itself would benefit from explicit comment or a CNI follow-up. Minor cleanups remain: the Chart.yaml
Important Recommendation: Merge after applying the auto-fix for Chart.yaml (changelog annotation + version bump) and updating the PR description. The E2E-enforcement gap and IPv6 schema laxity are acceptable as-is given documented intent and precedent, but should be tracked as follow-ups. |
a557d5d to
1b81b06
Compare
1b81b06 to
0fc9905
Compare
AI-assisted: Claude Code On-behalf-of: @SAP christian.berendt@sap.com Signed-off-by: Christian Berendt <berendt@23technologies.cloud>
Introduce the configuration surface for the operator-pod
NetworkPolicy (Level 1). Template rendering, fail-closed
guards, unit tests, E2E tests, and docs arrive in later
levels.
Why: opt-in NetworkPolicy hardening requires a schema-
validated values contract before the template can be safely
authored. Validating CIDRs, ports, and boolean types at
helm-install time turns malformed input into a clear error
instead of a rejected Kubernetes manifest later.
Tasks completed at Level 1:
- 1.1 values.yaml: add networkPolicy block with
enabled=false default, empty kubeApiServer.cidrs/ports,
DNS defaults (kube-system + k8s-app=kube-dns, enabled),
empty allowMetricsFrom, empty webhookClients.cidrs
override. Kind defaults documented as comments, not
applied (REQ-001, REQ-005).
- 1.2 values.schema.json: add cidr and stringMap
definitions; add networkPolicy object with strict
additionalProperties=false, boolean enabled, CIDR-pattern
validated arrays, integer ports bounded to [1, 65535],
stringMap-typed selector maps, and NetworkPolicyPeer array
for allowMetricsFrom. Restructure the top-level
conditional into allOf so the existing rbac/webhook
constraint is preserved alongside a new fail-closed
conditional that requires non-empty kubeApiServer.cidrs
and kubeApiServer.ports whenever networkPolicy.enabled is
true (REQ-005, REQ-006).
Defaults preserve the opt-in invariant: with no overrides,
values.yaml validates cleanly and no NetworkPolicy will
render. Setting enabled=true without populating both CIDRs
and ports is rejected at schema time — defense-in-depth
alongside the template-level {{ fail }} guard planned for
Level 2.
AI-assisted: Claude Code
On-behalf-of: @SAP christian.berendt@sap.com
Signed-off-by: Christian Berendt <berendt@23technologies.cloud>
Render a default-deny NetworkPolicy for the keystone-operator pod when
networkPolicy.enabled=true, opening only the explicit rules required for
the operator to reconcile Keystone CRs. Default-off preserves upgrade
compatibility for clusters whose kube-apiserver CIDR or DNS layout does
not match the defaults.
Rules rendered (CC-0090, Level 2):
- Egress to the configured kube-apiserver CIDRs/ports as a single rule
(N CIDRs x M ports via NetworkPolicy cartesian semantics, REQ-004)
- Egress UDP+TCP/53 to the configurable DNS peer, gated by
networkPolicy.dns.enabled (REQ-003)
- Ingress TCP/9443 for the admission webhook from webhookClients.cidrs,
falling back to kubeApiServer.cidrs when unset (REQ-007)
- Ingress on metrics.port from networkPolicy.allowMetricsFrom peers,
rendered verbatim; skipped when empty (REQ-008)
Fail-closed guards (REQ-006) mirror the defensive check at
internal/controller/reconcile_networkpolicy.go:61-63: when enabled=true
with empty kubeApiServer.cidrs or empty kubeApiServer.ports, the
template aborts via {{ fail }} rather than render an unsafe rule. The
JSON schema already enforces minItems: 1 under allOf, but schema
validation can be bypassed (--skip-schema-validation) so the template
must defend itself.
Health probes on 8081 are intentionally not covered: kubelet calls
probes from the host network namespace and standard CNIs (Calico,
Cilium, Antrea) do not evaluate NetworkPolicy for that traffic
(REQ-009). A template comment documents this decision.
Level 2 tasks:
- 2.1 Scaffold + metadata/labels + podSelector + policyTypes
- 2.2 Fail-closed guards on empty cidrs/ports
- 2.3 DNS egress rule
- 2.4 kube-apiserver egress rule (single combined rule)
- 2.5 Webhook ingress rule with webhookClients fallback
- 2.6 Metrics ingress rule
- 2.7 Health-probe documentation comment
Verified with helm template against real helm binary: default-off
emits no NetworkPolicy; enabled+guards trigger on empty input;
enabled+valid input produces the expected YAML across DNS on/off,
webhook on/off, single- and multi-CIDR/port, and custom webhookClients
scenarios. helm lint is clean.
AI-assisted: Claude Code
On-behalf-of: @SAP christian.berendt@sap.com
Signed-off-by: Christian Berendt <berendt@23technologies.cloud>
Author tests/networkpolicy_test.yaml covering the rendered NetworkPolicy shape, fail-closed guards, and absence of the health-probe ingress rule, and extend tests/schema_validation_test.yaml with networkPolicy schema constraints. Level 3 tasks: - 3.1 default-off count=0, enabled-on count=1 + labels, policyTypes equality (REQ-001, REQ-002) - 3.2 DNS rule: default selectors, overrides, dns.enabled=false drops the rule (REQ-003) - 3.3 kube-apiserver rule: single CIDR+port, multiple CIDRs, multiple ports — one rule combining the lists (REQ-004) - 3.4 Webhook ingress (enabled/disabled, custom webhookClients) and metrics ingress (empty, one peer, multiple peers, configurable port) (REQ-007, REQ-008) - 3.5 Negative paths: empty cidrs/ports trigger failedTemplate; no ingress rule for port 8081 is ever rendered (REQ-006, REQ-009) - 3.6 Schema: invalid CIDR, CIDR with /33 prefix, port>65535, port<1, non-boolean enabled, unknown property, and empty cidrs/ports under the enabled=true conditional are all rejected; a fully configured networkPolicy is accepted (REQ-005) AI-assisted: Claude Code On-behalf-of: @SAP christian.berendt@sap.com Signed-off-by: Christian Berendt <berendt@23technologies.cloud>
Add a Chainsaw E2E test under tests/e2e/keystone-operator/ that installs the keystone-operator Helm chart with networkPolicy.enabled=true (kind CIDR 10.96.0.1/32:6443), asserts the rendered NetworkPolicy is present in the release namespace, and verifies a minimal Keystone CR reconciles to Ready=True under the restricted egress policy. A catch block dumps current+previous operator logs, the NetworkPolicy YAML, CR status, and namespace events to make CI egress regressions debuggable without a local reproduction (REQ-010, REQ-011). The new tests/e2e/<operator>-operator/ directory is wired into the existing e2e-operator job by extending the chainsaw invocation to include it when present, keeping chart-level tests separate from the per-CR tests under tests/e2e/<operator>/. matrix.operator is piped through an OPERATOR env var to satisfy the workflow-injection guardrail. Operators without a chart-level test directory are unaffected. Completes Level 4 tasks 4.1, 4.2, and 4.3 of CC-0090. AI-assisted: Claude Code On-behalf-of: @SAP christian.berendt@sap.com Signed-off-by: Christian Berendt <berendt@23technologies.cloud>
Level 5 — documentation for the chart-level NetworkPolicy that hardens the keystone-operator pod (CC-0090). - 5.1 Add docs/reference/keystone-operator-networkpolicy.md with overview, default-off rationale, egress/ingress rule tables (direction/protocol/ports/peer/values-key/default/gated-by), kind values snippet, fail-closed guard explanation, and link to values.schema.json (REQ-012). - 5.2 Add docs/how-to/enable-keystone-operator-networkpolicy.md with prerequisites (CNI enforcement), numbered enablement steps (gather apiserver CIDR, update values, helm upgrade), verification (describe networkpolicy, reconcile check, chainsaw E2E), and troubleshooting for reconcile timeouts, webhook TLS, DNS, metrics scraping, and fail-closed render errors (REQ-013). - 5.3 Wire the new reference and how-to into the VitePress sidebar and add a top-of-doc back-link from the Keystone Reconciler reference that distinguishes CC-0039 (per-CR NetworkPolicy) from CC-0090 (operator-pod NetworkPolicy). Decision comment notes that docs/index.md is a hero page without a reference list, so the sidebar is the functional equivalent of "the reference section" (REQ-012). VitePress build passes; all internal anchors resolve. AI-assisted: Claude Code On-behalf-of: @SAP christian.berendt@sap.com Signed-off-by: Christian Berendt <berendt@23technologies.cloud>
AI-assisted: Claude Code On-behalf-of: @SAP christian.berendt@sap.com Signed-off-by: Christian Berendt <berendt@23technologies.cloud>
AI-assisted: Claude Code On-behalf-of: @SAP christian.berendt@sap.com Signed-off-by: Christian Berendt <berendt@23technologies.cloud>
Addresses W-001 (Chart.yaml version + changelog), W-002 (chainsaw scope comment), and I-002 (schema docstring). - operators/keystone/helm/keystone-operator/Chart.yaml: Bumped chart version from 0.2.0 to 0.3.0 to reflect the new top-level `networkPolicy` key, and added an `artifacthub.io/changes` entry advertising the opt-in NetworkPolicy feature to Artifact Hub consumers (W-001). - operators/keystone/helm/keystone-operator/values.schema.json: Expanded the `cidr` definition description to explicitly document that the IPv6 half of the regex is intentionally loose — it does not enforce RFC 4291 group counts or `::` rules — and that final IPv6 validation is delegated to kubectl/apiserver admission, so readers understand the schema's stated typo-catching goal is only partially achieved for IPv6 (I-002). - tests/e2e/keystone-operator/network-policy-egress/chainsaw-test.yaml: Added a prominent "SCOPE CAVEAT" comment block at the top of the test clarifying that kindnet silently ignores NetworkPolicy, so this test validates only chart rendering and operator reconciliation under an unenforced policy rather than actual packet-level egress enforcement; also called out the Service-VIP DNAT issue on Calico/Cilium and the leader-election ambiguity introduced by running a second operator release alongside the primary one, and noted a Calico-enabled kind CI job as the follow-up that would make this test assert on blocked egress. Updated a downstream comment to reference the "(unenforced) egress policy" for consistency (W-002). AI-assisted: Claude Code On-behalf-of: @SAP christian.berendt@sap.com Signed-off-by: Christian Berendt <berendt@23technologies.cloud>
0fc9905 to
86e848a
Compare
GitHub Issue #127: Add NetworkPolicy for operator pod egress restriction
Category: security | Scope: Small
Description: The NetworkPolicy sub-reconciler creates policies for Keystone API pods but not for the operator Deployment itself. Add a NetworkPolicy that restricts the operator pod's egress to only the Kubernetes API server and the namespaces it needs to communicate with.
Motivation: The operator pod currently has unrestricted network egress. If compromised, it could be used to reach arbitrary destinations. Restricting egress to only necessary endpoints reduces the blast radius.
Affected Areas (chart-level, opt-in via
networkPolicy.enabled=true— CC-0090):operators/keystone/helm/keystone-operator/Chart.yaml— version bump 0.2.0 → 0.3.0 + changelog annotationoperators/keystone/helm/keystone-operator/templates/networkpolicy.yaml— new NetworkPolicy template (REQ-001 — REQ-008)operators/keystone/helm/keystone-operator/values.yaml— new top-levelnetworkPolicykey with kind defaults commented out (REQ-005)operators/keystone/helm/keystone-operator/values.schema.json— CIDR/port validation andenabled=true⇒kubeApiServer.{cidrs,ports}required fail-closed clause (REQ-005, REQ-006)operators/keystone/helm/keystone-operator/tests/networkpolicy_test.yaml— helm-unittest coverage for all render branchesoperators/keystone/helm/keystone-operator/tests/schema_validation_test.yaml— schema-level validation cases for the new keytests/e2e/keystone-operator/network-policy-egress/— chainsaw E2E installing a second Helm release withnetworkPolicy.enabled=true(REQ-010, REQ-011); note: kindnet does not enforce NetworkPolicy, so this covers chart rendering + install-under-policy only, not packet-level enforcementtests/e2e-chaos/operator-pod-crash/chainsaw-test.yaml— adjusted to coexist with the new restricted releasedocs/reference/keystone-operator-networkpolicy.md,docs/how-to/enable-keystone-operator-networkpolicy.md,docs/reference/keystone-reconciler.md,docs/.vitepress/config.ts— reference + how-to docs and nav wiring (REQ-012, REQ-013)Note: the per-CR reconciler
operators/keystone/internal/controller/reconcile_networkpolicy.gois not modified — this PR hardens the operator pod itself via a chart-level NetworkPolicy, which is orthogonal to the per-Keystone-CR NetworkPolicies that reconciler continues to manage.Acceptance Criteria:
Source: #127