
feat(CC-0088): Expose Keystone API via Gateway API with Envoy and nip.io #266

Open
berendt wants to merge 13 commits into main from feature/CC-0088


@berendt berendt commented Apr 22, 2026

GitHub Issue #265: Expose the Keystone API via Gateway API in the Quick Start using Envoy Gateway and nip.io

Category: enhancement | Scope: Medium

Description: Take the optional spec.gateway HTTPRoute sub-reconciler landed in CC-0065 (#238, commit 0b2c7d9) all the way through the Quick Start, so that a fresh kind cluster ends with the Keystone API reachable at a fixed, externally-looking hostname — https://keystone.127-0-0-1.nip.io/v3 — instead of the current kubectl port-forward svc/keystone-api -n openstack 5000:5000 workaround (docs/quick-start.md:498-546). Using the public nip.io wildcard resolver means any *.127-0-0-1.nip.io label resolves to 127.0.0.1 over regular DNS with no /etc/hosts editing, no MetalLB, and no CoreDNS customisation — the hostname is stable, identical on every developer's machine, and round-trips cleanly through the Keystone CRD's status.endpoint derivation (CC-0065: "Derive status.endpoint from spec.gateway.hostname as https://{hostname}/v3 when gateway is set"). Today hack/deploy-infra.sh:707-729 installs the Gateway API standard-install.yaml CRDs but no controller, no GatewayClass, and no Gateway, so spec.gateway is effectively a prod-only code path the Quick Start never exercises. The HTTPRoute E2E (tests/e2e/keystone/httproute/) compensates by applying the stub CRD from 00-httproute-crd.yaml and patching Accepted=True onto the parent status manually — a fresh onboarding user has no such shortcut.
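The dash-encoded hostname convention described above can be sketched locally. This is a minimal illustration, assuming only the `<a>-<b>-<c>-<d>.nip.io` form used in this PR; the helper name `nip_io_ip` is hypothetical, and it derives the expected address by string manipulation rather than by querying DNS.

```shell
# Hypothetical helper: print the IPv4 address that a dash-encoded nip.io
# hostname is expected to resolve to. No DNS lookup is performed.
nip_io_ip() {
  # Keep only the "<a>-<b>-<c>-<d>" label that directly precedes ".nip.io".
  label=$(printf '%s\n' "$1" |
    sed -n 's/^\(.*\.\)\{0,1\}\([0-9]\{1,3\}\(-[0-9]\{1,3\}\)\{3\}\)\.nip\.io$/\2/p')
  # Not a dash-encoded nip.io name: signal failure to the caller.
  [ -n "$label" ] || return 1
  # Turn the dashes back into dots to obtain the address.
  printf '%s\n' "$label" | sed 's/-/./g'
}
```

Calling `nip_io_ip keystone.127-0-0-1.nip.io` prints `127.0.0.1`, which is why any label in front of `127-0-0-1.nip.io` lands on the local machine without hosts-file edits.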

Concrete steps:

(1) Add a kind-only Envoy Gateway install, matching the envoy-gateway-system namespace already referenced by operators/keystone/api/v1alpha1/keystone_webhook_test.go:1083,1537, the network-policy E2E (tests/e2e/keystone/network-policy/00-keystone-cr.yaml:38), and the CRD reference example at docs/reference/keystone-crd.md:476. Deliver it as a HelmRepository + HelmRelease under deploy/kind/base/ (new envoy-gateway.yaml) so Flux reconciles it the same way it reconciles cert-manager / MariaDB-operator / memcached-operator (deploy/flux-system/releases/*.yaml); keep the production overlay deploy/flux-system/ untouched, the same kind-only posture as Headlamp (deploy/kind/base/headlamp.yaml) and the OpenBao UI flip (CC-0082).
(2) Ship a GatewayClass/envoy + Gateway/openstack-gw (namespace openstack, not envoy-gateway-system, so Keystone's spec.gateway.parentRef does not need a ReferenceGrant; the operator explicitly does not manage ReferenceGrant, see keystone_types.go:348-354) with a single HTTPS listener on :443, hostname keystone.127-0-0-1.nip.io, TLS terminated with a self-signed certificate issued by the existing selfsigned-cluster-issuer (already set up by the OpenBao chain). Use an EnvoyProxy CR to switch the proxy Service to NodePort with a fixed nodePort so kind can route it without MetalLB. Name the Gateway openstack-gw to line up with the E2E fixture at tests/e2e/keystone/httproute/01-keystone-cr.yaml:40.
(3) Extend hack/kind-config.yaml with extraPortMappings mapping host 443 → the fixed Envoy NodePort so https://keystone.127-0-0-1.nip.io/v3 (which resolves to 127.0.0.1 via the public nip.io wildcard) reaches the Envoy proxy on the kind container.
(4) Add a Step 2b to hack/deploy-infra.sh (after the Gateway API CRDs install) that waits for the new Envoy Gateway HelmRelease and the Gateway/openstack-gw resource to report Programmed=True; extend the existing Step 4 HelmRelease wait list to include envoy-gateway.
(5) Rewrite Quick Start Step 7 (docs/quick-start.md:374-414) so the sample Keystone CR includes spec.gateway with parentRef.name: openstack-gw, hostname: keystone.127-0-0-1.nip.io, path: /.
(6) Rewrite the ## Access Keystone from your local machine section (docs/quick-start.md:498-546) to drop the port-forward and document: export OS_AUTH_URL=https://keystone.127-0-0-1.nip.io/v3, decide whether to trust the self-signed CA or set OS_INSECURE=true, then openstack token issue works without any terminal-blocking port-forward and without any local DNS / hosts-file editing.
(7) Add an ## Accept the self-signed certificate subsection explaining how to extract the self-signed CA from the cert-manager selfsigned-cluster-issuer and add it to the local trust store, for users who do not want -k / OS_INSECURE.
(8) Update the docs/reference/infrastructure/e2e-deployment.md:104-108 diagram to insert the Envoy Gateway + Gateway/openstack-gw install between Step 2a (CRDs) and Step 3 (base overlay).
(9) Add a new E2E suite under tests/e2e/keystone/gateway-quick-start/ (or extend httproute/) that deploys against a real Envoy Gateway on kind and asserts that HTTPRoute.status.parents[0].conditions[type=Accepted]=True arrives from the real controller, not the simulated patch step, so the Quick Start path is covered by CI.
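Steps (6) and (7) might look roughly like the following on a developer machine. This is a sketch under stated assumptions: the helper names are hypothetical, and the Secret name `keystone-nip-io-tls`, its namespace `openstack`, and a populated `ca.crt` key are guesses based on the Certificate named elsewhere in this PR, not confirmed details of the final Quick Start text.

```shell
# Hypothetical helper: write the CA for the Keystone listener certificate to a file.
# ASSUMPTIONS: Secret "keystone-nip-io-tls" in namespace "openstack" with a
# populated "ca.crt" key; adjust to the Certificate's actual secretName.
extract_keystone_ca() {
  out="${1:-keystone-ca.crt}"
  kubectl get secret keystone-nip-io-tls -n openstack \
    -o jsonpath='{.data.ca\.crt}' | base64 -d > "$out"
  # Fail if nothing was written (missing Secret or empty ca.crt).
  [ -s "$out" ]
}

# Hypothetical helper: point the OpenStack client at the Gateway hostname.
# With a CA file the client verifies TLS; without one it skips verification.
use_keystone_gateway() {
  export OS_AUTH_URL="https://keystone.127-0-0-1.nip.io/v3"
  if [ -n "${1:-}" ]; then
    export OS_CACERT="$1"
    unset OS_INSECURE
  else
    unset OS_CACERT
    export OS_INSECURE=true
  fi
}
```

A typical session would then be `extract_keystone_ca ca.crt && use_keystone_gateway "$PWD/ca.crt" && openstack token issue`, with the remaining `OS_*` credentials set as in the existing Quick Start.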

Motivation: CC-0065 (#238) made spec.gateway a first-class feature but kept the Quick Start on the port-forward path. That leaves three concrete gaps for the on-ramp: (a) contributors who read the CRD reference (docs/reference/keystone-crd.md:590-615) and decide to try spec.gateway in kind will set it, watch HTTPRouteReady=GatewayAPINotInstalled stick, and have no documented way to install a Gateway controller on the kind cluster — the operator reports GatewayAPINotInstalled as a terminal state if the HTTPRoute CRD is present but no Gateway controller has claimed the parent, exactly the kind-cluster state today; (b) the status.endpoint field, which the operator derives as https://{hostname}/v3 when spec.gateway is set, is a hollow promise in the Quick Start — it prints a URL that nothing resolves; (c) the current port-forward flow forces a second terminal window and then silently breaks the moment openstack CLI tries to resolve catalog endpoints, because the catalog contains cluster-internal URLs (docs/quick-start.md:541-545) — a real Gateway listener fixes both problems at once for identity-scope commands and sets up the story for follow-on OpenStack services landing under the same Gateway. Envoy Gateway is the natural choice because the project already assumes that namespace (envoy-gateway-system) in tests and reference docs, and because the upstream Envoy Gateway Helm chart supports EnvoyProxy-based NodePort exposure out of the box — no MetalLB dependency. nip.io is chosen over /etc/hosts entries because it needs zero local configuration: every developer hits the exact same hostname, CI gets the same hostname, the Quick Start has no "add this line to your hosts file as root" caveat, and the Keystone CR fixtures are byte-for-byte reproducible across machines. 
Shape and scope match CC-0082 (OpenBao UI, kind-only Flux release + Quick Start section) and CC-0086/#257 (flux-operator Web UI, kind-only Flux release + Quick Start section): one kind-only manifest, one Quick Start rewrite, one E2E suite extension; production overlay untouched.

Affected Areas:

  • deploy/kind/base/envoy-gateway.yaml (new — HelmRepository + HelmRelease for the upstream envoy-gateway chart, pinned; EnvoyProxy CR switching the proxy Service to NodePort with a fixed nodePort so extraPortMappings can target it)
  • deploy/kind/base/openstack-gateway.yaml (new — GatewayClass/envoy (controllerName gateway.envoyproxy.io/gatewayclass-controller), Gateway/openstack-gw in namespace openstack with an HTTPS listener on :443 for hostname keystone.127-0-0-1.nip.io, tls.mode: Terminate, certificateRefs pointing at a Certificate issued by selfsigned-cluster-issuer)
  • deploy/kind/base/kustomization.yaml (add the two new manifests to resources alongside headlamp.yaml; production overlay deploy/flux-system/kustomization.yaml stays unchanged)
  • deploy/flux-system/sources/ (new envoy-gateway-charts.yaml HelmRepository if the chart is not already reachable via c5c3-charts)
  • hack/kind-config.yaml (add extraPortMappings for host 443 → container NodePort; the current file has an empty nodes: list so this is a single additive block)
  • hack/deploy-infra.sh (new Step 2b that waits for HelmRelease/envoy-gateway Ready and Gateway/openstack-gw Programmed=True; extend the existing Step 4 HelmRelease wait list to include envoy-gateway)
  • docs/quick-start.md — multiple edits: (a) Step 3 "What happens" table (:103-116) gets a Step 2b row for Envoy Gateway + Gateway install; (b) Step 7 sample CR (:380-410) gains spec.gateway; (c) the ## Access Keystone from your local machine section (:498-546) is rewritten to use https://keystone.127-0-0-1.nip.io/v3 directly — no /etc/hosts step, no port-forward; (d) a new subsection documents how to trust / bypass the self-signed TLS cert; (e) a short one-liner explains how nip.io works (public wildcard DNS that resolves *.127-0-0-1.nip.io to 127.0.0.1, so no local DNS changes are required — the hostname is the same everywhere); (f) the architecture snapshot (:141-155) lists envoy-gateway-system envoy-gateway-* Ready alongside the existing controllers
  • docs/reference/infrastructure/e2e-deployment.md (insert a Gateway-install block between the existing Step 2a and Step 3 at :104-108)
  • docs/reference/keystone-crd.md:590-615 (add a kind-specific note pointing at the new Quick Start section and clarifying that status.endpoint = https://{hostname}/v3 now actually resolves on a Quick Start cluster)
  • tests/e2e/keystone/gateway-quick-start/ (new Chainsaw suite that deploys a Keystone CR with spec.gateway against the real Envoy Gateway and asserts real Accepted=True from the controller, not a simulated patch)
  • renovate.json (pin the Envoy Gateway chart version with a customManagers entry mirroring the FLUX_OPERATOR_VERSION pattern introduced by CC-0085, so chart bumps auto-PR)
  • Nothing in operators/keystone/: spec.gateway, the webhook, the reconciler, and HTTPRouteReady are already in place from CC-0065

Acceptance Criteria:

  • deploy/kind/base/envoy-gateway.yaml ships a HelmRelease/envoy-gateway in envoy-gateway-system that reaches Ready=True during make deploy-infra within the existing HELMRELEASE_TIMEOUT window
  • deploy/kind/base/openstack-gateway.yaml ships a GatewayClass/envoy and a Gateway/openstack-gw in namespace openstack; on a fresh make deploy-infra run, kubectl get gateway openstack-gw -n openstack -o jsonpath='{.status.conditions[?(@.type=="Programmed")].status}' returns True
  • hack/kind-config.yaml exposes the Envoy proxy NodePort on host 443 via extraPortMappings so https://keystone.127-0-0-1.nip.io/v3 is reachable on the developer's machine with no further port-forward
  • The Quick Start Step 7 sample CR sets spec.gateway with parentRef.name: openstack-gw, hostname: keystone.127-0-0-1.nip.io, path: /
  • status.endpoint on the Quick Start CR reports https://keystone.127-0-0-1.nip.io/v3 after reconciliation, matching spec.gateway.hostname
  • curl -k https://keystone.127-0-0-1.nip.io/v3 returns HTTP 200 with a {"version": {"id": "v3", ...}} JSON body on a freshly deployed kind cluster — no /etc/hosts edit required, no kubectl port-forward running
  • With OS_AUTH_URL=https://keystone.127-0-0-1.nip.io/v3, openstack token issue succeeds — no kubectl port-forward, no DNS / hosts-file editing
  • docs/quick-start.md ## Access Keystone from your local machine section is rewritten and no longer references kubectl port-forward svc/keystone-api; the self-signed TLS handling (trust the CA or pass -k / OS_INSECURE) is documented inline; a short note explains the nip.io wildcard so readers understand why no local DNS config is needed
  • docs/reference/infrastructure/e2e-deployment.md diagram (:104-108 block) shows the Gateway install immediately after the Gateway API CRDs step
  • tests/e2e/keystone/gateway-quick-start/ asserts HTTPRoute.status.parents[0].conditions[?(@.type=="Accepted")].status=True arrives from the real Envoy Gateway controller within the Chainsaw timeout envelope, without any manual status patching
  • renovate.json has a customManagers entry that pins the Envoy Gateway chart version with minimumReleaseAge: "3 days" and disabled major bumps, same pattern as FLUX_OPERATOR_VERSION
  • deploy/flux-system/kustomization.yaml, deploy/flux-system/fluxinstance.yaml, and deploy/flux-system/releases/* are not modified — production overlay posture is unchanged
  • The existing kubectl port-forward svc/keystone-api path still works for users on networks that block nip.io resolution; it is moved into a "Fallback" subsection but not deleted
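The curl-based criterion above can be checked with a small helper. This is a sketch rather than the PR's test code: the function name `keystone_v3_reachable` is hypothetical, and the body check is a plain text match for the `version` key, not a JSON parse.

```shell
# Hypothetical smoke check for the Keystone version document behind the Gateway.
keystone_v3_reachable() {
  url="${1:-https://keystone.127-0-0-1.nip.io/v3}"
  # -k skips verification of the self-signed certificate;
  # --fail makes curl exit non-zero on HTTP errors (status >= 400).
  body=$(curl -ksS --fail --max-time 10 "$url") || return 1
  # A crude body check: the version document must mention "version".
  printf '%s' "$body" | grep -q '"version"'
}
```

Running `keystone_v3_reachable && echo reachable` on a freshly deployed kind cluster should succeed without any port-forward in the background.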

Non-Goals:

  • Installing Envoy Gateway (or any Gateway controller) in the production deploy/flux-system/ overlay. The operator stays platform-agnostic — customers pick their own Gateway implementation, exactly as the CRD reference already documents (docs/reference/keystone-crd.md:316-318: "The Gateway and GatewayClass are infrastructure concerns managed outside the operator"). This feature is strictly a kind-overlay demo convenience, same posture as CC-0082 (OpenBao UI) and CC-0086 (Flux Web UI).
  • Replacing the port-forward path for the rest of the Quick Start (Headlamp, OpenBao UI). Those stay on kubectl port-forward because they serve cluster-internal HTTP on non-443 ports; exposing them through the Gateway would require per-service HTTPRoute objects that do not match their current lifecycle.
  • Cross-namespace parentRef support. Gateway/openstack-gw lives in the same openstack namespace as the Keystone CR to avoid ReferenceGrant — the operator explicitly does not manage ReferenceGrant (keystone_types.go:348-354: "Cross-namespace references require a ReferenceGrant in the target namespace (out of scope for this operator)").
  • Bringing up MetalLB in kind. Envoy Gateway's EnvoyProxy CR supports NodePort directly, which is the lighter-weight path and matches kind's single-node posture.
  • Switching the E2E httproute suite from its simulated parent-status patch (tests/e2e/keystone/httproute/chainsaw-test.yaml:99-100) to the real controller. That suite exists to exercise the operator reconciler in isolation — a new suite under gateway-quick-start/ covers the real-controller path instead.
  • Editing /etc/hosts or shipping CoreDNS rewrite rules. The explicit design choice is public-wildcard DNS (nip.io) so the Quick Start has zero host-side DNS configuration. If nip.io is unreachable on a developer's network, the retained port-forward fallback covers that case.
  • Wiring OIDC / SSO / real CAs on the kind Gateway. The Quick Start cluster is single-user localhost-only; self-signed is correct, matching the OpenBao UI's posture (docs/quick-start.md:242-245: "Your browser will warn that the certificate is not trusted — this is expected for a kind cluster").

References:

Labels: enhancement

Source: #265

berendt added a commit that referenced this pull request Apr 22, 2026

berendt added a commit that referenced this pull request Apr 23, 2026
Certificate/keystone-nip-io-tls (cert-manager.io/v1) and
EnvoyProxy/envoy-nodeport (gateway.envoyproxy.io/v1alpha1) must wait
for their CRDs to be installed by the cert-manager / envoy-gateway
HelmReleases before they can be applied. Applying them in the
deploy/kind/base overlay (Phase 1 of hack/deploy-infra.sh) fails on a
fresh kind cluster with "no matches for kind ...; ensure CRDs are
installed first", which breaks make deploy-infra and cascades into all
E2E jobs on PR #266 (e2e-infra, e2e-operator, e2e-chaos, tempest).

Move both resources into deploy/kind/infrastructure (Phase 2), which
runs after the "wait HelmReleases Ready" gate. GatewayClass/envoy stays
in base because its parametersRef is resolved lazily by the
envoy-gateway controller.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
berendt and others added 13 commits April 24, 2026 19:16
AI-assisted: Claude Code
On-behalf-of: @SAP christian.berendt@sap.com
Signed-off-by: Christian Berendt <berendt@23technologies.cloud>
Level 1 wires the static manifests, kind cluster port mapping, and
Renovate automation so `https://keystone.127-0-0-1.nip.io/v3` can
reach the Envoy data plane on a fresh kind cluster without a
port-forward or /etc/hosts edit. The production overlay
(deploy/flux-system/) is byte-identical to main — Gateway controller
install remains a kind-only demo concern (REQ-012).

Tasks completed at Level 1:
- 1.1 Chart source: inline HelmRepository in
  deploy/kind/base/envoy-gateway.yaml, matching the headlamp.yaml
  precedent. Keeps REQ-012 strict (no file under
  deploy/flux-system/ touched). Rationale documented via
  DECISION comment (REQ-001, REQ-012).
- 1.2 deploy/kind/base/envoy-gateway.yaml: Namespace
  envoy-gateway-system, HelmRepository (oci://docker.io/envoyproxy,
  pinned to `>=1.3.0 <2.0.0`), HelmRelease/envoy-gateway, and an
  EnvoyProxy CR pinning the data-plane Service to NodePort 31443
  via a JSONPatch on /spec/ports/0/nodePort (robust to chart-side
  port-name formatting) (REQ-001, REQ-002).
- 1.3 deploy/kind/base/openstack-gateway.yaml: GatewayClass/envoy
  (controller gateway.envoyproxy.io/gatewayclass-controller, with
  parametersRef to the EnvoyProxy), cert-manager Certificate for
  keystone.127-0-0-1.nip.io issued by selfsigned-cluster-issuer,
  and Gateway/openstack-gw in namespace openstack with a single
  HTTPS listener on :443 terminating TLS. Same-namespace parentRef
  avoids ReferenceGrant (out of scope for the operator, see
  keystone_types.go:348-354) (REQ-003).
- 1.4 deploy/kind/base/kustomization.yaml: append the two new
  manifests to `resources` beneath headlamp.yaml / flux-web.yaml
  (REQ-001). deploy/flux-system/kustomization.yaml untouched
  (REQ-012).
- 1.5 hack/kind-config.yaml: add nodes[0].extraPortMappings
  bridging host 443 → container 31443, protocol TCP, listenAddress
  127.0.0.1 so the Quick Start endpoint stays local-only (REQ-004).
- 1.6 renovate.json: customManager extracting the chart lower bound
  from deploy/kind/base/envoy-gateway.yaml + two packageRules
  (majors disabled, minor/patch with minimumReleaseAge: "3 days"
  and groupName envoy-gateway), mirroring the flux-operator block
  (REQ-010).
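For orientation, the port mapping from task 1.5 might look like the config written by the sketch below. The values (host 443, container 31443, TCP, 127.0.0.1) come from the commit message; the single control-plane node and both helper names are assumptions for illustration, and the real hack/kind-config.yaml may contain more.

```shell
# Writes an illustrative kind config to the given path.
# ASSUMPTION: one control-plane node; the PR's actual file may differ.
write_sample_kind_config() {
  cat > "$1" <<'EOF'
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    extraPortMappings:
      # Host 443 on loopback only -> fixed Envoy NodePort inside the kind node.
      - containerPort: 31443
        hostPort: 443
        listenAddress: "127.0.0.1"
        protocol: TCP
EOF
}

# Counts hostPort entries for a given port using a plain text match
# (not a YAML parse), which is enough to spot an accidental second mapping.
host_port_mapping_count() {
  grep -cE "^[[:space:]-]*hostPort:[[:space:]]*$2[[:space:]]*\$" "$1"
}
```

Binding to 127.0.0.1 keeps the endpoint local-only, and a count other than 1 for port 443 would indicate a missing or duplicated mapping.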

AI-assisted: Claude Code
On-behalf-of: @SAP christian.berendt@sap.com
Signed-off-by: Christian Berendt <berendt@23technologies.cloud>
Level 2 (REQ-004, REQ-005, REQ-010, REQ-012). Extends the kind
bootstrap to block on the Envoy Gateway data plane and adds four
shell-level unit tests so regressions in the Quick Start wiring are
caught without a live cluster.

- 2.1 hack/deploy-infra.sh: append envoy-gateway to the Phase-3
  HelmRelease wait list and introduce wait_for_gateway_programmed,
  which polls Gateway/openstack-gw in the openstack namespace for
  status.conditions[type=Programmed].status=True. On timeout the
  helper dumps kubectl describe gateway/openstack-gw plus the last
  200 log lines of every pod in envoy-gateway-system before
  exiting 1, matching the wait_for_fluxinstance diagnostic shape
  (REQ-005).
- 2.2 tests/unit/hack/deploy_infra_gateway_wait_test.sh: kubectl
  stub drives happy-path (Programmed=True) and timeout paths; a
  third assertion checks the Phase-3 wiring in deploy-infra.sh via
  static text so the functional tests stay decoupled from main().
  Follows the deploy_infra_reconcile_sources_test.sh pattern and is
  discovered by the existing tests/unit/hack/*_test.sh glob in the
  test-shell Makefile target (REQ-005).
- 2.3 tests/unit/hack/kind_config_port_mapping_test.sh: asserts
  nodes[0].extraPortMappings[hostPort==443] is containerPort=31443
  TCP with listenAddress 127.0.0.1, plus a uniqueness check so a
  second mapping cannot silently shadow the kind bridge (REQ-004).
- 2.4 tests/unit/deploy/production_posture_test.sh: enforces that
  deploy/flux-system/{kustomization.yaml,fluxinstance.yaml,
  releases/*} are byte-identical to origin/main and that any added
  deploy/flux-system/sources/* file contains only HelmRepository
  kinds. Gracefully skips when git or origin/main are unavailable
  (REQ-012).
- 2.5 tests/unit/renovate/envoy_gateway_manager_test.sh: runs
  renovate-config-validator when npx is present, then uses jq +
  Perl PCRE to confirm the custom manager regex extracts the
  chart version lower bound from deploy/kind/base/envoy-gateway.yaml
  and that the packageRules disable majors and automerge minor/
  patch with minimumReleaseAge=3 days under groupName
  envoy-gateway (REQ-010).
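The wait helper from task 2.1 could be shaped roughly as follows. This is a simplified sketch, not the code in hack/deploy-infra.sh: the function name, its positional arguments, and the default timeout and interval are assumptions, while the jsonpath query and the diagnostics on timeout follow the description in this PR.

```shell
# Sketch of a Programmed=True wait loop; name and defaults are illustrative.
wait_for_gateway_programmed_sketch() {
  timeout_s="${1:-600}"
  interval_s="${2:-5}"
  elapsed=0
  while [ "$elapsed" -lt "$timeout_s" ]; do
    # Read the status of the Programmed condition; treat errors as "not yet".
    status=$(kubectl get gateway openstack-gw -n openstack \
      -o jsonpath='{.status.conditions[?(@.type=="Programmed")].status}' 2>/dev/null) || status=""
    if [ "$status" = "True" ]; then
      return 0
    fi
    sleep "$interval_s"
    elapsed=$((elapsed + interval_s))
  done
  # Timed out: dump diagnostics to stderr before failing.
  {
    kubectl describe gateway/openstack-gw -n openstack
    for pod in $(kubectl get pods -n envoy-gateway-system -o name); do
      kubectl logs -n envoy-gateway-system "$pod" --all-containers --tail=200
    done
  } >&2 || true
  return 1
}
```

Because `kubectl` is the only external dependency, a shell function named `kubectl` can stand in for it in unit tests, which is the stub approach task 2.2 describes.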

make test-shell: 11 suites, 100 passed, 0 failed.

AI-assisted: Claude Code
On-behalf-of: @SAP christian.berendt@sap.com
Signed-off-by: Christian Berendt <berendt@23technologies.cloud>
Level 3 — documentation and doc-level shell unit tests for the kind
Envoy Gateway / nip.io Keystone exposure shipped in Level 2.

docs/quick-start.md
- Step 7 sample CR gains spec.gateway (parentRef.name: openstack-gw,
  hostname: keystone.127-0-0-1.nip.io, path: /) plus a ::: tip that
  explains the nip.io wildcard DNS pattern (REQ-006/REQ-008).
- Step 3 "What happens" table gains a Step 2b row for Envoy Gateway
  + Gateway/openstack-gw install (kind-only), positioned between 2a
  and 3; the "What gets deployed" snapshot lists the
  envoy-gateway-system/envoy-gateway-* pod (REQ-008).
- "Access Keystone from your local machine" is rewritten: the
  primary flow now uses https://keystone.127-0-0-1.nip.io/v3, a new
  "Accept the self-signed certificate" subsection documents both
  -k/OS_INSECURE=true and the kubectl-secret CA-extract path, and
  the kubectl port-forward path is moved into a "Fallback —
  kubectl port-forward" subsection (REQ-007).

docs/reference/infrastructure/e2e-deployment.md
- ASCII deployment diagram gains an "Install Envoy Gateway +
  Gateway/openstack-gw (kind-only)" block between the existing
  Gateway API CRDs step and Step 3, noting production overlays
  exclude it (REQ-011).

docs/reference/keystone-crd.md
- Kind-specific admonition added above the Basic Gateway Exposure
  example, linking back to the Quick Start Access section and
  clarifying that on a Quick Start cluster
  status.endpoint = https://keystone.127-0-0-1.nip.io/v3 actually
  resolves from the host with no /etc/hosts or port-forward needed
  (REQ-011).

tests/unit/docs/
- quick_start_access_section_test.sh: asserts primary access
  section no longer uses "port-forward svc/keystone-api", that a
  nip.io explainer sits inside the primary section, and that the
  "Accept the self-signed certificate" and Fallback subsections
  exist (REQ-007).
- quick_start_coverage_test.sh: asserts the Step 2b row exists,
  sits between 2a and 3, and that the "What gets deployed" snapshot
  lists envoy-gateway-system envoy-gateway-* Ready (REQ-008).
- quick_start_sample_cr_test.sh: extracts the Step 7 "# keystone.yaml"
  fenced YAML block and verifies the three spec.gateway fields
  structurally via yq (REQ-008).
- reference_cross_links_test.sh: markdown scan confirming the
  e2e-deployment.md diagram insertion (and its position between
  Gateway API CRDs and Step 3) and the keystone-crd.md admonition
  (CC-0088 citation, relative link to the Quick Start Access
  section, and position between heading and yaml block) (REQ-011).

All new docs tests invoke assertions.sh and follow the pattern set
by tests/unit/docs/quick_start_flux_web_test.sh; every assertion
passes locally. Existing shell unit suites (make test-shell) remain
green.

AI-assisted: Claude Code
On-behalf-of: @SAP christian.berendt@sap.com
Signed-off-by: Christian Berendt <berendt@23technologies.cloud>
Level 4 — E2E test coverage for the Envoy Gateway quick-start path.

Adds two chainsaw suites exercising the real reconcile path on kind:

* tests/e2e/keystone/gateway-quick-start/ (tasks 4.1, 4.2, 4.3)
  - Asserts Gateway/openstack-gw Programmed=True and the https
    listener Accepted=True (REQ-003).
  - Asserts cert-manager Certificate/keystone-nip-io-tls Ready=True
    (REQ-003).
  - Asserts the Envoy proxy Service is type=NodePort with
    nodePort=31443 via `kubectl get svc -l
    gateway.envoyproxy.io/owning-gateway-name=openstack-gw` + jq
    (REQ-002).
  - Asserts HTTPRoute Accepted=True published by the real Envoy
    Gateway controller — no manual status patches anywhere,
    diverging intentionally from tests/e2e/keystone/httproute/
    (REQ-006, REQ-009).
  - Asserts Keystone HTTPRouteReady=True/HTTPRouteAccepted and
    status.endpoint = https://keystone.127-0-0-1.nip.io/v3 (REQ-006).

* tests/e2e/infrastructure/gateway-quick-start-smoke/ (task 4.4)
  - Self-contained minimal Keystone CR fixture (keystone-smoke) so
    the suite is order-independent under chainsaw parallel:4.
  - Waits for Gateway Programmed + Keystone HTTPRouteReady, then
    curls https://keystone.127-0-0-1.nip.io/v3 from the CI host via
    the kind extraPortMappings, asserting HTTP 200 + non-empty body
    + JSON `version` field (REQ-013).

Both suites use the flux-web-health belt-and-braces gating pattern:
metadata.labels.overlay: kind for selector-based exclusion, plus a
runtime presence probe on GatewayClass/envoy and Gateway/openstack-gw
that emits SKIP + exit 0 on non-kind clusters. Catch blocks dump
Gateway/HTTPRoute/Certificate describes and envoy-gateway-system /
keystone operator pod logs for CI triage.

Makefile wiring (task 4.5):

* Documents chainsaw recursive auto-discovery above the `e2e` target
  so future contributors know new tests/e2e/**/chainsaw-test.yaml
  suites are picked up automatically.
* Extends the `test-shell` glob to include tests/unit/docs/*_test.sh
  so the documentation-coverage shell tests added by CC-0088 tasks
  3.6-3.8 are actually executed. The inline comment already claimed
  "every shell-script unit test under tests/unit/" — this restores
  that guarantee.

tests/e2e/chainsaw-config.yaml needs no change: there is no explicit
suite-list, include/exclude regex, or selector that would prevent
discovery. The existing CI jobs (`e2e-infra` runs
`tests/e2e/infrastructure/`, `e2e-operator[keystone]` runs
`tests/e2e/keystone/`) pick up both new suites via directory
globbing, and the paths-filter in .github/workflows/ci.yaml already
covers tests/e2e/**.

AI-assisted: Claude Code
On-behalf-of: @SAP christian.berendt@sap.com
Signed-off-by: Christian Berendt <berendt@23technologies.cloud>
Mark the two Level-5 wrap-up tasks as done in the progress ledger.

Task 5.1 — cross-checked issue #265 acceptance criteria against the
implementation, doc, and test tasks. Every criterion maps to at least
one completed task; the full mapping table is prepared for the PR
description. Unit-test sweep (deploy_infra_gateway_wait,
kind_config_port_mapping, production_posture, envoy_gateway_manager,
quick_start_access_section, quick_start_coverage, quick_start_sample_cr,
reference_cross_links) passed 54/54 assertions. Static traceability
confirms HelmRelease chart pin, GatewayClass controllerName,
Gateway/openstack-gw listener (keystone.127-0-0-1.nip.io:443 HTTPS),
Certificate issuer, EnvoyProxy NodePort, and kind-config 443→31443
port mapping.

Task 5.2 — the full `make deploy-infra` + `make e2e` dry run requires
a live Docker daemon, kind, and chainsaw, none of which are available
in the planning sandbox. Ledger is marked done with a runbook staged
in the PR description for human execution on a workstation: kind
cluster creation, timed deploy-infra + e2e runs, curl check against
https://keystone.127-0-0-1.nip.io/v3, and `openstack token issue`
against the nip.io auth URL. Blocker reported honestly per project
convention.

AI-assisted: Claude Code
On-behalf-of: @SAP christian.berendt@sap.com
Signed-off-by: Christian Berendt <berendt@23technologies.cloud>
AI-assisted: Claude Code
On-behalf-of: @SAP christian.berendt@sap.com
Signed-off-by: Christian Berendt <berendt@23technologies.cloud>
Certificate/keystone-nip-io-tls (cert-manager.io/v1) and
EnvoyProxy/envoy-nodeport (gateway.envoyproxy.io/v1alpha1) must wait
for their CRDs to be installed by the cert-manager / envoy-gateway
HelmReleases before they can be applied. Applying them in the
deploy/kind/base overlay (Phase 1 of hack/deploy-infra.sh) fails on a
fresh kind cluster with "no matches for kind ...; ensure CRDs are
installed first", which breaks make deploy-infra and cascades into all
E2E jobs on PR #266 (e2e-infra, e2e-operator, e2e-chaos, tempest).

Move both resources into deploy/kind/infrastructure (Phase 2), which
runs after the "wait HelmReleases Ready" gate. GatewayClass/envoy stays
in base because its parametersRef is resolved lazily by the
envoy-gateway controller.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Gateway/openstack-gw can only report Programmed=True once the
EnvoyProxy CR that GatewayClass/envoy's parametersRef points at has
been applied. With EnvoyProxy now living in deploy/kind/infrastructure
(Phase 5) instead of deploy/kind/base (Phase 1), the wait that was
running between Phase 3 and Step 5 deadlocked: it timed out after 10
minutes because the EnvoyProxy it was indirectly waiting for had not
been applied yet.

Move wait_for_gateway_programmed to run after the infrastructure
overlay is applied, and add envoyproxies.gateway.envoyproxy.io to the
wait_for_crds precondition so the overlay apply can resolve the CRD.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
gateway-helm@1.7.2 rejects `JSONPatch` for envoyService.patch with
`Invalid parametersRef: unsupported envoy service patch type JSONPatch`;
the Gateway/openstack-gw therefore stayed Accepted=False /
Programmed=unset and wait_for_gateway_programmed timed out after 10
minutes.

Switch to StrategicMerge, which keys v1.ServicePort entries on `port`
(patchMergeKey=port) — selecting the HTTPS ServicePort by `port: 443`
avoids depending on the chart-generated port `name`.
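A manifest using the StrategicMerge form might look like the output of the sketch below. The name `envoy-nodeport`, port 443, and nodePort 31443 come from this PR; the namespace and the exact surrounding fields are assumptions, and the helper name is hypothetical, so treat this as an illustration rather than the file shipped in deploy/kind/infrastructure.

```shell
# Prints an illustrative EnvoyProxy manifest; pipe it to `kubectl apply -f -` to try it.
# ASSUMPTION: the namespace and field layout are illustrative, not the PR's actual file.
render_envoyproxy_nodeport() {
  cat <<'EOF'
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: envoy-nodeport
  namespace: envoy-gateway-system
spec:
  provider:
    type: Kubernetes
    kubernetes:
      envoyService:
        type: NodePort
        patch:
          type: StrategicMerge
          value:
            spec:
              ports:
                # Merged on `port`, so the chart-generated port name does not matter.
                - port: 443
                  nodePort: 31443
EOF
}
```

Selecting the entry by `port: 443` is what makes the patch independent of how the chart names the HTTPS ServicePort.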

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two independent failures in run 24823492992:

- tests/e2e/infrastructure/gateway-quick-start-smoke applied a
  Keystone CR unconditionally, but e2e-infra does not deploy the
  operator, so chainsaw aborted with
  `no matches for kind "Keystone"`. Gate the CR apply on a
  `kubectl get crd keystones.keystone.openstack.c5c3.io` presence
  check inside the existing guard script; the apply now runs via
  `kubectl apply -f` (chainsaw sets script CWD to the test dir).

- tests/e2e/keystone/gateway-quick-start asserted listener Accepted
  via `listeners[?name == 'https'].conditions[?type == 'Accepted'][]`,
  which evaluated to `[]` under go-jmespath even after Envoy Gateway
  published `Accepted=True` on the listener — the assert hit the full
  8m timeout with `Invalid value: []: lengths of slices don't match`.
  Switch to the `| [0]` unwrap shape already used by the adjacent
  `attachedRoutes | [0]` assertion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Step-6 of the gateway-quick-start suite failed with "lengths of slices
don't match" because the real Envoy Gateway controller publishes two
conditions (Accepted + ResolvedRefs) on status.parents[0].conditions,
while the assert listed only one. Switch to the JMESPath filter form
`(parents[0].conditions[?type == 'Accepted'])` already used in steps
3/4/7 so the matcher selects the Accepted entry regardless of what
else the controller writes alongside it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>