Backend-First IDP

Architecting a Production-Ready IDP: Argo CD, Crossplane & OPA in Practice

Reference architecture for KubeCon EU 2026 — Platform Engineering Zero Day.

Portal-first IDPs fail at scale. Build the automation, guardrails, and GitOps pipelines first. The portal is optional.

Architecture

  Developer           GitOps            Admission          Platform API         Cloud
  ┌──────────┐      ┌──────────┐      ┌───────────┐      ┌─────────────┐      ┌───────┐
  │  9-line  │─git─▶│  ArgoCD  │─sync▶│  Kyverno  │─ok──▶│  Crossplane │─────▶│  AWS  │
  │  claim   │ push │  v3 App  │      │  CEL      │      │  v2 XRD +   │      │  GCP  │
  │          │      │  Sets    │      │  Policies │      │  Pipeline   │      │ Azure │
  └──────────┘      └──────────┘      └───────────┘      │ Compositions│      └───────┘
                                            │             └─────────────┘
                                            │                    │
                                      ┌─────▼─────┐      ┌──────▼──────┐
                                      │  Policy   │      │   Shadow    │
                                      │ Promotion │      │   Metrics   │
                                      │ dev→prod  │      │  (runtime)  │
                                      └───────────┘      └─────────────┘

A developer writes a 9-line claim. ArgoCD syncs it. Kyverno validates against CEL policies (region restrictions, size caps, naming, HA, backups). Crossplane provisions the right cloud resources. Shadow Metrics evaluate whether the valid configuration is actually correct for the workload. No portal needed.

What's in This Repo

7 XRDs — Database, Cache, Message Queue, Object Storage, CDN, DNS, Namespace
21 Compositions — 7 resource types × 3 clouds (AWS, GCP, Azure)
6 Kyverno Policies — Region enforcement (PCI-DSS), size caps, labels, naming, backup retention, HA
Policy Promotion Pipeline — dev (Audit) → staging (mixed) → production (Enforce)
12 Teams, 109 Claims — Realistic "100+ service environment"
Shadow Metrics — CRD + 4 rules closing the semantic gap between "valid" and "correct"
Full Observability — OpenTelemetry, Prometheus, OpenCost, 5 Grafana dashboards
Composition Drift Detection — CronJob + Prometheus alerting
External Secrets Operator — ClusterSecretStores for AWS/GCP/Azure
One-Command Bootstrap — ./bootstrap/install.sh --provider aws

See docs/architecture.md for the full architecture breakdown.

Quick Start

# Clone the repo
git clone git@github.com:peopleforrester/backend-first-idp.git
cd backend-first-idp

# Preview what would be installed (dry run)
./bootstrap/install.sh --provider aws --dry-run

# Bootstrap the full platform
./bootstrap/install.sh --provider aws    # or gcp, or azure

# Lightweight install (skip observability stack)
./bootstrap/install.sh --provider aws --skip-observability

# Submit a database claim
kubectl apply -f golden-path/examples/claim-database.yaml

# Watch it provision
kubectl get databaseinstanceclaim -w

# Try a full service (DB + cache + queue + storage)
kubectl apply -f golden-path/examples/claim-full-service.yaml

# See what happens when you break the rules
kubectl apply -f golden-path/examples/claim-database-WILL-FAIL.yaml

Repository Structure

backend-first-idp/
├── platform-api/
│   ├── xrds/                    # 7 CompositeResourceDefinitions
│   ├── compositions/            # 21 cloud-specific implementations
│   │   ├── aws/                 #   RDS, ElastiCache, SQS, S3, CloudFront, Route53, NS
│   │   ├── gcp/                 #   Cloud SQL, Memorystore, Pub/Sub, GCS, Cloud CDN, DNS, NS
│   │   └── azure/               #   FlexibleServer, Redis Cache, Service Bus, Blob, Front Door, DNS, NS
│   ├── shadow-metrics/          # ShadowMetricRule CRD + 4 runtime validation rules
│   └── drift-detection/         # Composition drift CronJob + Prometheus alerting
├── policies/
│   └── kyverno/
│       ├── cluster-policies/    # 6 CEL policies (region, size, labels, naming, backup, HA)
│       ├── policy-exceptions/   # Platform team overrides
│       ├── policy-tests/        # Pass/fail test resources per policy
│       └── promotion/           # dev → staging → production overlays
├── gitops/
│   ├── argocd/                  # 4 ApplicationSets + team RBAC
│   └── kustomize/               # Base + cloud overlays + environment overlays
├── golden-path/
│   ├── examples/                # 6 claim examples (working, failing, shadow metric)
│   └── templates/               # Service scaffold with all 7 resource types
├── teams/                       # 12 teams, 109 claims (generated)
├── secrets/
│   └── eso/                     # ClusterSecretStores + ExternalSecret templates
├── observability/
│   ├── opentelemetry/           # Collector agents + gateway + instrumentation
│   ├── prometheus/              # Rules, ServiceMonitors, Helm values
│   ├── opencost/                # Cost allocation by team label
│   └── grafana/dashboards/      # 5 JSON dashboards
├── bootstrap/
│   ├── install.sh               # 12-step one-command setup
│   └── providers/               # AWS (IRSA), GCP (WI), Azure (OIDC)
├── scripts/                     # Team claim generator
├── tests/                       # 10 test suites
├── docs/                        # Architecture, Shadow Metrics, Kyverno, drift detection
└── DEMO.md                      # On-stage walkthrough (6 beats, ~10 min)

Multi-Cloud Support

The same claim works across all three clouds:

Field	AWS	GCP	Azure
eu-west-1	eu-west-1	europe-west1	westeurope
eu-central-1	eu-central-1	europe-west3	germanywestcentral
us-east-1	us-east-1	us-east1	eastus
us-west-2	us-west-2	us-west1	westus2

Size	AWS (DB)	GCP (DB)	Azure (DB)
small	db.t4g.medium	db-custom-2-4096	B_Standard_B2s
medium	db.t4g.large	db-custom-4-8192	GP_Standard_D2ds_v4
large	db.r6g.xlarge	db-custom-8-32768	GP_Standard_D4ds_v4

The Golden Path

What developers interact with — a single claim:

apiVersion: platform.kubecon.io/v1alpha1
kind: DatabaseInstanceClaim
metadata:
  name: checkout-db
  namespace: checkout
spec:
  size: small
  region: eu-west-1
  team: checkout

Nine lines. That's the entire developer interface. Everything else is platform.

Policy Enforcement

6 Kyverno CEL policies enforce team-level guardrails at admission time:

Policy	What It Enforces	Severity
Region enforcement	checkout/payments EU-only (PCI-DSS)	High
Size caps	checkout/analytics capped at medium	Medium
Required labels	All claims must have `team` label	Medium
Naming conventions	Names must start with team name	Low
Backup retention	Prod DBs need ≥7 day backups	High
HA enforcement	Prod DBs must enable HA	High

Policies are promoted through environments: dev (Audit) → staging (mixed) → production (Enforce).

Shadow Metrics — Closing the Semantic Gap

Policies validate what is allowed. Shadow Metrics validate what makes sense.

A claim that passes every Kyverno policy can still be wrong for the workload. Shadow Metrics evaluate runtime data (traffic volume, latency percentiles, utilization) and annotate claims with risk signals:

Rule	What It Checks	Signal
Database sizing	Is the DB size right for traffic?	Request rate
Region latency	Is the region optimal for users?	P95 latency
Cost efficiency	Is the resource over-provisioned?	CPU utilization
HA requirement	Should HA be enabled for this SLO?	Error rate

See docs/shadow-metrics.md and platform-api/shadow-metrics/README.md.

Version Manifest

Component	Version	Notes
Crossplane	v2.2.0	Namespaced XRs, Pipeline mode only
ArgoCD	v3.3.6	Server-side apply, fine-grained RBAC
Kyverno	chart 3.7.1	CEL policies v1-promoted
External Secrets Operator	chart 2.2.0	IRSA/WI/OIDC auth
OpenTelemetry Operator	latest	v1beta1 Collector CRs
kube-prometheus-stack	chart 72.3.0	Prometheus + Grafana
OpenCost	chart 1.46.0	Team-label cost allocation
Upbound Providers	v1.17.0	AWS, GCP, Azure

Testing

make test              # Run all 10 test suites
make test-yaml         # YAML lint (208 files)
make test-shell        # Shellcheck (13 scripts)
make test-kyverno      # Kyverno CLI policy tests (25 assertions)
make test-xrd          # XRD schema validation (150 assertions)
make test-compositions # Composition structure (231 assertions)
make test-golden-path  # Golden path examples (18 assertions)
make test-observability # OTel/Prometheus/Grafana (15 assertions)
make test-eso          # External Secrets (12 assertions)
make test-scale        # 100+ claims validation (5 assertions)
make test-structure    # File tree completeness

What to Try After Cloning

Read the golden path: golden-path/examples/claim-database.yaml — 9 lines
Read the XRD: platform-api/xrds/database-instance.yaml — the API contract
See three clouds: Compare platform-api/compositions/aws/database.yaml vs gcp/ vs azure/
Test a policy violation: kyverno apply policies/kyverno/cluster-policies/region-enforcement.yaml --resource golden-path/examples/claim-database-WILL-FAIL.yaml
See the semantic gap: Read the annotation on policies/kyverno/cluster-policies/size-caps.yaml
Explore Shadow Metrics: platform-api/shadow-metrics/README.md
Bootstrap on a kind cluster: ./bootstrap/install.sh --provider aws --dry-run

Demo

See DEMO.md for the 6-beat on-stage walkthrough (~10 minutes).

License

Apache 2.0 — see LICENSE.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Backend-First IDP

Architecture

What's in This Repo

Quick Start

Repository Structure

Multi-Cloud Support

The Golden Path

Policy Enforcement

Shadow Metrics — Closing the Semantic Gap

Version Manifest

Testing

What to Try After Cloning

Demo

License

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 46 Commits
.github/workflows		.github/workflows
bootstrap		bootstrap
cli		cli
docs		docs
gitops		gitops
golden-path		golden-path
observability		observability
platform-api		platform-api
policies/kyverno		policies/kyverno
portal/backstage		portal/backstage
scripts		scripts
secrets/eso		secrets/eso
teams		teams
tests		tests
.editorconfig		.editorconfig
.gitignore		.gitignore
.yamllint.yml		.yamllint.yml
CLAUDE.md		CLAUDE.md
DEMO.md		DEMO.md
LICENSE		LICENSE
Makefile		Makefile
PROJECT_STATE.md		PROJECT_STATE.md
README.md		README.md
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

Backend-First IDP

Architecture

What's in This Repo

Quick Start

Repository Structure

Multi-Cloud Support

The Golden Path

Policy Enforcement

Shadow Metrics — Closing the Semantic Gap

Version Manifest

Testing

What to Try After Cloning

Demo

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages