
DNS-based deployments: vanity hostname claims (first-come-first-served) #148

@posix4e

Description

Parked work from closed PR #145. Full design lives in the PR diff / code if we want to pick it up.

Problem

Every workload URL today is welded to an agent UUID (`-.devopsdefender.com`). When an agent STONITHs or relaunches, the URL breaks. There is no way to declare a stable short URL like `nvidia-smi.devopsdefender.com` that follows the workload around the fleet, and there is no automatic failover: if the agent serving a user-visible demo dies, the URL stays orphaned until someone redeploys manually.

Shape

  1. Schema: `expose:` gains a mutually-exclusive `claim_hostname` field alongside `hostname_label`.
  2. Wire: the `DD_EXTRA_INGRESS` env var gains `@name:port` entries for claims (auto-label `label:port` entries stay as-is).
  3. Arbitration: CP POSTs CNAMEs without upsert (`cf::try_claim_cname`). First call wins; later callers hit 409. DNS uniqueness is the lock.
  4. Teardown: collector's orphan-GC path releases the CNAME + CF Access app when the owning agent dies (ownership-checked so it doesn't stomp a legitimate takeover).
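A minimal in-memory sketch of steps 2 to 4, in Rust since the design references `cf::try_claim_cname`. It is not the #145 code. Assumptions: `DD_EXTRA_INGRESS` entries are comma-separated; `Ingress`, `parse_extra_ingress`, `ClaimRegistry`, `try_claim`, and `release_if_owner` are hypothetical names; a `HashMap` stands in for Cloudflare DNS, and `ClaimError::Conflict` stands in for the 409. Modeling an entry as an enum is one way to make `claim_hostname` and `hostname_label` mutually exclusive by construction.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// One entry of DD_EXTRA_INGRESS (comma-separated list is an ASSUMPTION).
#[derive(Debug, PartialEq)]
enum Ingress {
    /// `label:port`: existing auto-labelled, agent-scoped hostname.
    Label { label: String, port: u16 },
    /// `@name:port`: vanity hostname claim.
    Claim { name: String, port: u16 },
}

/// Parse a single `label:port` or `@name:port` entry.
fn parse_entry(entry: &str) -> Result<Ingress, String> {
    let (head, port) = entry
        .rsplit_once(':')
        .ok_or_else(|| format!("missing port in {entry:?}"))?;
    let port: u16 = port.parse().map_err(|_| format!("bad port in {entry:?}"))?;
    match head.strip_prefix('@') {
        Some(name) if !name.is_empty() => Ok(Ingress::Claim { name: name.to_string(), port }),
        Some(_) => Err(format!("empty claim name in {entry:?}")),
        None if !head.is_empty() => Ok(Ingress::Label { label: head.to_string(), port }),
        None => Err(format!("empty label in {entry:?}")),
    }
}

/// Parse the whole env value; fails on the first malformed entry.
fn parse_extra_ingress(value: &str) -> Result<Vec<Ingress>, String> {
    value
        .split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(parse_entry)
        .collect()
}

#[derive(Debug, PartialEq)]
enum ClaimError {
    /// Stand-in for the provider's 409 on a duplicate record.
    Conflict { owner: String },
}

/// Hypothetical stand-in for DNS: hostname -> owning agent id.
#[derive(Default)]
struct ClaimRegistry {
    owners: HashMap<String, String>,
}

impl ClaimRegistry {
    /// Create-only (no upsert): the first caller wins, later callers conflict.
    fn try_claim(&mut self, host: &str, agent: &str) -> Result<(), ClaimError> {
        match self.owners.entry(host.to_string()) {
            Entry::Occupied(e) => Err(ClaimError::Conflict { owner: e.get().clone() }),
            Entry::Vacant(e) => {
                e.insert(agent.to_string());
                Ok(())
            }
        }
    }

    /// Orphan-GC release: delete only if `dead_agent` still owns the record,
    /// so a legitimate takeover by another agent is left untouched.
    fn release_if_owner(&mut self, host: &str, dead_agent: &str) -> bool {
        if self.owners.get(host).map(String::as_str) == Some(dead_agent) {
            self.owners.remove(host);
            true
        } else {
            false
        }
    }
}
```

Here `&mut self` serializes callers, so the lock is trivially correct; in the real design, the DNS provider's uniqueness on create plays that role across concurrent CP requests.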

Phase 2 (not in #145)

Scraper-driven automatic relaunch: when an agent with active claims goes unhealthy, CP picks another eligible agent (capability match: `require_labels: ["gpu"]` etc.), posts the spec, repoints the CNAME.
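The Phase 2 selection and repoint step could look roughly like the sketch below. Everything here is an assumption for illustration: the `Agent` record (`id`, `labels`, `healthy`), the hypothetical `pick_replacement` and `relaunch_claim` helpers, the lowest-id tie-break, and a `HashMap` standing in for the CNAME records. Posting the spec and calling Cloudflare are reduced to a comment.

```rust
use std::collections::HashMap;

/// Hypothetical view of an agent as seen by the CP scraper.
#[derive(Debug, Clone)]
struct Agent {
    id: String,
    labels: Vec<String>,
    healthy: bool,
}

/// Healthy, not the failed agent, and carrying every `require_labels` entry.
/// Ties break on lowest id purely so the choice is deterministic.
fn pick_replacement<'a>(
    agents: &'a [Agent],
    failed_id: &str,
    require_labels: &[&str],
) -> Option<&'a Agent> {
    agents
        .iter()
        .filter(|a| a.healthy && a.id != failed_id)
        .filter(|a| {
            require_labels
                .iter()
                .all(|req| a.labels.iter().any(|l| l.as_str() == *req))
        })
        .min_by(|x, y| x.id.cmp(&y.id))
}

/// Move `host` from its current (failed) owner to an eligible agent.
/// Returns the new owner's id, or None if the host is unclaimed or no agent fits.
fn relaunch_claim(
    owners: &mut HashMap<String, String>,
    host: &str,
    agents: &[Agent],
    require_labels: &[&str],
) -> Option<String> {
    let failed = owners.get(host)?.clone();
    let next = pick_replacement(agents, &failed, require_labels)?;
    // Real flow (not modeled): post the workload spec to `next`, then repoint the CNAME.
    owners.insert(host.to_string(), next.id.clone());
    Some(next.id.clone())
}
```

Reading the current owner immediately before repointing mirrors the ownership check in teardown: the CP only moves a claim away from the agent that actually holds it.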

Why we closed the PR

Pausing while we focus on smaller near-term fixes. Design is solid; re-open when we want DNS to be the deployment contract.

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
