
docs: refresh infra docs for post-Hetzner architecture #57

Open
abtreece wants to merge 1 commit into fullstaq-ruby:main from abtreece:docs/post-hetzner-refresh

Conversation

@abtreece
Collaborator

@abtreece abtreece commented Apr 29, 2026

Closes #55.

Summary

Refreshes docs/ to describe the current infrastructure (single Hetzner VM running Caddy + Sinatra/Puma API server + Prometheus, provisioned by Ansible) instead of the pre-July-2024 architecture (GKE Autopilot + Nginx Ingress + Cloud Run apiserver). The infrastructure overview diagram is also refreshed: the stale infrastructure-overview.drawio.svg is replaced by a Mermaid block embedded in infrastructure-overview.md so future diagram changes are reviewable as text diffs.
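To illustrate what "reviewable as text diffs" means for a diagram kept as Mermaid source, here is a minimal sketch using Python's standard `difflib`. The two Mermaid snippets are invented for illustration and are not taken from the diagram in this PR; only the Caddy-to-apiserver Unix socket relationship comes from the description. The function name `diagram_diff` is hypothetical.

```python
import difflib

# Hypothetical "before" and "after" versions of a Mermaid block.
# These are NOT the real diagram from infrastructure-overview.md.
OLD_DIAGRAM = """flowchart LR
    caddy[Caddy] --> apiserver[API server]
"""

NEW_DIAGRAM = """flowchart LR
    caddy[Caddy] -->|Unix socket| apiserver[API server]
"""


def diagram_diff(old: str, new: str) -> list[str]:
    """Return a unified diff between two versions of a Mermaid source block."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="a/docs/infrastructure-overview.md",
            tofile="b/docs/infrastructure-overview.md",
            lineterm="",
        )
    )


if __name__ == "__main__":
    for line in diagram_diff(OLD_DIAGRAM, NEW_DIAGRAM):
        print(line)
```

A change to an edge label shows up as one removed and one added line, which a reviewer can comment on directly. A change inside an SVG export generally does not diff this cleanly.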

Files changed

  • docs/infrastructure-overview.md — rewritten section by section against current IaC.

    • The two pre-existing GCP-service-account sections are folded into a single CI/CD authentication section, split per-caller:
      • fullstaq-ruby/server-edition → GCP via Workload Identity Federation (APT/YUM repo buckets + GCS CI artifacts bucket).
      • fullstaq-ruby/server-edition → Azure via Federated Identity Credentials (Azure Blob CI artifacts + CI cache containers + Key Vault GPG key).
      • fullstaq-ruby/infra → API server only, via a GitHub-issued OIDC JWT (audience backend.fullstaqruby.org) sent to POST /admin/upgrade_apiserver. The infra workflow does not authenticate to GCP or Azure APIs.
    • Caddy section: there is no backend.fullstaqruby.org vhost; both apt. and yum. vhosts handle /admin/* via reverse_proxy to the apiserver Unix socket (per ansible/files/Caddyfile). CI calls /admin/* via https://apt.fullstaqruby.org.
    • Google Cloud project section: corrected to a single project (fsruby-server-edition2, display name "Fullstaq Ruby Server Edition"), provisioned by terraform-hisec/gcloud_project.tf and populated by terraform/. The hisec/non-hisec boundary lives at the Terraform-state and access-group layer, not at a GCP project boundary.
    • API server section: Sinatra/Puma listens on a Unix socket and runs under systemd; a sibling unit, apiserver-deployer.service, performs the self-update from a tarball attached to a GitHub Release.
    • VM (Hetzner) section: Terraform-managed forward DNS (backend.fullstaqruby.org, apt.fullstaqruby.org, yum.fullstaqruby.org) is distinguished from the manually-set Hetzner PTR record.
    • CI artifacts / cache sections: artifacts are dual-cloud (public GCS + private Azure container); cache is Azure-only.
    • Container registry section: dropped (no registry resources are managed in this repo).
    • GPG private key section: Key Vault name uses the templated form ${var.key_vault_prefix}infraowners (currently fsruby2infraowners).
  • docs/infrastructure-overview.drawio.svg: deleted. Replaced by the Mermaid block in infrastructure-overview.md.

  • docs/editing-diagrams.md: deleted. Mermaid is edited inline; no diagrams.net round-trip is needed.

  • docs/deploy.md — replaces the gcloud container clusters get-credentials + kubectl apply -k ../kubernetes steps with a single ansible-playbook step matching Step 11 of the bootstrapping guide. Adds a callout that apiserver code changes deploy via the GitHub Actions workflow.

  • docs/infrastructure-as-code.md — drops Kustomize and the kubernetes/ directory bullet; adds Ansible to the tools list and an ansible/ directory bullet.

  • docs/infrastructure-bootstrapping.md — intro updated to mention Terraform + Ansible (not Kubernetes/Kustomize). The rest of the file already reflected the post-migration setup.

  • docs/pull_request_template.md — diagram-update checkbox now points to the Mermaid block.

  • README.md — drops the link to the deleted editing-diagrams.md.

  • .editorconfig — removes the duplicate [config.ru] block (the tab/4 one); only the correct space/2 rule remains.
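The CI/CD authentication bullet above says the fullstaq-ruby/infra workflow authenticates to the API server with a GitHub-issued OIDC JWT whose audience is backend.fullstaqruby.org. The sketch below shows, under stated assumptions, the kind of claim checks a receiver of such a token might perform. It is not the apiserver's actual code (which is Ruby and not shown in this PR). All function names are hypothetical, the `repository` check is an assumption about policy, and signature verification against GitHub's published keys is deliberately omitted, so this must not be used as-is for real authentication.

```python
import base64
import json

# Audience named in the PR description.
EXPECTED_AUDIENCE = "backend.fullstaqruby.org"
# Issuer GitHub documents for Actions OIDC tokens.
EXPECTED_ISSUER = "https://token.actions.githubusercontent.com"


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT segments are encoded."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def make_unsigned_token(claims: dict) -> str:
    """Build an UNSIGNED demo token (hypothetical helper, for illustration only)."""
    header = _b64url(json.dumps({"alg": "none"}).encode())
    payload = _b64url(json.dumps(claims).encode())
    return f"{header}.{payload}."


def decode_claims_unverified(jwt: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    _header, payload_b64, _signature = jwt.split(".")
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def claims_look_acceptable(claims: dict, expected_repo: str) -> bool:
    """Check issuer, audience, and (as an assumed policy) the calling repository."""
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    return (
        claims.get("iss") == EXPECTED_ISSUER
        and EXPECTED_AUDIENCE in audiences
        and claims.get("repository") == expected_repo
    )


if __name__ == "__main__":
    token = make_unsigned_token(
        {
            "iss": EXPECTED_ISSUER,
            "aud": EXPECTED_AUDIENCE,
            "repository": "fullstaq-ruby/infra",
        }
    )
    print(claims_look_acceptable(decode_claims_unverified(token), "fullstaq-ruby/infra"))
```

Pinning the audience means a token minted for a different service cannot be replayed against /admin/upgrade_apiserver, even though both tokens come from the same issuer.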

Verification

  • eclint check $(git ls-files) passes.
  • grep -rin 'kubernetes\|kustomize\|kubectl\|gke\|nginx\|cloud run' docs/ returns only intentional historical mentions (e.g. "the previous GKE Autopilot setup was replaced by this VM in the July 2024 rearchitecture").
  • Each rewritten claim is traceable to current IaC: terraform/{dns,gcloud_auth,backend,repo_buckets,ci_storage}.tf, terraform-hisec/{gcloud_project,key_vault,backend}.tf, ansible/main.yml, ansible/files/{Caddyfile,apiserver.service}, .github/workflows/apiserver.yml.
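For reviewers who want to repeat the stale-term check without relying on a particular grep flavour, the following is a rough Python equivalent of the grep command listed above. The function name `find_stale_mentions` and the sample file contents are hypothetical. The function takes file contents as an in-memory mapping so it can be tried without a checkout of the repository.

```python
import re

# Same alternation as the grep pattern in the Verification list, case-insensitive.
STALE_TERMS = re.compile(
    r"kubernetes|kustomize|kubectl|gke|nginx|cloud run", re.IGNORECASE
)


def find_stale_mentions(files: dict[str, str]) -> list[tuple[str, int, str]]:
    """Return (path, line number, stripped line) for every line matching a stale term."""
    hits = []
    for path, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if STALE_TERMS.search(line):
                hits.append((path, lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Invented sample content, not the real docs.
    sample = {
        "docs/a.md": "Caddy serves the repos.\nThe previous GKE Autopilot setup was replaced.\n",
        "docs/b.md": "Run ansible-playbook.\n",
    }
    for path, lineno, line in find_stale_mentions(sample):
        print(f"{path}:{lineno}:{line}")
```

To run it against a real checkout, one could build the mapping with `pathlib.Path("docs").rglob("*.md")` and `read_text()`. As with the grep command, each hit still needs a human decision on whether it is an intentional historical mention.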

Note: PAT-based CI bot

The "Github CI bot account" section describes a PAT-based bot. Retiring/converting that account is already tracked in #18 ("Change fullstaq-ruby-ci-bot account into a Github app") and is therefore intentionally not in scope here. The text remains as-is so the doc reflects the current state until #18 lands.

@abtreece force-pushed the docs/post-hetzner-refresh branch from 66e915d to cfbe8a1 on April 29, 2026 02:56
@abtreece requested a review from FooBarWidget on May 15, 2026 03:30
Closes fullstaq-ruby#55.

Brings docs/ in line with the post-July-2024 architecture (single
Hetzner VM running Caddy + Sinatra/Puma API server + Prometheus,
provisioned by Ansible), replacing references to the previous
GKE Autopilot + Nginx Ingress + Cloud Run apiserver setup.

Files changed:

- docs/infrastructure-overview.md — rewritten section by section.
  Every claim is grounded in current IaC. The two GCP-service-account
  sections are folded into a single "CI/CD authentication" section
  that splits per-caller: server-edition uses GCP WIF (APT/YUM repo
  buckets + GCS CI artifacts bucket) and Azure Federated Identity
  Credentials (Azure Blob CI artifacts + CI cache + Key Vault GPG
  key); infra repo's apiserver workflow only mints a GitHub OIDC JWT
  (audience backend.fullstaqruby.org) and POSTs to
  /admin/upgrade_apiserver — it does not authenticate to GCP or
  Azure APIs. The Caddy section is corrected: there is no
  backend.fullstaqruby.org vhost; both apt. and yum. vhosts handle
  /admin/* via reverse_proxy to the apiserver Unix socket. The
  "Google Cloud projects" claim of two projects is corrected — there
  is one project, fsruby-server-edition2, provisioned by
  terraform-hisec/gcloud_project.tf and populated by terraform/; the
  hisec/non-hisec separation lives at the Terraform-state and
  access-group layer. Container registry section dropped (no
  registry resources are managed in this repo). Key Vault name
  uses the templated form ${var.key_vault_prefix}infraowners
  (currently fsruby2infraowners). CI artifacts/cache split is now
  explicit (artifacts dual-cloud, cache Azure-only). VM section
  distinguishes Terraform-managed forward DNS from the
  manually-set Hetzner PTR record.

- docs/infrastructure-overview.drawio.svg — deleted. Replaced by an
  inline Mermaid diagram in infrastructure-overview.md so future
  diagram changes are reviewable as text diffs.

- docs/editing-diagrams.md — deleted (no longer needed without the
  drawio round-trip).

- docs/deploy.md — replaces the gcloud-clusters/kubectl steps with
  a single ansible-playbook step matching bootstrapping Step 11.
  Adds a callout that apiserver code changes deploy via the
  GitHub Actions workflow.

- docs/infrastructure-as-code.md — drops Kustomize and the
  kubernetes/ directory bullet; adds Ansible to the tool list and
  an ansible/ directory bullet.

- docs/infrastructure-bootstrapping.md — intro updated to mention
  Terraform + Ansible (not Kubernetes/Kustomize); the rest of the
  file already reflected the post-migration setup.

- docs/pull_request_template.md — diagram-update checkbox now
  points to the Mermaid block instead of the deleted drawio file.

- README.md — drops the link to the deleted editing-diagrams.md.

- .editorconfig — removes the duplicate [config.ru] block (the
  tab/4 one); only the correct space/2 rule remains.

Note: the "Github CI bot account" section is kept as-is. Retiring
that PAT-based bot is already tracked in fullstaq-ruby#18 and is therefore out
of scope here.
@abtreece force-pushed the docs/post-hetzner-refresh branch from cfbe8a1 to 85b83bf on May 15, 2026 03:48
@FooBarWidget
Member

I'll have a good look. So far my first impression is that the new diagram lacks a lot of detail that was in the older diagram. I'm also not sure whether a detailed but automatically rendered diagram is still readable compared to a manually drawn one.

Development

Successfully merging this pull request may close these issues.

Update documentation to reflect current architecture

2 participants