
Switch Heroku build to container runtime; pin python:3.12-slim-bookworm #3588

Open
jstvz wants to merge 10 commits into main from restart/phase-0-spike

Conversation

jstvz (Contributor) commented May 7, 2026

Summary

Switch the Heroku build to the container runtime, pin the Python base image to
python:3.12-slim-bookworm, and enable per-PR review apps on the metadeploy
pipeline so subsequent changes get an auto-built review environment.

Changes

  • heroku.yml — declares the container build for the existing Dockerfile
    and a release step that runs ./.heroku/release.sh (mirrors the previous
    Procfile's release: line). Per-process run: blocks are declared
    explicitly for web, devworker, worker, and worker-short.
  • Dockerfile:
    • Pins the base image from python:3.12 to python:3.12-slim-bookworm.
      Slim removes the compilers and -dev headers needed to build the C
      extensions a few of our dependencies still build from source
      (cryptography, lxml, psycopg2, multidict, …), so two small shims are
      added in-line:
      • apt install build-essential libxml2-dev libxslt-dev libpq-dev libffi-dev gettext redis-tools curl
      • pip install "setuptools<81" so cumulusci's
        pkg_resources.declare_namespace("cumulusci") import keeps working
        under modern pip.
      The -bookworm suffix is pinned explicitly: the unpinned python:3.12-slim
      tag now resolves to Debian trixie (gcc 14), whose stricter default
      warnings break multidict 6.0.4's pre-3.12-CPython C source.
    • Re-declares ARG BUILD_ENV / PROD_ASSETS / OMNIOUT_TOKEN inside the
      second stage so the yarn prod conditional actually sees them. ARGs
      declared above the first FROM are out of scope for RUN steps.
    • Switches CMD from /app/start-server.sh (dev-mode yarn serve under
      config.settings.local) to the Procfile's web command
      (daphne --bind 0.0.0.0 --port $PORT metadeploy.asgi:application).
      Same behavior on Common Runtime, where heroku.yml run wins; correct
      behavior on Private Spaces, where heroku.yml run is ignored and the
      in-image CMD is what runs. `docker-compose.yml` has its own `command:`
      override, so local dev is unaffected.
  • app.json (a minimal sketch follows this list):
    • "stack": "container" at the top level; review apps inherit on creation.
    • formation.{web,devworker,worker,worker-short}.size flipped from the
      dead "free" to "basic" (Heroku removed free dynos on 2022-11-28).
    • environments.review.scripts.postdeploy fixed from the non-existent
      ./manage.py populate_db to the actual command name populate_data
      (metadeploy/api/management/commands/populate_data.py).
    • Removed the buildpacks block — dead config under stack: container,
      contradicts the stack declaration and is ignored.
    • Removed the environments.test block — Heroku CI doesn't support
      container builds, and CI already moved to GitHub Actions in 2022
      (.github/workflows/test.yml).
  • docs/heroku-container-runtime.md (new) — operator-facing doc covering
    the build/release path (Heroku-built preferred, local container:push
    fallback), the Heroku Private Spaces `CMD`-vs-`heroku.yml run` quirk, the
    `heroku container:release` does-not-run-`release.command` quirk, and the
    manual CVE rebuild cadence (monthly + on Critical CVE) used until automated
    rebuild plumbing lands.
  • Review apps enabled on the `metadeploy` pipeline (autodeploy +
    autodestroy, 5-day stale).
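
A minimal sketch of the resulting app.json shape (formation quantities are illustrative, and the real file carries additional keys such as addons and env; the postdeploy invocation reflects the populate_data fix described above):

```json
{
  "stack": "container",
  "formation": {
    "web": { "size": "basic", "quantity": 1 },
    "devworker": { "size": "basic", "quantity": 1 },
    "worker": { "size": "basic", "quantity": 0 },
    "worker-short": { "size": "basic", "quantity": 0 }
  },
  "environments": {
    "review": {
      "scripts": {
        "postdeploy": "./manage.py populate_data"
      }
    }
  }
}
```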

Why slim-bookworm vs. `python:3.12`

  • ~510 MB smaller image
  • 66 fewer high/critical CVEs at the base layer

Verification (live, against the auto-built review app; a smoke-check sketch follows this list)

  • `/` → 200 (SPA renders, site context loads, `<title>MetaDeploy`)
  • `/api/products/` → 200 (35 products, 41 plans, 55 steps after `populate_data`)
  • `/api/plans/` → 200 (41 plans)
  • `/api/versions/` → 200 (36 versions)
  • `/products/eda/` → 200 (8.8 KB SPA shell renders for the EDA product page)
  • Container boot: daphne starts cleanly, migrations apply via release.sh
    (when run by Heroku's builder).
  • cumulusci Robot suite (`uv run cci task run robot --org enterprise -o vars "BASE_URL:https://metadeploy-pr-3588.herokuapp.com,PRODUCT:eda,PLAN:install"`)
    drove the SPA end-to-end on a freshly created scratch org: home → EDA
    product → Install plan → Log In → Use Custom Domain → entered scratch
    instance_url → Continue. Stops at the Salesforce OAuth boundary with
    `redirect_uri_mismatch` because the per-app Connected App's callback
    allowlist doesn't include `https://metadeploy-pr-3588.herokuapp.com/accounts/salesforce/login/callback`.
    Container-runtime, daphne, ASGI, frontend assets, websockets handshake,
    and Django session/CSRF middleware are all validated end-to-end; what's
    blocked is the SF OAuth handoff, which is outside Phase 0 scope.
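
For reference, the HTTP smoke portion of the verification above boils down to something like this (a minimal sketch; the hostname is simply the one this PR's review app happened to get):

```bash
#!/usr/bin/env bash
# Minimal HTTP smoke check against the auto-built review app.
BASE_URL="https://metadeploy-pr-3588.herokuapp.com"
for path in / /api/products/ /api/plans/ /api/versions/ /products/eda/; do
  code=$(curl -s -o /dev/null -w "%{http_code}" "${BASE_URL}${path}")
  echo "${path} -> ${code}"   # expect 200 for every route listed above
done
```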

Known follow-ons (not in this PR)

  • SF Connected App callback URL is the gating issue for any Robot/UI
    verification on review apps.
    Each review app gets a different URL,
    but the per-app Connected App's allowlist is static. Three resolution
    paths: (a) accept Robot stops at Log-In on review apps (HTTP smoke is
    the integration boundary); (b) per-PR Connected App via Salesforce
    Metadata API in `app.json` postdeploy; (c) fixed wildcard subdomain
    (Heroku-side). (a) is the cheapest and what we're doing today.
  • `heroku container:push` rebuilds the image without forwarding
    `--build-arg`.
    It runs its own `docker build` and ignores any
    locally-built and -tagged image. Result: every `heroku container:push`
    produces a `BUILD_ENV=development` image with empty `dist/prod/` even
    if the local registry has the right image. Workaround: build locally
    with `docker buildx build --no-cache --platform linux/amd64
    --build-arg BUILD_ENV=production -t registry.heroku.com//web --load .`
    then `docker push` directly (NOT `heroku container:push`), then
    `heroku container:release web -a `. Permanent fix candidates:
    add `build.config.BUILD_ENV: production` to `heroku.yml` (see the
    build.config sketch after this list); or restructure the Dockerfile
    to make asset compilation always-on.
  • populate_data sample-data limitations. The EDA plan slug is
    `install` (not `full-install`); no plan exposes a scratch-org install
    path because `supported_orgs` is unset on every populated plan. The
    Robot suite's `Tasks.Scratch Org` cannot reach the "Create Scratch Org"
    button. Either bias `populate_data` to a slug-pair Robot expects, or
    document the working slug pairs.
  • CCI 3.93.0 + `sf` CLI 2.131.7 incompatibility. CCI 3.93.0 hardcodes
    the removed `sfdx force:org:create` call. `cci flow run dev_org` fails
    with that. Workaround: `sf org create scratch -f orgs/.json
    --target-dev-hub --duration-days 1 -a ` then
    `cci org import `. Phase 7a (cumulusci v4.x harmonization)
    needs to confirm v4.x doesn't carry this; if it does, file upstream.
  • Frontend "Offline mode" banner under headless Chrome. Selenium-driven
    Chrome cannot establish the API websocket for live status updates. SPA
    navigation still works; anything dependent on the WS channel (preflight
    progress, install progress) won't update in real time during a Robot run.
  • Private Space prod cutover. `metadeploy-stg` runs in Private Space
    `metadeploy-staging`, where `heroku.yml run` is ignored and the in-image
    `CMD` is what dynos execute. The CMD change above makes the `web` process
    prod-correct, but `devworker`, `worker`, and `worker-short` still rely on
    `heroku.yml run` and would all launch daphne in Private Spaces. The proper
    fix is per-process Dockerfiles (`Dockerfile.web`, `Dockerfile.devworker`,
    `Dockerfile.worker`, `Dockerfile.worker-short`) pushed via
    `heroku container:push --recursive`.
  • `worker` dyno's Chrome path is buildpack-flavored.
    `.heroku/start_metadeploy_worker.sh` symlinks
    `/app/.apt/usr/bin/google-chrome`, a path created by the legacy heroku-apt
    buildpack. Under the container runtime this path doesn't exist. Doesn't
    block review-app verification because `worker` and `worker-short` are
    scaled to 0 for review apps.
  • `heroku ps:exec` not wired. No `.profile.d/heroku-exec.sh` and no
    `bash` symlink in the image, so `heroku ps:exec` shell debugging is
    unavailable. Cheap to add later.
  • Application-stack CVEs (~128 H/C) remain. The slim cutover removes
    base-layer CVEs but the application stack (`sfdx-cli`, deprecated npm
    packages, `cumulusci`/setuptools deprecation) is untouched here.
  • Heroku CLI 11.3.0 `reviewapps:enable` quirk. The CLI sends a
    malformed `deploy_target` and 404s; the platform API call works directly.
    If you hit this, use the API.
  • `OMNIOUT_TOKEN` not wired through `build.config`. The Dockerfile
    declares `ARG OMNIOUT_TOKEN` and `metadeploy-stg` has it set as a config
    var (review apps inherit), but it is not currently forwarded to the Docker
    build args via `heroku.yml`'s `build.config`. Fine because
    `yarn install --ignore-optional` skips the package that needs the token;
    add a `build.config.OMNIOUT_TOKEN: OMNIOUT_TOKEN` block later (see the
    build.config sketch after this list) if the optional
    `@omnistudio/omniscript-lwc-compiler` install becomes required.
  • Heroku Container Registry 30-day retention. Images outside the 20
    most recent releases are deleted after 30 days. Only relevant if you ever
    need to re-release an old image directly (rebuilds from source are
    always fine).
  • Investigate why a fresh review-app web dyno needed `ps:restart` after
    `release.sh` migrations before it could see the new tables. Likely
    Postgres connection-level metadata caching across the brief window where
    daphne opened connections before the release dyno finished migrating;
    needs a proper repro and a release-phase ordering audit.
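
For the two build-arg follow-ons above (BUILD_ENV forwarding and OMNIOUT_TOKEN), the candidate `heroku.yml` addition would look roughly like this. A sketch only, not part of this PR; whether `build.config` can pass an app config-var value through for `OMNIOUT_TOKEN`, rather than a literal, still needs confirming.

```yaml
# heroku.yml: candidate follow-up, not in this PR
build:
  docker:
    web: Dockerfile
  config:
    BUILD_ENV: production   # forwarded to the Dockerfile's ARG BUILD_ENV at build time
    # OMNIOUT_TOKEN would need equivalent wiring if the optional
    # @omnistudio/omniscript-lwc-compiler install ever becomes required;
    # literal vs. config-var passthrough is still an open question.
```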

jstvz and others added 3 commits May 6, 2026 22:39
Switch the Dockerfile FROM python:3.12 to python:3.12-slim-bookworm to
shrink the image by ~510 MB and drop 66 high/critical base-image CVEs
without touching requirements/*.txt or Phase 2 work. Required slim shims
are added in-line: build-essential plus -dev headers for source-built
wheels (multidict, etc.), and a setuptools<81 pin so cumulusci's
pkg_resources.declare_namespace import keeps working under modern pip.

Pin the slim base to -bookworm explicitly: the unpinned python:3.12-slim
tag now resolves to debian trixie (gcc 14), whose stricter default
warnings break multidict 6.0.4's pre-3.12-CPython C source.

Co-authored-by: Cursor <cursoragent@cursor.com>
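
The slim shims amount to roughly the following Dockerfile fragment (a sketch of the shape, not the verbatim diff):

```dockerfile
FROM python:3.12-slim-bookworm

# Slim drops compilers and -dev headers, so restore just enough toolchain
# for the dependencies that still build C extensions from source
# (cryptography, lxml, psycopg2, multidict, ...).
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        build-essential libxml2-dev libxslt-dev libpq-dev libffi-dev \
        gettext redis-tools curl \
    && rm -rf /var/lib/apt/lists/*

# Keep setuptools below 81 so cumulusci's
# pkg_resources.declare_namespace("cumulusci") import still works.
RUN pip install --no-cache-dir "setuptools<81"
```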
M1: caveat the recommendation paragraph to mention whole-image CVEs
    (not just base-image CVEs) so a stop-after-recommendation reader
    doesn't leave with an inflated impression of the slim win.
M3: split 'Concerns to surface for sub-task 0.3' into '0.3 hand-offs'
    (items 1, 2) and 'Deferred / out-of-scope follow-ups' (items 3, 4, 5).
    The original heading misrepresented its own contents.
M5: add a Dockerfile cross-reference comment explaining why setuptools<81
    is pinned in two RUN lines (the second pip-install layer would
    otherwise re-resolve setuptools to >=81 via --upgrade pip-tools).

M2 (percentage) and M4 (alphabetize apt packages) skipped per reviewer
('skippable').

Co-authored-by: Cursor <cursoragent@cursor.com>
@jstvz jstvz requested a review from a team as a code owner May 7, 2026 06:13
@jstvz jstvz temporarily deployed to metadeploy-pr-3588 May 7, 2026 06:15 Inactive
@jstvz jstvz temporarily deployed to metadeploy-pr-3588 May 7, 2026 06:23 Inactive
Two consecutive review-app builds timed out 'waiting to start' because
heroku.yml only declared build.docker.web while app.json's formation
declares four process types (web, devworker, worker, worker-short).
Heroku's container build dispatcher couldn't reconcile that.

Also: the Dockerfile's CMD is start-server.sh (dev-mode: runs migrate +
populate_data + 'yarn serve' under config.settings.local), not the
production daphne entrypoint. Without explicit run.web in heroku.yml,
Heroku would have run start-server.sh on the production review app —
wrong command, wrong settings module.

run.* mirrors the existing Procfile commands exactly:
  web         -> daphne ASGI server (production)
  devworker   -> honcho dev-worker bundle
  worker      -> Selenium browser worker (note: chrome path is
                 buildpack-flavored; tracked as known follow-on; this
                 dyno stays quantity:0 in app.json review formation)
  worker-short -> honcho short-job worker bundle

Co-authored-by: Cursor <cursoragent@cursor.com>
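
For reference, the resulting heroku.yml has roughly this shape (the web command is verbatim from the Procfile; the other run commands are elided here rather than reproduced from memory):

```yaml
# heroku.yml (shape only; non-web commands elided)
build:
  docker:
    web: Dockerfile
release:
  image: web
  command:
    - ./.heroku/release.sh
run:
  web: daphne --bind 0.0.0.0 --port $PORT metadeploy.asgi:application
  devworker: <honcho dev-worker bundle, mirrored from the Procfile>
  worker: <Selenium browser worker, mirrored from the Procfile>
  worker-short: <honcho short-job worker bundle, mirrored from the Procfile>
```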
@jstvz jstvz temporarily deployed to metadeploy-pr-3588 May 7, 2026 06:30 Inactive
@jstvz jstvz force-pushed the restart/phase-0-spike branch from 34aeff5 to c6547e5 May 7, 2026
@jstvz jstvz temporarily deployed to metadeploy-pr-3588 May 7, 2026 06:52 Inactive
@jstvz jstvz changed the title from "Phase 0: container-runtime spike + base image decision" to "Switch Heroku build to container runtime; pin python:3.12-slim-bookworm" May 7, 2026
…prod

Without redeclaring ARG BUILD_ENV / PROD_ASSETS / OMNIOUT_TOKEN inside
the python:3.12-slim-bookworm stage, the values declared above the first
FROM are out of scope for RUN instructions. The yarn-prod conditional
`[ "${BUILD_ENV}" = "production" ]` evaluated empty-string against
"production", fell through to the else branch (`mkdir -p dist/prod`),
and shipped an empty dist/prod. The Django index.html template loader
then 500'd on `/` with TemplateDoesNotExist (the SPA bundle is built
into dist/prod/index.html).

Surfaced during the first end-to-end smoke of the container build on
metadeploy-pr-3588.
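
The scoping rule in play, illustrated (a simplified Dockerfile, not the project's actual one):

```dockerfile
# ARGs declared before the first FROM are only visible to FROM lines;
# every stage that needs them in RUN steps must re-declare them.
ARG BUILD_ENV

FROM python:3.12-slim-bookworm AS app
# Without this re-declaration, ${BUILD_ENV} expands to "" below and the
# conditional silently falls through to the else branch.
ARG BUILD_ENV
ARG PROD_ASSETS
ARG OMNIOUT_TOKEN
RUN if [ "${BUILD_ENV}" = "production" ]; then \
        yarn prod; \
    else \
        mkdir -p dist/prod; \
    fi
```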
@jstvz jstvz temporarily deployed to metadeploy-pr-3588 May 7, 2026 07:37 Inactive
… blocks

CMD: switch from /app/start-server.sh (dev-mode yarn serve under
config.settings.local) to the Procfile's web command
(daphne --bind 0.0.0.0 --port $PORT metadeploy.asgi:application).
This is the same command heroku.yml's run.web declares, so behavior
is unchanged on Common Runtime where heroku.yml run wins. It is
materially different on Private Spaces, where heroku.yml run is
ignored and the in-image CMD is what runs; the previous CMD would
have launched the dev server in production. docker-compose.yml has
its own command: override invoking start-server.sh, so local dev is
unaffected.

curl: add to the apt install list. Heroku release-phase log
streaming relies on curl in the image; without it, release output
silently degrades to app-logs only. Also useful for in-dyno debug.

app.json buildpacks block: dead config under stack: container.
heroku.yml is the source of truth for container builds; the
buildpacks declaration contradicts the stack and is ignored.

app.json environments.test block: Heroku CI does not support
container builds, and the metadeploy pipeline has zero recorded
test-runs. CI moved to GitHub Actions in 2022 (.github/workflows/
test.yml). The block was a silent no-op.
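
Local development is insulated from the CMD change because docker-compose.yml carries its own command: override, roughly as follows (service name is an assumption):

```yaml
# docker-compose.yml (illustrative; service name assumed)
services:
  web:
    build: .
    command: /app/start-server.sh   # dev mode: migrate + populate_data + yarn serve
```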
@jstvz jstvz temporarily deployed to metadeploy-pr-3588 May 7, 2026 08:28 Inactive
Documents the container-runtime build/release path (Heroku-built
preferred, local container:push fallback), the Heroku Private
Spaces CMD-vs-heroku.yml-run quirk, and a manual CVE rebuild
cadence (monthly + on Critical CVE) to use until automated
rebuild plumbing lands. Replaces the buildpacks-shaped portions
of running_heroku.md (which still need a separate rewrite).

This page is the public-facing landing for operators. It
explains why the Dockerfile CMD must stay aligned with the
heroku.yml web run command, and why container:release skips
release.command (so manual release.sh runs are required after
a local container:push round).
@jstvz jstvz temporarily deployed to metadeploy-pr-3588 May 7, 2026 08:53 Inactive
GitHub deprecated v3 of actions/upload-artifact, and active workflow
runs now hard-fail at job-prep with the message:
"This request has been automatically failed because it uses a
deprecated version of actions/upload-artifact: v3."

Bump three call sites: test.yml (Frontend coverage, Backend coverage)
and smoke_test.yml (Robot results on failure).

The change is unrelated to the Phase 0 container-runtime work but
is needed to get this PR's CI green. Without the bump, Build and
Lint pass while Frontend and Backend are blocked at start.
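
The bump itself is a one-line change at each of the three call sites, along these lines:

```diff
 # three call sites across test.yml and smoke_test.yml
-      - uses: actions/upload-artifact@v3
+      - uses: actions/upload-artifact@v4
```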
@jstvz jstvz temporarily deployed to metadeploy-pr-3588 May 7, 2026 15:50 Inactive
@jstvz jstvz temporarily deployed to metadeploy-pr-3588 May 7, 2026 15:57 Inactive
@jstvz jstvz had a problem deploying to metadeploy-pr-3588 May 7, 2026 16:03 Failure