Switch Heroku build to container runtime; pin python:3.12-slim-bookworm #3588
Open
Conversation
Switch the Dockerfile FROM python:3.12 to python:3.12-slim-bookworm to shrink the image by ~510 MB and drop 66 high/critical base-image CVEs without touching requirements/*.txt or Phase 2 work. Required slim shims are added in-line: build-essential plus -dev headers for source-built wheels (multidict, etc.), and a setuptools<81 pin so cumulusci's pkg_resources.declare_namespace import keeps working under modern pip. Pin the slim base to -bookworm explicitly: the unpinned python:3.12-slim tag now resolves to debian trixie (gcc 14), whose stricter default warnings break multidict 6.0.4's pre-3.12-CPython C source. Co-authored-by: Cursor <cursoragent@cursor.com>
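A minimal sketch of what the in-line slim shims look like; the package list matches the PR description below, but the exact layer ordering and flags in the repo's Dockerfile may differ:

```dockerfile
FROM python:3.12-slim-bookworm

# slim drops the compiler toolchain and -dev headers that a few dependencies
# (multidict, lxml, psycopg2, cryptography) still need to build wheels from source
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
       build-essential libxml2-dev libxslt-dev libpq-dev libffi-dev \
       gettext redis-tools curl \
    && rm -rf /var/lib/apt/lists/*

# setuptools<81: cumulusci imports pkg_resources.declare_namespace, which
# newer setuptools releases no longer support
RUN pip install --no-cache-dir "setuptools<81"
```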
M1: caveat the recommendation paragraph to mention whole-image CVEs
(not just base-image CVEs) so a stop-after-recommendation reader
doesn't leave with an inflated impression of the slim win.
M3: split 'Concerns to surface for sub-task 0.3' into '0.3 hand-offs'
(items 1, 2) and 'Deferred / out-of-scope follow-ups' (items 3, 4, 5).
The original heading misrepresented its own contents.
M5: add a Dockerfile cross-reference comment explaining why setuptools<81
is pinned in two RUN lines (the second pip-install layer would
otherwise re-resolve setuptools to >=81 via --upgrade pip-tools).
M2 (percentage) and M4 (alphabetize apt packages) skipped per reviewer
('skippable').
Co-authored-by: Cursor <cursoragent@cursor.com>
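A sketch of the M5 cross-reference, assuming the second pip layer upgrades pip-tools as described above; the surrounding layer contents are illustrative, not the repo's actual Dockerfile:

```dockerfile
# Layer 1: pin setuptools below 81 (cumulusci's pkg_resources.declare_namespace
# import breaks under setuptools>=81). See the matching pin in the next layer.
RUN pip install --no-cache-dir "setuptools<81"

# Layer 2: --upgrade would otherwise re-resolve setuptools to >=81 via pip-tools,
# silently undoing the pin above -- keep the two constraints in sync.
RUN pip install --no-cache-dir --upgrade pip-tools "setuptools<81"
```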
Two consecutive review-app builds timed out 'waiting to start' because
heroku.yml only declared build.docker.web while app.json's formation
declares four process types (web, devworker, worker, worker-short).
Heroku's container build dispatcher couldn't reconcile that.
Also: the Dockerfile's CMD is start-server.sh (dev-mode: runs migrate +
populate_data + 'yarn serve' under config.settings.local), not the
production daphne entrypoint. Without explicit run.web in heroku.yml,
Heroku would have run start-server.sh on the production review app —
wrong command, wrong settings module.
run.* mirrors the existing Procfile commands exactly:
web -> daphne ASGI server (production)
devworker -> honcho dev-worker bundle
worker -> Selenium browser worker (note: chrome path is
buildpack-flavored; tracked as known follow-on; this
dyno stays quantity:0 in app.json review formation)
worker-short -> honcho short-job worker bundle
Co-authored-by: Cursor <cursoragent@cursor.com>
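For orientation, a minimal sketch of the shape `heroku.yml` takes after this commit; the non-web `run:` strings are placeholders, since the Procfile is authoritative for the exact commands:

```yaml
build:
  docker:
    web: Dockerfile            # single image, built by Heroku's builder
release:
  image: web
  command:
    - ./.heroku/release.sh     # mirrors the old Procfile release: line
run:
  web: daphne --bind 0.0.0.0 --port $PORT metadeploy.asgi:application
  devworker: "<honcho dev-worker bundle, copied verbatim from the Procfile>"
  worker: "<Selenium browser worker command, copied verbatim from the Procfile>"
  worker-short: "<honcho short-job worker bundle, copied verbatim from the Procfile>"
```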
Without redeclaring ARG BUILD_ENV / PROD_ASSETS / OMNIOUT_TOKEN inside
the python:3.12-slim-bookworm stage, the values declared above the first
FROM are out of scope for RUN instructions. The yarn-prod conditional
\`[ "${BUILD_ENV}" = "production" ]\` evaluated empty-string against
"production", fell through to the else branch (\`mkdir -p dist/prod\`),
and shipped an empty dist/prod. The Django index.html template loader
then 500'd on \`/\` with TemplateDoesNotExist (the SPA bundle is built
into dist/prod/index.html).
Surfaced during the first end-to-end smoke of the container build on
metadeploy-pr-3588.
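A minimal sketch of the scoping rule being fixed here; the stage layout, default value, and first-stage base image are illustrative, not the repo's actual Dockerfile:

```dockerfile
# ARGs declared before the first FROM are usable only in FROM lines.
ARG BUILD_ENV=development
ARG PROD_ASSETS
ARG OMNIOUT_TOKEN

FROM node:20 AS frontend            # illustrative first stage
# ...

FROM python:3.12-slim-bookworm
# Redeclare inside this stage so RUN instructions can see the values;
# without this, ${BUILD_ENV} expands to "" below, the test fails, and the
# else branch ships an empty dist/prod (=> TemplateDoesNotExist on /).
ARG BUILD_ENV
ARG PROD_ASSETS
ARG OMNIOUT_TOKEN

RUN if [ "${BUILD_ENV}" = "production" ]; then yarn prod; else mkdir -p dist/prod; fi
```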
CMD: switch from /app/start-server.sh (dev-mode yarn serve under config.settings.local) to the Procfile's web command (daphne --bind 0.0.0.0 --port $PORT metadeploy.asgi:application). This is the same command heroku.yml's run.web declares, so behavior is unchanged on Common Runtime, where heroku.yml run wins. It is materially different on Private Spaces, where heroku.yml run is ignored and the in-image CMD is what runs; the previous CMD would have launched the dev server in production. docker-compose.yml has its own command: override invoking start-server.sh, so local dev is unaffected.
curl: add to the apt install list. Heroku release-phase log streaming relies on curl in the image; without it, release output silently degrades to app-logs only. Also useful for in-dyno debug.
app.json buildpacks block: dead config under stack: container. heroku.yml is the source of truth for container builds; the buildpacks declaration contradicts the stack and is ignored.
app.json environments.test block: Heroku CI does not support container builds, and the metadeploy pipeline has zero recorded test-runs. CI moved to GitHub Actions in 2022 (.github/workflows/test.yml). The block was a silent no-op.
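The Dockerfile end of that change, roughly; the exact "before" form is assumed from the description above:

```dockerfile
# Before: dev-mode entrypoint (migrate + populate_data + yarn serve under
# config.settings.local) -- right for docker-compose, wrong for Heroku prod.
# CMD ["/app/start-server.sh"]

# After: mirror the Procfile web command. Shell form so $PORT expands at runtime.
CMD daphne --bind 0.0.0.0 --port $PORT metadeploy.asgi:application
```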
Documents the container-runtime build/release path (Heroku-built preferred, local container:push fallback), the Heroku Private Spaces CMD-vs-heroku.yml-run quirk, and a manual CVE rebuild cadence (monthly + on Critical CVE) to use until automated rebuild plumbing lands. Replaces the buildpacks-shaped portions of running_heroku.md (which still need a separate rewrite). This page is the public-facing landing for operators. It explains why the Dockerfile CMD must stay aligned with the heroku.yml web run command, and why container:release skips release.command (so manual release.sh runs are required after a local container:push round).
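A sketch of the fallback round the doc describes, with `<app>` as a placeholder app name; `heroku run` is one way to trigger the release script manually, and the `--build-arg` caveat under Known follow-ons still applies to this path:

```sh
heroku container:push web -a <app>        # local docker build + push to Heroku's registry
heroku container:release web -a <app>     # releases the image but skips release.command
heroku run -a <app> ./.heroku/release.sh  # so run migrations/release tasks by hand
```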
GitHub deprecated v3 of actions/upload-artifact, and active workflow runs now hard-fail at job-prep with the message: "This request has been automatically failed because it uses a deprecated version of actions/upload-artifact: v3." Bump three call sites: test.yml (Frontend coverage, Backend coverage) and smoke_test.yml (Robot results on failure). The change is unrelated to the Phase 0 container-runtime work but is needed to get this PR's CI green. Without the bump, Build and Lint pass while Frontend and Backend are blocked at start.
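For reference, the shape of the bump at each call site; the step name and path here are illustrative, not the workflows' actual values:

```yaml
- name: Upload coverage report
  uses: actions/upload-artifact@v4   # was @v3, which now hard-fails at job prep
  with:
    name: backend-coverage
    path: coverage/
```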
Summary
Switch the Heroku build to the container runtime, pin the Python base image to `python:3.12-slim-bookworm`, and enable per-PR review apps on the `metadeploy` pipeline so subsequent changes get an auto-built review environment.
Changes
- `heroku.yml` — declares the container build for the existing `Dockerfile` and a release step that runs `./.heroku/release.sh` (mirrors the previous `Procfile`'s `release:` line). Per-process `run:` blocks are declared explicitly for `web`, `devworker`, `worker`, and `worker-short`.
- `Dockerfile`:
  - `python:3.12` to `python:3.12-slim-bookworm`. Slim removes the compilers and `-dev` headers needed to build the C extensions a few of our dependencies still source-build (`cryptography`, `lxml`, `psycopg2`, `multidict`, …), so two small shims are added in-line: `apt install build-essential libxml2-dev libxslt-dev libpq-dev libffi-dev gettext redis-tools curl` and `pip install "setuptools<81"` so cumulusci's `pkg_resources.declare_namespace("cumulusci")` import keeps working under modern pip. Pinned to `-bookworm` explicitly: the unpinned `python:3.12-slim` tag now resolves to Debian trixie (gcc 14), whose stricter default warnings break `multidict` 6.0.4's pre-3.12-CPython C source.
  - `ARG BUILD_ENV` / `PROD_ASSETS` / `OMNIOUT_TOKEN` redeclared inside the second stage so the `yarn prod` conditional actually sees them. ARGs declared above the first `FROM` are out of scope for `RUN` steps.
  - `CMD` from `/app/start-server.sh` (dev-mode `yarn serve` under `config.settings.local`) to the `Procfile`'s web command (`daphne --bind 0.0.0.0 --port $PORT metadeploy.asgi:application`). Same behavior on Common Runtime where `heroku.yml run` wins; correct behavior on Private Spaces, where `heroku.yml run` is ignored and the in-image `CMD` is what runs. `docker-compose.yml` has its own `command:` override, so local dev is unaffected.
- `app.json`:
  - `"stack": "container"` at the top level; review apps inherit on creation.
  - `formation.{web,devworker,worker,worker-short}.size` flipped from the dead `"free"` to `"basic"` (Heroku removed free dynos on 2022-11-28).
  - `environments.review.scripts.postdeploy` fixed from the non-existent `./manage.py populate_db` to the actual command name `populate_data` (`metadeploy/api/management/commands/populate_data.py`).
  - Removed the `buildpacks` block — dead config under `stack: container`; it contradicts the stack declaration and is ignored.
  - Removed the `environments.test` block — Heroku CI doesn't support container builds, and CI already moved to GitHub Actions in 2022 (`.github/workflows/test.yml`).
- `docs/heroku-container-runtime.md` (new) — operator-facing doc covering the build/release path (Heroku-built preferred, local `container:push` fallback), the Heroku Private Spaces `CMD`-vs-`heroku.yml run` quirk, the `heroku container:release` does-not-run-`release.command` quirk, and the manual CVE rebuild cadence (monthly + on Critical CVE) used until automated rebuild plumbing lands.
- Review apps enabled on the `metadeploy` pipeline (autodestroy, 5-day stale).
Why slim-bookworm vs. `python:3.12`
Verification (live, against the auto-built review app)
(when run by Heroku's builder).
drove the SPA end-to-end on a freshly created scratch org: home → EDA
product → Install plan → Log In → Use Custom Domain → entered scratch
instance_url → Continue. Stops at the Salesforce OAuth boundary with
`redirect_uri_mismatch` because the per-app Connected App's callback
allowlist doesn't include `https://metadeploy-pr-3588.herokuapp.com/accounts/salesforce/login/callback`.
Container-runtime, daphne, ASGI, frontend assets, websockets handshake,
and Django session/CSRF middleware are all validated end-to-end; what's
blocked is the SF OAuth handoff, which is outside Phase 0 scope.
Known follow-ons (not in this PR)
Salesforce OAuth blocks full Robot verification on review apps. Each review app gets a different URL,
but the per-app Connected App's allowlist is static. Three resolution
paths: (a) accept Robot stops at Log-In on review apps (HTTP smoke is
the integration boundary); (b) per-PR Connected App via Salesforce
Metadata API in `app.json` postdeploy; (c) fixed wildcard subdomain
(Heroku-side). (a) is the cheapest and what we're doing today.
`heroku container:push` does not forward `--build-arg`. It runs its own `docker build` and ignores any
locally-built and -tagged image. Result: every `heroku container:push`
produces a `BUILD_ENV=development` image with empty `dist/prod/` even
if the local registry has the right image. Workaround: build locally
with `docker buildx build --no-cache --platform linux/amd64
--build-arg BUILD_ENV=production -t registry.heroku.com//web --load .`
then `docker push` directly (NOT `heroku container:push`), then
`heroku container:release web -a `. Permanent fix candidates:
add `build.config.BUILD_ENV: production` to `heroku.yml`; or
restructure the Dockerfile to make asset compilation always-on.
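The same workaround as a copy-pasteable block, with `<app>` standing in for the target app name:

```sh
heroku container:login                        # authenticate docker against registry.heroku.com
docker buildx build --no-cache --platform linux/amd64 \
  --build-arg BUILD_ENV=production \
  -t registry.heroku.com/<app>/web --load .
docker push registry.heroku.com/<app>/web     # NOT `heroku container:push`
heroku container:release web -a <app>
```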
`install` (not `full-install`); no plan exposes a scratch-org install
path because `supported_orgs` is unset on every populated plan. The
Robot suite's `Tasks.Scratch Org` cannot reach the "Create Scratch Org"
button. Either bias `populate_data` to a slug-pair Robot expects, or
document the working slug pairs.
The pinned cumulusci still shells out to the removed `sfdx force:org:create` call. `cci flow run dev_org` fails
with that. Workaround: `sf org create scratch -f orgs/.json
--target-dev-hub --duration-days 1 -a ` then
`cci org import `. Phase 7a (cumulusci v4.x harmonization)
needs to confirm v4.x doesn't carry this; if it does, file upstream.
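The workaround spelled out; the definition file, dev-hub alias, and org aliases are placeholders for whatever the local setup uses:

```sh
sf org create scratch -f orgs/dev.json --target-dev-hub my-devhub \
  --duration-days 1 -a metadeploy-scratch
cci org import metadeploy-scratch dev
```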
Chrome cannot establish the API websocket for live status updates. SPA
navigation still works; anything dependent on the WS channel (preflight
progress, install progress) won't update in real time during a Robot run.
This matters for Private Spaces deployments such as `metadeploy-staging`, where `heroku.yml run` is ignored and the in-image
`CMD` is what dynos execute. The CMD change above makes the `web` process
prod-correct, but `devworker`, `worker`, and `worker-short` still rely on
`heroku.yml run` and would all launch daphne in Private Spaces. The proper
fix is per-process Dockerfiles (`Dockerfile.web`, `Dockerfile.devworker`,
`Dockerfile.worker`, `Dockerfile.worker-short`) pushed via
`heroku container:push --recursive`.
`.heroku/start_metadeploy_worker.sh` symlinks
`/app/.apt/usr/bin/google-chrome`, a path created by the legacy heroku-apt
buildpack. Under the container runtime this path doesn't exist. Doesn't
block review-app verification because `worker` and `worker-short` are
scaled to 0 for review apps.
`bash` symlink in the image, so `heroku ps:exec` shell debugging is
unavailable. Cheap to add later.
The slim pin removes base-layer CVEs, but the application stack (`sfdx-cli`, deprecated npm
packages, `cumulusci`/setuptools deprecation) is untouched here.
malformed `deploy_target` and 404s; the platform API call works directly.
If you hit this, use the API.
The Dockerfile declares `ARG OMNIOUT_TOKEN` and `metadeploy-stg` has it set as a config
var (review apps inherit), but it is not currently forwarded to the Docker
build args via `heroku.yml`'s `build.config`. Fine because
`yarn install --ignore-optional` skips the package that needs the token;
add a `build.config.OMNIOUT_TOKEN: OMNIOUT_TOKEN` block later if the
optional `@omnistudio/omniscript-lwc-compiler` install becomes required.
most recent releases are deleted after 30 days. Only relevant if you ever
need to re-release an old image directly (rebuilds from source are
always fine).
A web dyno restart was needed after `release.sh` migrations to see new tables. Likely Postgres
connection-level metadata caching across the brief window where daphne
opened connections before the release dyno finished migrating; needs a
proper repro and a release-phase ordering audit.