
containers: Keep sidecar warm across destroy/start#6670

Open
tewaro wants to merge 1 commit into main from tewaro/container-test-flakiness

Conversation

@tewaro
Contributor

@tewaro tewaro commented Apr 25, 2026

container-client tests flake on CI because each container.start() recreates the proxy-everything sidecar, and that path dominates start() cost under dockerd contention (bridge IPAM lock + iptables in the new netns). CI's ~1400 concurrent tests on one dockerd socket regularly push start() past the 10s readiness-probe budget.
The sidecar is workerd-internal; container.capnp makes no promises about its lifetime. Reusing it across destroy/start within one workerd process is safe and eliminates the dominant per-start cost. Scope: localDocker container engine only — local-dev/test, gated behind --experimental. No production code path affected.

Measured impact:
bazel --jobs=16 --local_test_jobs=16 --runs_per_test=10:
Before: @ 10/10 fail · @all-autogates 9/10 timeout · @all-compat-flags 7/10 timeout
After: @ 10/10 PASS · @all-autogates 10/10 PASS · @all-compat-flags 10/10 PASS
Sidecar phase under contention (instrumented, then reverted):
Cold: avg 1.48s, max 6.90s
Warm: avg 32us, max 291us
32% of start() calls take the warm path. The new PUT /egress on every
start() is ~1ms.

Changes:

  • start() no longer resets containerSidecarStarted; the flag's invariant ("true iff a sidecar is running") is correctly maintained by ensureSidecarStarted (sets on cold-path success, clears via KJ_ON_SCOPE_FAILURE) and status() (re-derives from Docker at DO startup). The previous per-start reset was redundant — status() already handles workerd-restart orphan recovery.
  • destroy() no longer tears down the sidecar.
  • start() always pushes the egress config (PUT /egress is synchronous; proxy-everything updates an atomic.Pointer before responding), so internetEnabled and DNS allow-list changes still take effect. User-visible Container API (status/start/destroy/monitor/getTcpPort/exec/setEgress*/snapshot) is unchanged.

Risks:

  • docker ps shows *-proxy sidecars between destroy() and the next start() (or until DO eviction). ~15MB per warm sidecar.
  • Orphan recovery unchanged: status() reattaches if app+sidecar both running; otherwise next start() cold path clears the orphan.
  • TLS-intercept CA cert now stable across destroy/start within one workerd process instead of regenerating per start. No documented contract about CA rotation.

Examples

  1. https://github.com/cloudflare/workerd/actions/runs/24923537111/job/72990269801

@tewaro tewaro requested review from a team as code owners April 25, 2026 10:12
@ask-bonk
Contributor

ask-bonk Bot commented Apr 25, 2026

One potential issue: if the sidecar dies between destroy() and start(), the warm path in ensureSidecarStarted() returns immediately (because containerSidecarStarted is true), then updateSidecarEgressConfig() at line 2258 fails because the sidecar is dead. The error propagates up and start() fails. But containerSidecarStarted is still true, so the next start() attempt would also take the warm path and fail the same way. There's no recovery path that resets containerSidecarStarted when the sidecar is discovered to be dead mid-lifecycle.

However, this is a pre-existing concern: the same issue existed before this PR if the sidecar died between ensureSidecarStarted() at line 2248 and updateSidecarEgressConfig() in the cold path at line 2647. It's just that the window is now wider since the sidecar may have been created in a previous start() call.

The PR description acknowledges this risk and notes it's limited to localDocker (experimental, --experimental flag, local dev/test only). The overall change is well-reasoned and well-documented.

Let me check one more thing: the removal of this->sidecarIngressHostPort = kj::none; from destroy(). Previously this was at line 2303 (now removed). This is important because sidecarIngressHostPort is used by DockerPort to route traffic. After destroy(), if user code somehow still holds a port reference and tries to fetch through it, it would try to reach the (still-running) sidecar. But that should be fine — the app container is gone so the sidecar would just fail to proxy. And the containerStarted flag is set to false by monitor() completing, which gates operations like exec().

Overall this looks like a clean, well-scoped optimization. The comments are thorough and explain the rationale well. The code change is minimal and correct. The only minor concern I see is worth mentioning but not blocking.

Let me check if the KJ_IF_SOME at line 2257 could silently skip the egress config update if the cold path in ensureSidecarStarted() fails to set sidecarIngressHostPort — but looking at line 2629-2636, if the cold path fails, KJ_ON_SCOPE_FAILURE resets containerSidecarStarted to false, and the exception propagates before reaching line 2257. So that's fine.

One thing I want to verify: on the cold path (first start()), ensureSidecarStarted() already calls updateSidecarEgressConfig() at line 2647. Then line 2257-2259 calls it again. The PR comment says "this second push is a single fast PUT and is idempotent" — this is correct, the double-push on cold path is harmless.

This is a well-considered change. Let me post the review.

LGTM


@github-actions

github-actions Bot commented Apr 25, 2026

The generated output of @cloudflare/workers-types matches the snapshot in types/generated-snapshot 🎉
