Enable FUSE in workerd's local-dev Docker container client #6596
Ben2W wants to merge 1 commit into cloudflare:main
Conversation
All contributors have signed the CLA ✍️ ✅
```diff
 struct DeviceMapping {
-  pathOnHost @0 :Text;
-  pathInContainer @1 :Text;
-  cgroupPermissions @2 :Text;
+  pathOnHost @0 :Text $Json.name("PathOnHost");
+  pathInContainer @1 :Text $Json.name("PathInContainer");
+  cgroupPermissions @2 :Text $Json.name("CgroupPermissions");
 }
```
Docker's API uses PascalCase keys for these fields (`PathOnHost` / `PathInContainer` / `CgroupPermissions`). Without these `$Json.name` annotations, capnp's JSON codec emits the source field names in camelCase, which Docker silently ignores.
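For illustration: without the annotations the codec would emit keys like `pathOnHost`, which Docker drops; with them, the device entry matches what the Docker Engine API's `/containers/create` endpoint expects (hand-written sketch, not captured wire traffic):

```json
{
  "HostConfig": {
    "Devices": [
      {
        "PathOnHost": "/dev/fuse",
        "PathInContainer": "/dev/fuse",
        "CgroupPermissions": "rwm"
      }
    ]
  }
}
```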
The reproduction's Docker proxy previously injected Privileged=true alongside CapAdd/Devices/SecurityOpt. Privileged was redundant in practice (it masked whether the other three settings were doing anything) and made it look like the proxy and the workerd PR were doing different things. I narrowed the proxy to only the three HostConfig fields that the upstream PR (cloudflare/workerd#6596) enables: CapAdd=[SYS_ADMIN], Devices=[/dev/fuse], SecurityOpt=[apparmor:unconfined]. All three vitest scenarios still pass. The proxy and the patched workerd now both make FUSE work with the exact same minimal capability set.
/bonk review I think this is likely fine so long as it doesn't require running with
Thanks for the contribution! The `docker-api.capnp` fix is clearly correct and needed. The FUSE enablement logic works, but I have a few concerns about the scope and unconditional nature of the changes.

Summary:

- The `$Json.name` annotation fix on `DeviceMapping` is a genuine bug fix — without it the fields serialize as camelCase and Docker silently ignores them. Good catch.
- Unconditionally adding `SYS_ADMIN` + `/dev/fuse` + `apparmor:unconfined` to every container is a broad change. Most containers don't need FUSE, and this widens the attack surface for all of them. Even in local dev, this adds capabilities that could mask bugs or create surprising behavior for users who aren't using FUSE.
- The `apparmor:unconfined` setting is Linux-specific. On macOS (Docker Desktop with a Linux VM), AppArmor typically isn't active, so this is harmless but unnecessary. Not a blocker, but worth noting.

See inline comments for specifics.
```c++
// Docker doesn't grant FUSE access by default — enable the minimum permissions for it.
{
  auto capAdd = hostConfig.initCapAdd(1);
  capAdd.set(0, "SYS_ADMIN");
}
{
  auto devices = hostConfig.initDevices(1);
  auto fuseDev = devices[0];
  fuseDev.setPathOnHost("/dev/fuse");
  fuseDev.setPathInContainer("/dev/fuse");
  fuseDev.setCgroupPermissions("rwm");
}
{
  auto securityOpt = hostConfig.initSecurityOpt(1);
  securityOpt.set(0, "apparmor:unconfined");
}
```
This unconditionally grants `SYS_ADMIN`, maps `/dev/fuse`, and disables AppArmor for every container created via local dev, regardless of whether the user's workload uses FUSE. `CAP_SYS_ADMIN` is one of the most powerful Linux capabilities — it covers `mount()`, `setns()`, `unshare()`, `bpf()`, and more. Granting it by default to all local containers when only a small subset need FUSE seems overly broad.
A few options to consider:

- **Opt-in flag:** Gate this behind a configuration option (the approach mentioned in 🐛 Bug Report — Runtime APIs: No privileged mode option for Containers local development #5609). Users who need FUSE explicitly enable it.
- **Check for `/dev/fuse` on the host:** Only add the device mapping and capabilities if `/dev/fuse` actually exists on the host. This is a lighter heuristic.
- **At minimum:** add a comment explaining the security trade-off so future readers understand why every local container gets `SYS_ADMIN`.

I'd defer to the maintainers on which approach they prefer, but option 1 seems safest for a first pass — it avoids surprising users who don't need FUSE with elevated container privileges.
```c++
{
  auto securityOpt = hostConfig.initSecurityOpt(1);
  securityOpt.set(0, "apparmor:unconfined");
}
```
Nit: `apparmor:unconfined` is a no-op on macOS (Docker Desktop runs a Linux VM that may or may not have AppArmor). Not a problem — Docker just ignores unknown security options — but worth a brief comment noting it's Linux-specific, similar to how `gatewayForPlatform` (line 873) documents its macOS behavior.
```c++
// Docker doesn't grant FUSE access by default — enable the minimum permissions for it.
{
  auto capAdd = hostConfig.initCapAdd(1);
  capAdd.set(0, "SYS_ADMIN");
}
```
Since the sidecar container (line 1976) already uses `initCapAdd(1)` for `NET_ADMIN`, this follows the existing pattern well. However, if FUSE support is eventually gated behind an opt-in, this block should be conditional. Even if it stays unconditional, consider adding a brief rationale comment explaining why `SYS_ADMIN` specifically (i.e., FUSE needs the `mount()` syscall) so future readers don't mistake it for an oversight.
|
Review posted successfully. Here's a summary of the feedback I left: docker-api.capnp fix (lines 35-37): The FUSE enablement in container-client.c++ (lines 1701-1716): The main concern is that
I also noted that |
In ContainerClient::createContainer(), grant the minimum capabilities
needed so a FUSE mount inside the user's container succeeds:
- CapAdd: ["SYS_ADMIN"] (for the mount() syscall)
- Devices: [/dev/fuse] (so the kernel has something to open)
- SecurityOpt: ["apparmor:unconfined"] (bypasses the default docker
apparmor profile's mount block)
No Privileged=true. Mirrors what Cloudflare's production container
runtime effectively provides for FUSE workers, without granting all
Linux capabilities or full /dev passthrough. Only affects workerd's
local Docker code path; production is unaffected.
Also adds missing $Json.name annotations to docker_api::DeviceMapping
fields in docker-api.capnp so they serialize as PathOnHost /
PathInContainer / CgroupPermissions to match the Docker API. This is
load-bearing for the Devices bind above — without it, the entry
serializes as camelCase and Docker silently ignores it. Before this PR
no call site populated Devices, so the missing annotations were latent.
Follows the existing pattern of createSidecarContainer() which already
sets CapAdd=[NET_ADMIN] for its own need at container-client.c++:1958.
End-to-end reproduction (libc mount("fuse", ...) syscall succeeding
inside the container, with a wire-level log confirming PathOnHost etc.
reach Docker correctly) at https://github.com/Ben2W/workerd-fuse-local-repro
Signed-off-by: Ben Werner <bewerner23@gmail.com>
Closes #5609.
Problem
`wrangler dev` (via miniflare) doesn't support FUSE in locally-spawned Cloudflare Containers, even though Workers using FUSE run fine in production. This is because `ContainerClient::createContainer()` POSTs to the user's local Docker daemon's `/containers/create` endpoint without the HostConfig fields that make `/dev/fuse` usable inside the container.

This PR sets three HostConfig fields on the user's app container: `CapAdd=[SYS_ADMIN]`, `Devices=[/dev/fuse]`, `SecurityOpt=[apparmor:unconfined]`. It also adds missing `$Json.name` annotations to `DeviceMapping` in `docker-api.capnp`.

I created a reproduction repo that demonstrates that this elevated HostConfig fixes the FUSE mount when running `wrangler dev`: https://github.com/Ben2W/workerd-fuse-local-repro

Notes

`CAP_SYS_ADMIN` elevates the container's permissions — it covers the `mount()` syscall FUSE needs, alongside a range of other privileged kernel operations, and has historically been the pre-condition for container-escape CVEs (CVE-2022-0492, CVE-2022-0185). The blast radius here is narrow: this code path only runs during `wrangler dev` on a developer's own machine, against their own Docker daemon, with an image they've chosen. If this is still too coarse for an unconditional default, I'm happy to pivot to an opt-in design similar to what was pitched in #5609.
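For reference, the complete HostConfig fragment these three fields amount to in the `/containers/create` request body (a hand-written illustration of the serialized form, not copied from wire logs):

```json
{
  "HostConfig": {
    "CapAdd": ["SYS_ADMIN"],
    "Devices": [
      {
        "PathOnHost": "/dev/fuse",
        "PathInContainer": "/dev/fuse",
        "CgroupPermissions": "rwm"
      }
    ],
    "SecurityOpt": ["apparmor:unconfined"]
  }
}
```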