Conversation
Ports 7 commits from the main-branch fork to the develop baseline.

Changes:
- fix(s3): skip GetBucketRegion when endpoint+region are pre-configured. Non-AWS S3-compatible services (e.g. OVH Object Storage) reject the AWS HeadBucket/GetBucketRegion call with 400. Use the pre-configured region when both endpoint and region are set.
- feat(kubernetes): add TaskNameLabel + BackoffLimit to job template context. TaskNameLabel: sanitized Kubernetes-safe pod label value for the task name. BackoffLimit: resolved from backend_parameters['backoff_limit'], default 10. Moved to compute/kubernetes/resources/job.go (develop's job creation path).
- db: combine store and queue in a single BoltDB transaction. Eliminates the window between the two separate Update() calls that could cause write-lock contention under high task submission rates.
- server: add gRPC keepalive policy and retry service config. Prevents GOAWAY/UNAVAILABLE errors on idle gRPC connections between the grpc-gateway HTTP proxy and the internal gRPC server.
- worker: add Resources field to DockerCommand + MemoryMB() helper. Allows container memory limits to be derived from TES task resources.
- worker: consolidate read-only input volumes in file_mapper. Replaces file-level bind mounts with ancestor-directory mounts to prevent EBUSY when tasks call mv/unlink on mounted input paths. Also skips read-only volumes already covered by a parent RW mount.
- docker: add nerdctl + CONTAINERD env vars to Dockerfile. Downloads nerdctl v2.2.1 into the final image and sets CONTAINERD_ADDRESS/CONTAINERD_NAMESPACE for host containerd access.

Not ported (intentionally):
- nerdctl-wrap/ (AWS-specific Go wrapper binary, not upstream-relevant)
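The MemoryMB() helper mentioned above could be sketched as follows. This is a minimal sketch, assuming the TES resources message exposes RAM as a float64 of gigabytes (as tes.Resources.RamGb does); the type and function names here are illustrative, not Funnel's actual code.

```go
package main

import "fmt"

// tesResources stands in for the TES task resources message; the real
// field is tes.Resources.RamGb, a float64 number of gigabytes.
type tesResources struct {
	RamGb float64
}

// memoryMB converts the task's RAM request to integer megabytes, the
// unit container runtimes accept for memory limits. Zero means "no
// limit requested".
func memoryMB(r *tesResources) int64 {
	if r == nil || r.RamGb <= 0 {
		return 0
	}
	// Assumption: binary megabytes (1 GB = 1024 MB).
	return int64(r.RamGb * 1024)
}

func main() {
	fmt.Println(memoryMB(&tesResources{RamGb: 2})) // 2048
	fmt.Println(memoryMB(nil))                     // 0
}
```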
… configured SA, Role and RoleBinding creation is designed for per-task cloud credential injection (e.g. OVH). When the templates are empty (e.g. an AWS IRSA setup where workers run under a shared ServiceAccount), the empty-template parse/decode would fail. Guard each creation behind a non-empty-template check. Also fixes a logic-inversion bug: the else branch previously logged an error when the SA was *found* (err == nil).
The backend previously required GenericS3.Bucket and GenericS3.Region before creating any PV/PVC for tasks with I/O. This blocked deployments that use hostPath or other non-S3 PV templates where these fields are unused.

- Remove the fatal GenericS3 check in backend.go createResources
- Guard GenericS3 field access in pv.go CreatePV with a nil/empty check, passing empty strings when GenericS3 is not configured

This allows hostPath-based PV setups (e.g. OVH Cinder) to work without requiring a GenericS3 config block in funnel.yaml.
…eation" This reverts commit 0c6f67a.
PVTemplate and PVCTemplate are user-defined; only the upstream S3 CSI
template (driver: s3.csi.aws.com) references {{.Bucket}}, {{.Region}},
and {{.KmsKeyID}}. Deployments on non-AWS infrastructure (e.g. OVH,
on-premise) commonly use hostPath or other PVTemplates that do not
reference these S3-specific fields, and may not have GenericS3
configured at all.
Changes:
- Remove the GenericS3 preflight guard in createResources(): it
conflated the file-storage backend config (GenericS3) with PV
template requirements, incorrectly blocking task execution on
any cluster that does not configure GenericS3.
- Guard GenericS3 slice access in CreatePV and CreatePVC against
nil/empty to prevent a latent index-out-of-bounds panic; pass
empty strings when GenericS3 is not configured, which is harmless
for PVTemplates that do not reference those variables.
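The nil/empty guard described above could look like the following sketch. The config type and helper name are assumptions for illustration; Funnel's actual GenericS3 config struct has more fields.

```go
package main

import "fmt"

// genericS3Config mirrors the shape of the GenericS3 block in
// funnel.yaml (a slice of S3 endpoint configs); the fields shown are
// only the two the PV/PVC templates can reference.
type genericS3Config struct {
	Bucket string
	Region string
}

// s3TemplateFields returns the Bucket/Region values to feed into the
// PV/PVC template context. When no GenericS3 block is configured, it
// returns empty strings, which is harmless for templates (e.g.
// hostPath) that never reference {{.Bucket}} or {{.Region}}, and it
// avoids the latent index-out-of-bounds panic on the empty slice.
func s3TemplateFields(conf []genericS3Config) (bucket, region string) {
	if len(conf) == 0 {
		return "", ""
	}
	return conf[0].Bucket, conf[0].Region
}

func main() {
	b, r := s3TemplateFields(nil)
	fmt.Printf("%q %q\n", b, r) // "" ""
}
```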
…MapTemplate
Most Kubernetes Funnel deployments mount a shared static ConfigMap
(e.g. "funnel-config") for all workers via a volume in WorkerTemplate.
Creating a separate ConfigMap per task is unnecessary in this case and
causes several problems at scale:
- The full Funnel config (including storage credentials) is serialized
and stored in etcd once per task, duplicating secrets N times.
- At thousands of tasks, this adds significant etcd write pressure and
API-server churn from create/delete events.
- If the reconciler or worker crashes before cleanup, ConfigMaps leak.
- Requires configmaps get/create/delete RBAC on the funnel
ServiceAccount, which is not needed by the static shared CM approach.
A per-task ConfigMap is only needed when WorkerTemplate references a
per-task name such as funnel-worker-config-{{.TaskId}}, as in the
upstream S3 CSI reference setup.
Changes:
- Gate CreateConfigMap/DeleteConfigMap in createResources/cleanResources
on ConfigMapTemplate != "" (same pattern as ServiceAccountTemplate,
RoleTemplate, RoleBindingTemplate). Default is empty — no change to
existing deployments that already set those fields to "".
- Rewrite CreateConfigMap to render ConfigMapTemplate ({{.TaskId}},
{{.Namespace}}, {{.Config}}) and decode the result as a ConfigMap
object, consistent with how all other resource templates work.
- Add config/kubernetes/worker-configmap.yaml as a reference template
for deployments that do need per-task ConfigMaps.
- Document ConfigMapTemplate in default-config.yaml.
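The gating pattern shared by ConfigMapTemplate, ServiceAccountTemplate, RoleTemplate, and RoleBindingTemplate reduces to a simple guard. This sketch uses illustrative names, not Funnel's actual API:

```go
package main

import "fmt"

// createOptionalResource illustrates the gating pattern: render and
// create a resource only when its template is non-empty. An empty
// template (the default) means the deployment manages that resource
// itself, so creation is skipped entirely.
func createOptionalResource(tmpl string, create func(string) error) error {
	if tmpl == "" {
		return nil // not configured: nothing to create, nothing to fail
	}
	return create(tmpl)
}

func main() {
	create := func(s string) error {
		fmt.Println("creating from template:", s)
		return nil
	}
	// Empty template: skipped, no parse/decode of an empty document.
	_ = createOptionalResource("", create)
	// Non-empty template: creation runs.
	_ = createOptionalResource("kind: ConfigMap", create)
}
```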
…rendering

html/template HTML-escapes interpolated string values (e.g. " → &#34;, & → &amp;). This is correct for HTML output but wrong for YAML: the escaped sequences are valid HTML entities but invalid YAML, causing the YAML parser to fail with "did not find expected alphabetic or numeric character" whenever a template uses printf to produce a quoted value (e.g. {{printf "\"%.0fG\"" .RamGb}}).

The Go documentation is explicit: "If the output is not HTML, use package text/template." All Kubernetes resource templates in this package produce YAML, not HTML.

The upstream canonical worker-job.yaml happened to avoid the bug because it uses printf without embedded quotes and hardcoded string literals without HTML-special characters. The issue surfaces as soon as a WorkerTemplate uses printf with embedded \" to quote YAML string values, which is idiomatic YAML-via-template style.

Affects all seven resource files: job, pv, pvc, configmap, role, roleBinding, serviceaccount.
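The escaping difference is easy to demonstrate with both standard-library engines side by side; this standalone snippet is illustrative and not taken from Funnel's code:

```go
package main

import (
	"fmt"
	htmltmpl "html/template"
	"strings"
	texttmpl "text/template"
)

// render executes src against data with either html/template (htmlMode
// true) or text/template and returns the rendered output.
func render(htmlMode bool, src string, data interface{}) string {
	var b strings.Builder
	if htmlMode {
		htmltmpl.Must(htmltmpl.New("t").Parse(src)).Execute(&b, data)
	} else {
		texttmpl.Must(texttmpl.New("t").Parse(src)).Execute(&b, data)
	}
	return b.String()
}

func main() {
	const src = `memory: {{printf "\"%.0fG\"" .RamGb}}`
	data := struct{ RamGb float64 }{RamGb: 4}

	// html/template HTML-escapes the embedded quotes: invalid YAML.
	fmt.Println(render(true, src, data)) // memory: &#34;4G&#34;
	// text/template emits them verbatim: valid YAML.
	fmt.Println(render(false, src, data)) // memory: "4G"
}
```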
…eted pods on nodes
AddInput() now registers volumes as read-write (Readonly=false) and tracks them explicitly in a new FileMapper.InputVolumes []Volume field. consolidateVolumes() uses InputVolumes as its working set instead of filtering mapper.Volumes by the Readonly flag.

This means Readonly on Volume retains its literal semantics: true = mount the path read-only in the container. No code path sets it to true for inputs any more, so the flag is no longer misused as a hidden classifier.

AddVolume()'s return type changed from error to (bool, error); the bool signals whether the volume was actually added vs skipped/covered.

TestMapTask updated: expected volumes now reflect the consolidated state (inputs under /inputs/testdata collapsed to one ancestor mount) and an InputVolumes assertion is added.
✅ Deploy Preview for funnel-dev ready!
Thanks for creating this PR @geertvandeweyer! Just added it to the Review Queue and should hopefully get it merged and placed in the release pipeline... We'll update this thread when a release candidate version is ready! Note ETA → End-of-Day Thursday, March 26th 🎯
Pull request overview
This PR improves Funnel/TES deployments on Kubernetes (notably OVHcloud) by reducing container mount/label pressure, adding containerd tooling, and making Kubernetes resource creation and gRPC handling more resilient.
Changes:
- Consolidates per-input file mounts into ancestor directory mounts to avoid containerd label limits and EBUSY on file bind mounts.
- Adds/extends Kubernetes resource templating (job TTL defaults, label sanitization, optional per-task ConfigMaps, optional GenericS3 fields) and expands fake-client unit tests.
- Adds nerdctl to the container image and tweaks server-side gRPC keepalive/retry behavior.
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| worker/worker.go | Passes task resources into DockerCommand. |
| worker/docker.go | Adds TES resources to DockerCommand and a RAM-to-MB helper. |
| worker/file_mapper.go | Tracks input volumes and consolidates mounts to reduce mount count/label size. |
| worker/file_mapper_test.go | Updates expectations + asserts InputVolumes tracking. |
| server/server.go | Adds gRPC keepalive enforcement and gateway dial keepalives + retry config. |
| database/boltdb/events.go | Combines task create + queue into a single BoltDB transaction. |
| storage/amazon_s3.go | Skips AWS region autodiscovery when using custom endpoint + configured region. |
| compute/kubernetes/backend.go | Makes ConfigMap/SA/Role/RoleBinding creation conditional on templates. |
| compute/kubernetes/resources/job.go | Adds TTL defaulting, backoff limit parsing, label sanitization helpers. |
| compute/kubernetes/resources/configmap.go | Switches to template-rendered ConfigMap creation. |
| compute/kubernetes/resources/{pv,pvc,role,roleBinding,serviceaccount}.go | Switches templates to text/template; makes GenericS3 optional for PV/PVC templates. |
| tests/kubernetes/kubernetes_test.go | Adds unit tests for job/configmap/pv/pvc behavior via fake clientsets; adjusts TestMain skipping logic. |
| config/default-config.yaml | Documents new optional Kubernetes ConfigMapTemplate setting. |
| config/kubernetes/worker-configmap.yaml | Adds a reference per-task ConfigMap template. |
| Dockerfile | Installs build deps + downloads nerdctl; sets containerd env for runtime image. |
Comments suppressed due to low confidence (1)
compute/kubernetes/resources/configmap.go:56
- CreateConfigMap() always calls Create() against conf.Kubernetes.JobsNamespace, so a template that renders metadata.namespace differently will fail with a namespace mismatch (and the comment implies templates can control namespace). Consider validating/overriding cm.Namespace to JobsNamespace, or creating the ConfigMap in cm.Namespace (defaulting to JobsNamespace when empty).
_, err = client.CoreV1().ConfigMaps(conf.Kubernetes.JobsNamespace).Create(context.Background(), cm, metav1.CreateOptions{})
if err != nil {
// WorkDir+ancestor is valid by the HostPath() construction invariant
// (every input host path == WorkDir + container path).
nonInputVols = append(nonInputVols, Volume{
	HostPath: filepath.Join(mapper.WorkDir, ancestor),
consolidateVolumes() builds the consolidated HostPath with filepath.Join(mapper.WorkDir, ancestor). Since ancestor is an absolute container path (starts with '/'), filepath.Join will discard WorkDir and produce a host path like "/inputs/..." instead of "<WorkDir>/inputs/...". Use mapper.HostPath(ancestor) or path.Join(mapper.WorkDir, ancestor) to keep consolidated mounts under WorkDir.
Suggested change:
-	HostPath: filepath.Join(mapper.WorkDir, ancestor),
+	HostPath: path.Join(mapper.WorkDir, ancestor),
// Consolidate read-only input volumes into the fewest possible ancestor
// directory mounts. This eliminates file-level bind mounts (which cause
// EBUSY when a task script calls mv/unlink on a mounted file path) and
// shrinks the nerdctl/mounts container label.
mapper.consolidateVolumes()
MapTask() comment says "Consolidate read-only input volumes", but AddInput now registers input volumes as read-write and consolidateVolumes() identifies inputs via InputVolumes rather than the Readonly flag. Update this comment to avoid misleading future changes/debugging.
// DeleteConfigMap deletes the per-task ConfigMap created by CreateConfigMap.
// The name must match the metadata.name set in ConfigMapTemplate.
func DeleteConfigMap(ctx context.Context, taskId string, namespace string, client kubernetes.Interface, log *logger.Logger) error {
	name := fmt.Sprintf("funnel-worker-config-%s", taskId)
	cfg, _ := client.CoreV1().ConfigMaps(namespace).Get(ctx, name, metav1.GetOptions{})
	if cfg != nil {
		log.Debug("deleting Worker configMap", "taskID", taskId)
		err := client.CoreV1().ConfigMaps(namespace).Delete(ctx, name, metav1.DeleteOptions{})
		_, err := client.CoreV1().ConfigMaps(namespace).Get(ctx, name, metav1.GetOptions{})
DeleteConfigMap() still deletes the hard-coded name "funnel-worker-config-" even though CreateConfigMap() now renders metadata.name from ConfigMapTemplate and the comment says the name must match the template. This can leak per-task ConfigMaps when a non-default name is used. Consider rendering the template (or at least metadata.name) again to determine the name to delete, or constrain/validate templates to the fixed naming scheme.
_, err = b.client.CoreV1().ServiceAccounts(config.Kubernetes.JobsNamespace).Get(context.Background(), saName, metav1.GetOptions{})
if err != nil {
	return fmt.Errorf("creating Worker ServiceAccount: %v", err)
	// ServiceAccount does not exist, create it
	b.log.Debug("Creating Worker ServiceAccount", "taskID", task.Id)
	err = resources.CreateServiceAccount(task, config, b.client, b.log)
ServiceAccount creation treats any error from the GET as "does not exist" and proceeds to create. This can mask real failures (RBAC/network) and may result in confusing follow-on errors. Prefer checking for NotFound explicitly (e.g., apierrors.IsNotFound(err)) and returning other errors immediately.
tx.Bucket(TaskBucket).Put(idBytes, taskString)
tx.Bucket(TaskState).Put(idBytes, []byte(tes.State_QUEUED.String()))
tx.Bucket(TaskOwner).Put(idBytes, []byte(server.GetUsername(ctx)))
// Queue task in same transaction to avoid double-locking
tx.Bucket(TasksQueued).Put(idBytes, []byte{})
Inside the BoltDB Update transaction, errors returned by Bucket(...).Put(...) are ignored, so the transaction can commit even if a write fails. Return/propagate errors from each Put (and consider checking Bucket() for nil) so WriteEvent reliably reports storage/queueing failures.
Suggested change:

Original:
tx.Bucket(TaskBucket).Put(idBytes, taskString)
tx.Bucket(TaskState).Put(idBytes, []byte(tes.State_QUEUED.String()))
tx.Bucket(TaskOwner).Put(idBytes, []byte(server.GetUsername(ctx)))
// Queue task in same transaction to avoid double-locking
tx.Bucket(TasksQueued).Put(idBytes, []byte{})

Suggested replacement:
taskBucket := tx.Bucket(TaskBucket)
if taskBucket == nil {
	return fmt.Errorf("bucket %q not found", string(TaskBucket))
}
if err := taskBucket.Put(idBytes, taskString); err != nil {
	return err
}
stateBucket := tx.Bucket(TaskState)
if stateBucket == nil {
	return fmt.Errorf("bucket %q not found", string(TaskState))
}
if err := stateBucket.Put(idBytes, []byte(tes.State_QUEUED.String())); err != nil {
	return err
}
ownerBucket := tx.Bucket(TaskOwner)
if ownerBucket == nil {
	return fmt.Errorf("bucket %q not found", string(TaskOwner))
}
if err := ownerBucket.Put(idBytes, []byte(server.GetUsername(ctx))); err != nil {
	return err
}
// Queue task in same transaction to avoid double-locking
queuedBucket := tx.Bucket(TasksQueued)
if queuedBucket == nil {
	return fmt.Errorf("bucket %q not found", string(TasksQueued))
}
if err := queuedBucket.Put(idBytes, []byte{}); err != nil {
	return err
}
COPY . .
RUN apk add --no-cache bash build-base git protobuf protobuf-dev
RUN --mount=type=cache,target=/root/.cache/go-build make build
The build stage installs packages twice (apk add at line 3 and again at line 11) and the second install runs after COPY . ., which hurts Docker layer caching and slows rebuilds. Consider consolidating into a single apk add (including protobuf packages) before copying the full source tree.
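The consolidation the reviewer suggests would look roughly like this Dockerfile fragment. The base image, package list, and build command are assumptions drawn from the snippet above, not the PR's actual Dockerfile; the point is the single apk add placed before COPY so the package layer is cached across source changes.

```dockerfile
FROM golang:alpine AS build
# One apk add, before the source copy: rebuilds after source-only
# changes reuse this cached layer instead of reinstalling packages.
RUN apk add --no-cache bash build-base git make protobuf protobuf-dev
WORKDIR /src
COPY . .
RUN --mount=type=cache,target=/root/.cache/go-build make build
```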
Today I am learning that I did not know as much about containers as I thought!
TIL about:
Note to Self: 🔮
- Review + add if OK to Funnel Helm Chart
Same review + add for Helm Chart! 🔮
Review taking a bit longer than expected, but with the release of v0.11.12.rc-1 this is at the top of the queue! Planning to get this completed by Tuesday, March 31 🎯
Thanks for the update. Should I address the review comments already?
Absolutely, if able to (and if they're reasonable/correct)... Otherwise I can plan on reviewing + merging this PR by midday Friday, April 3rd!
I just saw there's quite some functional overlap with what was merged from the dev branch. I'll leave the further merging to you, so I don't accidentally break other stuff.
My apologies for the glacial review + merge of this PR — will revisit start of day Friday (April 17) to get this into the next release!
Description
This PR contains a couple of fixes made while deploying Funnel/TES on OVHcloud. They mainly concern mount-point consolidation (to bypass containerd's 4096-byte limit on labels), adding nerdctl to the Docker image, and some stability tweaks.
Motivation and Context
The main motivation was the mount-point consolidation required for containerd. We use Cromwell upstream of TES/Funnel, and each input file is provided as an individual mount; in genomics pipelines, this rapidly blows up.
How Has This Been Tested?
See https://geertvandeweyer.github.io/ for info on the deployment.
Types of Changes
Checklist