Skip to content

Feat/k8s ovh improvements#1357

Open
geertvandeweyer wants to merge 14 commits intocalypr:developfrom
geertvandeweyer:feat/k8s-ovh-improvements
Open

Feat/k8s ovh improvements#1357
geertvandeweyer wants to merge 14 commits intocalypr:developfrom
geertvandeweyer:feat/k8s-ovh-improvements

Conversation

@geertvandeweyer
Copy link
Copy Markdown

@geertvandeweyer geertvandeweyer commented Mar 19, 2026

Description

This PR contains a couple of fixes made during deployment of Funnel/TES on OVHcloud. They are mainly about mount-point consolidation (bypass the containerd 4096 byte limit for labels), add nerdctl to the docker image, and some stability tweaks.

Motivation and Context

Main motivation was the mountpoint consolidation required for containderd. We use Cromwell upstream of TES/Funnel and each input file is provided as an individual mount. In genomics pipelines, this rapidly blows up.

How Has This Been Tested?

  • Tests were added *test.go files (and passed)
  • Functional testing done on OVHcloud and AWS EKS.

See for info on the deployment : https://geertvandeweyer.github.io/

Types of Changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • I have updated the documentation accordingly (link here).
  • I have tested that this feature locally.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • Reviewer has tested this feature locally

Ports 7 commits from the main-branch fork to the develop baseline.

Changes:
- fix(s3): skip GetBucketRegion when endpoint+region are pre-configured
  Non-AWS S3-compatible services (e.g. OVH Object Storage) reject the
  AWS HeadBucket/GetBucketRegion call with 400. Use the pre-configured
  region when both endpoint and region are set.

- feat(kubernetes): add TaskNameLabel + BackoffLimit to job template context
  TaskNameLabel: sanitized Kubernetes-safe pod label value for task name.
  BackoffLimit: resolved from backend_parameters['backoff_limit'], default 10.
  Moved to compute/kubernetes/resources/job.go (develop's job creation path).

- db: combine store and queue in single BoltDB transaction
  Eliminates the window between the two separate Update() calls that could
  cause write-lock contention under high task submission rate.

- server: add gRPC keepalive policy and retry service config
  Prevents GOAWAY/UNAVAILABLE errors on idle gRPC connections between
  the grpc-gateway HTTP proxy and the internal gRPC server.

- worker: add Resources field to DockerCommand + MemoryMB() helper
  Allows container memory limits to be derived from TES task resources.

- worker: consolidate read-only input volumes in file_mapper
  Replaces file-level bind mounts with ancestor-directory mounts to
  prevent EBUSY when tasks call mv/unlink on mounted input paths.
  Also skips readonly volumes already covered by a parent RW mount.

- docker: add nerdctl + CONTAINERD env vars to Dockerfile
  Downloads nerdctl v2.2.1 into the final image and sets
  CONTAINERD_ADDRESS/CONTAINERD_NAMESPACE for host containerd access.

Not ported (intentionally):
- nerdctl-wrap/ (AWS-specific Go wrapper binary, not upstream-relevant)
… configured

SA, Role and RoleBinding creation is designed for per-task cloud credential
injection (e.g. OVH). When the templates are empty (e.g. AWS IRSA setup where
workers run under a shared ServiceAccount), the empty-template parse/decode
would fail. Guard each creation behind a non-empty template check.

Also fixes logic inversion bug: the else branch previously logged an error
when the SA was *found* (err == nil).
The backend previously required GenericS3.Bucket and GenericS3.Region before
creating any PV/PVC for tasks with I/O. This blocked deployments that use
hostPath or other non-S3 PV templates where these fields are unused.

- Remove the fatal GenericS3 check in backend.go createResources
- Guard GenericS3 field access in pv.go CreatePV with a nil/empty check,
  passing empty strings when GenericS3 is not configured

This allows hostPath-based PV setups (e.g. OVH Cinder) to work without
requiring a GenericS3 config block in funnel.yaml.
PVTemplate and PVCTemplate are user-defined; only the upstream S3 CSI
template (driver: s3.csi.aws.com) references {{.Bucket}}, {{.Region}},
and {{.KmsKeyID}}. Deployments on non-AWS infrastructure (e.g. OVH,
on-premise) commonly use hostPath or other PVTemplates that do not
reference these S3-specific fields, and may not have GenericS3
configured at all.

Changes:
- Remove the GenericS3 preflight guard in createResources(): it
  conflated the file-storage backend config (GenericS3) with PV
  template requirements, incorrectly blocking task execution on
  any cluster that does not configure GenericS3.
- Guard GenericS3 slice access in CreatePV and CreatePVC against
  nil/empty to prevent a latent index-out-of-bounds panic; pass
  empty strings when GenericS3 is not configured, which is harmless
  for PVTemplates that do not reference those variables.
…MapTemplate

Most Kubernetes Funnel deployments mount a shared static ConfigMap
(e.g. "funnel-config") for all workers via a volume in WorkerTemplate.
Creating a separate ConfigMap per task is unnecessary in this case and
causes several problems at scale:

  - The full Funnel config (including storage credentials) is serialized
    and stored in etcd once per task, duplicating secrets N times.
  - At thousands of tasks, this adds significant etcd write pressure and
    API-server churn from create/delete events.
  - If the reconciler or worker crashes before cleanup, ConfigMaps leak.
  - Requires configmaps get/create/delete RBAC on the funnel
    ServiceAccount, which is not needed by the static shared CM approach.

A per-task ConfigMap is only needed when WorkerTemplate references a
per-task name such as funnel-worker-config-{{.TaskId}}, as in the
upstream S3 CSI reference setup.

Changes:
- Gate CreateConfigMap/DeleteConfigMap in createResources/cleanResources
  on ConfigMapTemplate != "" (same pattern as ServiceAccountTemplate,
  RoleTemplate, RoleBindingTemplate). Default is empty — no change to
  existing deployments that already set those fields to "".
- Rewrite CreateConfigMap to render ConfigMapTemplate ({{.TaskId}},
  {{.Namespace}}, {{.Config}}) and decode the result as a ConfigMap
  object, consistent with how all other resource templates work.
- Add config/kubernetes/worker-configmap.yaml as a reference template
  for deployments that do need per-task ConfigMaps.
- Document ConfigMapTemplate in default-config.yaml.
…rendering

html/template HTML-escapes interpolated string values (e.g. " → &calypr#34;,
& → &). This is correct for HTML output but wrong for YAML: the
escaped sequences are valid HTML entities but invalid YAML, causing the
YAML parser to fail with "did not find expected alphabetic or numeric
character" whenever a template uses printf to produce a quoted value
(e.g. {{printf "\"%.0fG\"" .RamGb}}).

The Go documentation is explicit: "If the output is not HTML, use
package text/template." All Kubernetes resource templates in this
package produce YAML, not HTML.

The upstream canonical worker-job.yaml happened to avoid the bug because
it uses printf without embedded quotes and hardcoded string literals
without HTML-special characters. The issue surfaces as soon as a
WorkerTemplate uses printf with embedded \" to quote YAML string values,
which is idiomatic YAML-via-template style.

Affects all seven resource files: job, pv, pvc, configmap, role,
roleBinding, serviceaccount.
AddInput() now registers volumes as read-write (Readonly=false) and tracks
them explicitly in a new FileMapper.InputVolumes []Volume field.
consolidateVolumes() uses InputVolumes as its working set instead of
filtering mapper.Volumes by the Readonly flag.

This means Readonly on Volume retains its literal semantics: true = mount
the path read-only in the container. No code path sets it to true for
inputs any more, so the flag is no longer misused as a hidden classifier.

AddVolume() return type changed from error to (bool, error); the bool
signals whether the volume was actually added vs skipped/covered.

TestMapTask updated: expected volumes now reflect the consolidated state
(inputs under /inputs/testdata collapsed to one ancestor mount) and an
InputVolumes assertion is added.
@netlify
Copy link
Copy Markdown

netlify bot commented Mar 19, 2026

Deploy Preview for funnel-dev ready!

Name Link
🔨 Latest commit ef5c430
🔍 Latest deploy log https://app.netlify.com/projects/funnel-dev/deploys/69cf2f3c59703d00088cbb76
😎 Deploy Preview https://deploy-preview-1357--funnel-dev.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@lbeckman314 lbeckman314 self-requested a review March 24, 2026 18:31
@lbeckman314 lbeckman314 self-assigned this Mar 24, 2026
@lbeckman314 lbeckman314 added bug blocker blocks user interactions enhancement labels Mar 24, 2026
@lbeckman314 lbeckman314 moved this to Review in Funnel Mar 24, 2026
@lbeckman314
Copy link
Copy Markdown
Contributor

lbeckman314 commented Mar 24, 2026

Thanks for creating this PR @geertvandeweyer!

Just added it to the Review Queue and should hopefully get it merged and placed in the release pipeline...

We'll update this thread when a release candidate version is ready!

Note

ETA → End-of-Day Thursday, March 26th 🎯

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves Funnel/TES deployments on Kubernetes (notably OVHcloud) by reducing container mount/label pressure, adding containerd tooling, and making Kubernetes resource creation and gRPC handling more resilient.

Changes:

  • Consolidates per-input file mounts into ancestor directory mounts to avoid containerd label limits and EBUSY on file bind mounts.
  • Adds/extends Kubernetes resource templating (job TTL defaults, label sanitization, optional per-task ConfigMaps, optional GenericS3 fields) and expands fake-client unit tests.
  • Adds nerdctl to the container image and tweaks server-side gRPC keepalive/retry behavior.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
worker/worker.go Passes task resources into DockerCommand.
worker/docker.go Adds TES resources to DockerCommand and a RAM-to-MB helper.
worker/file_mapper.go Tracks input volumes and consolidates mounts to reduce mount count/label size.
worker/file_mapper_test.go Updates expectations + asserts InputVolumes tracking.
server/server.go Adds gRPC keepalive enforcement and gateway dial keepalives + retry config.
database/boltdb/events.go Combines task create + queue into a single BoltDB transaction.
storage/amazon_s3.go Skips AWS region autodiscovery when using custom endpoint + configured region.
compute/kubernetes/backend.go Makes ConfigMap/SA/Role/RoleBinding creation conditional on templates.
compute/kubernetes/resources/job.go Adds TTL defaulting, backoff limit parsing, label sanitization helpers.
compute/kubernetes/resources/configmap.go Switches to template-rendered ConfigMap creation.
compute/kubernetes/resources/{pv,pvc,role,roleBinding,serviceaccount}.go Switches templates to text/template; makes GenericS3 optional for PV/PVC templates.
tests/kubernetes/kubernetes_test.go Adds unit tests for job/configmap/pv/pvc behavior via fake clientsets; adjusts TestMain skipping logic.
config/default-config.yaml Documents new optional Kubernetes ConfigMapTemplate setting.
config/kubernetes/worker-configmap.yaml Adds a reference per-task ConfigMap template.
Dockerfile Installs build deps + downloads nerdctl; sets containerd env for runtime image.
Comments suppressed due to low confidence (1)

compute/kubernetes/resources/configmap.go:56

  • CreateConfigMap() always calls Create() against conf.Kubernetes.JobsNamespace, so a template that renders metadata.namespace differently will fail with a namespace mismatch (and the comment implies templates can control namespace). Consider validating/overriding cm.Namespace to JobsNamespace, or creating the ConfigMap in cm.Namespace (defaulting to JobsNamespace when empty).
	_, err = client.CoreV1().ConfigMaps(conf.Kubernetes.JobsNamespace).Create(context.Background(), cm, metav1.CreateOptions{})
	if err != nil {

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread worker/file_mapper.go
// WorkDir+ancestor is valid by the HostPath() construction invariant
// (every input host path == WorkDir + container path).
nonInputVols = append(nonInputVols, Volume{
HostPath: filepath.Join(mapper.WorkDir, ancestor),
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consolidateVolumes() builds the consolidated HostPath with filepath.Join(mapper.WorkDir, ancestor). Since ancestor is an absolute container path (starts with '/'), filepath.Join will discard WorkDir and produce a host path like "/inputs/..." instead of "/inputs/...". Use mapper.HostPath(ancestor) or path.Join(mapper.WorkDir, ancestor) to keep consolidated mounts under WorkDir.

Suggested change
HostPath: filepath.Join(mapper.WorkDir, ancestor),
HostPath: path.Join(mapper.WorkDir, ancestor),

Copilot uses AI. Check for mistakes.
Comment thread worker/file_mapper.go
Comment on lines +99 to +103
// Consolidate read-only input volumes into the fewest possible ancestor
// directory mounts. This eliminates file-level bind mounts (which cause
// EBUSY when a task script calls mv/unlink on a mounted file path) and
// shrinks the nerdctl/mounts container label.
mapper.consolidateVolumes()
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MapTask() comment says "Consolidate read-only input volumes", but AddInput now registers input volumes as read-write and consolidateVolumes() identifies inputs via InputVolumes rather than the Readonly flag. Update this comment to avoid misleading future changes/debugging.

Copilot uses AI. Check for mistakes.
Comment on lines +63 to +67
// DeleteConfigMap deletes the per-task ConfigMap created by CreateConfigMap.
// The name must match the metadata.name set in ConfigMapTemplate.
func DeleteConfigMap(ctx context.Context, taskId string, namespace string, client kubernetes.Interface, log *logger.Logger) error {
name := fmt.Sprintf("funnel-worker-config-%s", taskId)
cfg, _ := client.CoreV1().ConfigMaps(namespace).Get(ctx, name, metav1.GetOptions{})
if cfg != nil {
log.Debug("deleting Worker configMap", "taskID", taskId)
err := client.CoreV1().ConfigMaps(namespace).Delete(ctx, name, metav1.DeleteOptions{})
_, err := client.CoreV1().ConfigMaps(namespace).Get(ctx, name, metav1.GetOptions{})
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DeleteConfigMap() still deletes the hard-coded name "funnel-worker-config-" even though CreateConfigMap() now renders metadata.name from ConfigMapTemplate and the comment says the name must match the template. This can leak per-task ConfigMaps when a non-default name is used. Consider rendering the template (or at least metadata.name) again to determine the name to delete, or constrain/validate templates to the fixed naming scheme.

Copilot uses AI. Check for mistakes.
Comment on lines +222 to +226
_, err = b.client.CoreV1().ServiceAccounts(config.Kubernetes.JobsNamespace).Get(context.Background(), saName, metav1.GetOptions{})
if err != nil {
return fmt.Errorf("creating Worker ServiceAccount: %v", err)
// ServiceAccount does not exist, create it
b.log.Debug("Creating Worker ServiceAccount", "taskID", task.Id)
err = resources.CreateServiceAccount(task, config, b.client, b.log)
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ServiceAccount creation treats any error from the GET as "does not exist" and proceeds to create. This can mask real failures (RBAC/network) and may result in confusing follow-on errors. Prefer checking for NotFound explicitly (e.g., apierrors.IsNotFound(err)) and returning other errors immediately.

Copilot uses AI. Check for mistakes.
Comment thread database/boltdb/events.go
Comment on lines 41 to +45
tx.Bucket(TaskBucket).Put(idBytes, taskString)
tx.Bucket(TaskState).Put(idBytes, []byte(tes.State_QUEUED.String()))
tx.Bucket(TaskOwner).Put(idBytes, []byte(server.GetUsername(ctx)))
// Queue task in same transaction to avoid double-locking
tx.Bucket(TasksQueued).Put(idBytes, []byte{})
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inside the BoltDB Update transaction, errors returned by Bucket(...).Put(...) are ignored, so the transaction can commit even if a write fails. Return/propagate errors from each Put (and consider checking Bucket() for nil) so WriteEvent reliably reports storage/queueing failures.

Suggested change
tx.Bucket(TaskBucket).Put(idBytes, taskString)
tx.Bucket(TaskState).Put(idBytes, []byte(tes.State_QUEUED.String()))
tx.Bucket(TaskOwner).Put(idBytes, []byte(server.GetUsername(ctx)))
// Queue task in same transaction to avoid double-locking
tx.Bucket(TasksQueued).Put(idBytes, []byte{})
taskBucket := tx.Bucket(TaskBucket)
if taskBucket == nil {
return fmt.Errorf("bucket %q not found", string(TaskBucket))
}
if err := taskBucket.Put(idBytes, taskString); err != nil {
return err
}
stateBucket := tx.Bucket(TaskState)
if stateBucket == nil {
return fmt.Errorf("bucket %q not found", string(TaskState))
}
if err := stateBucket.Put(idBytes, []byte(tes.State_QUEUED.String())); err != nil {
return err
}
ownerBucket := tx.Bucket(TaskOwner)
if ownerBucket == nil {
return fmt.Errorf("bucket %q not found", string(TaskOwner))
}
if err := ownerBucket.Put(idBytes, []byte(server.GetUsername(ctx))); err != nil {
return err
}
// Queue task in same transaction to avoid double-locking
queuedBucket := tx.Bucket(TasksQueued)
if queuedBucket == nil {
return fmt.Errorf("bucket %q not found", string(TasksQueued))
}
if err := queuedBucket.Put(idBytes, []byte{}); err != nil {
return err
}

Copilot uses AI. Check for mistakes.
Comment thread Dockerfile
Comment on lines 10 to 12
COPY . .
RUN apk add --no-cache bash build-base git protobuf protobuf-dev
RUN --mount=type=cache,target=/root/.cache/go-build make build
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The build stage installs packages twice (apk add at line 3 and again at line 11) and the second install runs after COPY . ., which hurts Docker layer caching and slows rebuilds. Consider consolidating into a single apk add (including protobuf packages) before copying the full source tree.

Copilot uses AI. Check for mistakes.
Comment thread Dockerfile
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Today I am learning that I did not know as much about containers as I thought!

TIL about:

Copy link
Copy Markdown
Contributor

@lbeckman314 lbeckman314 Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to Self: 🔮

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same review + add for Helm Chart! 🔮

@lbeckman314
Copy link
Copy Markdown
Contributor

lbeckman314 commented Mar 29, 2026

Review taking a bit longer than expected, but with the release of v0.11.12.rc-1 this is at the top of queue!

Planning to get this completed by Tuesday, March 31 🎯

@geertvandeweyer
Copy link
Copy Markdown
Author

Review taking a bit longer than expected, but with the release of v0.11.12.rc-1 this is at the top of queue!

Planning to get this completed by Tuesday, March 31 🎯

Thanks for the update. Should I address the review comments already ?

@lbeckman314
Copy link
Copy Markdown
Contributor

Absolutely if able to (and if they're reasonable/correct)...

Otherwise I can plan on reviewing + merging this PR by midday Friday April 3rd!

@geertvandeweyer
Copy link
Copy Markdown
Author

Absolutely if able to (and if they're reasonable/correct)...

Otherwise I can plan on reviewing + merging this PR by midday Friday April 3rd!

I just saw there's quite some functional overlap in what was merged from the dev branch. I'll leave the further merging to you,so I don't accidently break other stuff

@lbeckman314
Copy link
Copy Markdown
Contributor

My apologies for the glacial review + merge of this PR — will revisit start of day Friday (April 17) to get this in the next release!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

blocker blocks user interactions bug enhancement

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants