feat(logical-backup): configurable job history limits and TTL #3091

Open
yajo wants to merge 2 commits into zalando:master from moduon:fix/logical-backup-job-cleanup

yajo commented May 6, 2026

Contributor

This PR adds three new configuration options for logical backup cronjobs:

  • logical_backup_successful_jobs_history_limit (default: 3)
  • logical_backup_failed_jobs_history_limit (default: 3)
  • logical_backup_ttl_seconds_after_finished (default: 86400)

Problem

Currently, the postgres-operator does not configure any of these fields on the logical backup CronJob. This means:

  • Kubernetes defaults apply (3 successful, 1 failed), which may not be sufficient for clusters with many PostgreSQL instances.
  • No ttlSecondsAfterFinished is set, so completed/failed backup Jobs and their Pods accumulate indefinitely.
  • When the CronJob is recreated (e.g., after spec changes), old Jobs are orphaned and never cleaned up.

This fixes #1092.

Solution

  1. Added new config fields to the operator configuration structs (Go + CRD + Helm values).
  2. Injected the fields into the generated CronJob and JobTemplate specs.
  3. Updated the CronJob comparison logic so the operator detects changes to these fields and reconciles accordingly.
  4. Added and updated unit tests to verify the new defaults and behavior.
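Steps 2 and 3 can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: the types below are simplified local stand-ins for the relevant `batch/v1` fields (the real `JobTemplate` wraps the `JobSpec` in a `JobTemplateSpec`), and the names `BackupJobConfig`, `generateCronJobSpec`, and `needsUpdate` are hypothetical, chosen only for illustration.

```go
package main

import "fmt"

// BackupJobConfig is a hypothetical stand-in for the three new operator options.
type BackupJobConfig struct {
	SuccessfulJobsHistoryLimit int32 // logical_backup_successful_jobs_history_limit
	FailedJobsHistoryLimit     int32 // logical_backup_failed_jobs_history_limit
	TTLSecondsAfterFinished    int32 // logical_backup_ttl_seconds_after_finished
}

// JobSpec and CronJobSpec mirror only the fields discussed here.
// In Kubernetes these fields are *int32, so "unset" and "0" stay distinguishable.
type JobSpec struct {
	TTLSecondsAfterFinished *int32
}

type CronJobSpec struct {
	SuccessfulJobsHistoryLimit *int32
	FailedJobsHistoryLimit     *int32
	JobTemplate                JobSpec // simplified: no JobTemplateSpec wrapper
}

// defaultBackupJobConfig returns the defaults listed in the PR description.
func defaultBackupJobConfig() BackupJobConfig {
	return BackupJobConfig{
		SuccessfulJobsHistoryLimit: 3,
		FailedJobsHistoryLimit:     3,
		TTLSecondsAfterFinished:    86400,
	}
}

// generateCronJobSpec copies the configured values into the generated spec.
func generateCronJobSpec(cfg BackupJobConfig) CronJobSpec {
	// Local copies give each pointer its own backing value.
	successful := cfg.SuccessfulJobsHistoryLimit
	failed := cfg.FailedJobsHistoryLimit
	ttl := cfg.TTLSecondsAfterFinished
	return CronJobSpec{
		SuccessfulJobsHistoryLimit: &successful,
		FailedJobsHistoryLimit:     &failed,
		JobTemplate:                JobSpec{TTLSecondsAfterFinished: &ttl},
	}
}

// int32PtrEqual treats two nil pointers as equal and compares values otherwise.
func int32PtrEqual(a, b *int32) bool {
	if a == nil || b == nil {
		return a == b
	}
	return *a == *b
}

// needsUpdate reports whether the desired spec differs from the current one
// in any of the three fields, together with a short reason.
func needsUpdate(current, desired CronJobSpec) (bool, string) {
	if !int32PtrEqual(current.SuccessfulJobsHistoryLimit, desired.SuccessfulJobsHistoryLimit) {
		return true, "successfulJobsHistoryLimit differs"
	}
	if !int32PtrEqual(current.FailedJobsHistoryLimit, desired.FailedJobsHistoryLimit) {
		return true, "failedJobsHistoryLimit differs"
	}
	if !int32PtrEqual(current.JobTemplate.TTLSecondsAfterFinished, desired.JobTemplate.TTLSecondsAfterFinished) {
		return true, "ttlSecondsAfterFinished differs"
	}
	return false, ""
}

func main() {
	desired := generateCronJobSpec(defaultBackupJobConfig())
	existing := CronJobSpec{} // hypothetical spec with none of the fields set
	changed, reason := needsUpdate(existing, desired)
	fmt.Println(changed, reason)
}
```

Comparing through a nil-aware helper rather than dereferencing directly avoids a panic when an existing CronJob spec leaves one of the fields unset.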

Files changed

  • pkg/util/config/config.go — new config fields
  • pkg/apis/acid.zalan.do/v1/operator_configuration_type.go — CRD struct fields
  • pkg/controller/operator_config.go — wiring from CRD to internal config
  • pkg/cluster/k8sres.go — CronJob generation
  • pkg/cluster/cluster.go — CronJob comparison
  • pkg/cluster/k8sres_test.go — generation tests
  • pkg/cluster/cluster_test.go — comparison tests
  • charts/postgres-operator/values.yaml — Helm values
  • charts/postgres-operator/crds/operatorconfigurations.yaml — CRD schema

Disclaimer

I am no Go programmer. This has been AI-assisted by kimi-k2.6.

FxKu commented Jun 1, 2026

Member

@yajo can you also add the new options to the reference docs?

FxKu added this to the 2.0.0 milestone Jun 1, 2026
FxKu moved this to Open Questions in Postgres Operator Jun 1, 2026
yajo force-pushed the fix/logical-backup-job-cleanup branch from 8b1915b to 1768835 June 2, 2026 10:02
Adds three new configuration options for logical backup cronjobs:
- logical_backup_successful_jobs_history_limit (default: 3)
- logical_backup_failed_jobs_history_limit (default: 3)
- logical_backup_ttl_seconds_after_finished (default: 86400)

These options control how many completed/failed backup jobs are
retained by Kubernetes and when finished jobs are automatically
deleted. This prevents accumulation of old backup jobs and pods
in namespaces with many PostgreSQL clusters.

Also updates the CronJob comparison logic to detect changes in
these new fields and trigger reconciliation when needed.

Closes zalando#1092
yajo force-pushed the fix/logical-backup-job-cleanup branch from 1768835 to 3519e96 June 2, 2026 10:03

yajo commented Jun 2, 2026

Contributor Author

Done. Thanks for reviewing.

FxKu added the minor label Jun 2, 2026

FxKu commented Jun 3, 2026

Member

I've noticed that the new options are still missing here and there.

Projects

Status: Open Questions

Development

Successfully merging this pull request may close these issues.

Logical backup cronjob cleanup

2 participants