feat: archive build/deploy logs to MinIO for post-eviction retrieval #119

Merged
vigneshrajsb merged 7 commits into main from feat/archive-build-logs
Mar 3, 2026

Conversation


@vigneshrajsb vigneshrajsb commented Mar 1, 2026

Problem

Build and deploy job logs are permanently lost once k8s Job pods are evicted or TTL-expired (~24h):

  • Job history is fetched entirely from live k8s (getNativeBuildJobs, getDeploymentJobs)
  • Logs are streamed from live pods via WebSocket
  • When pods disappear the UI renders a broken `NotFound` state with no recovery path

Solution

Add MinIO as an optional in-cluster S3-compatible object store. Logs are archived at job completion time and served back transparently — the UI sees a new `Archived` status instead of `NotFound`.

Architecture

```
Job completes → archive logs.txt + metadata.json to MinIO
└── {namespace}/{jobType}/{serviceName}/{jobName}/

Pod evicted after TTL...

UI requests log stream info → backend returns status='Archived' + archivedLogs text
UI requests job list → archived jobs merged into live k8s results (deduplicated by jobName)
```
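
The key layout in the diagram can be sketched as a pair of small helpers. This is only an illustration: `ArchiveLocation`, `archivePrefix`, and `archiveKeys` are hypothetical names, not taken from `logArchival.ts`, and the `jobType` values are assumed.

```typescript
// Hypothetical shape; the PR only specifies the four path segments.
interface ArchiveLocation {
  namespace: string;
  jobType: string; // assumed to be something like "build" or "deploy"
  serviceName: string;
  jobName: string;
}

// Builds the {namespace}/{jobType}/{serviceName}/{jobName}/ prefix.
function archivePrefix(loc: ArchiveLocation): string {
  return [loc.namespace, loc.jobType, loc.serviceName, loc.jobName].join("/") + "/";
}

// Returns the object keys for the two files archived per job.
function archiveKeys(loc: ArchiveLocation): { logs: string; metadata: string } {
  const prefix = archivePrefix(loc);
  return { logs: `${prefix}logs.txt`, metadata: `${prefix}metadata.json` };
}
```

Keeping every segment in the key means a listing by prefix can be scoped to a namespace, a job type, or a single service.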

Changes

New files

| File | Purpose |
| --- | --- |
| `src/server/lib/objectStore/s3Client.ts` | S3Client singleton supporting MinIO (forcePathStyle + explicit creds) and AWS S3 (IRSA) |
| `src/server/services/logArchival.ts` | LogArchivalService: archiveLogs, getArchivedLogs, listArchivedJobs, ensureBucket |
| `src/server/services/types/logArchival.ts` | ArchivedJobMetadata interface |

Modified files (lifecycle backend)

| File | Change |
| --- | --- |
| `src/shared/config.ts` | Export `OBJECT_STORE_*` env vars with safe defaults |
| `next.config.js` | Add `OBJECT_STORE_*` vars to serverRuntimeConfig; add `@aws-sdk/client-s3` to serverComponentsExternalPackages |
| `src/server/services/types/globalConfig.ts` | Add `logArchival?: { enabled: boolean }` |
| `src/server/services/types/logStreaming.ts` | Add `'Archived'` status; add `archivedLogs?` field |
| `src/server/lib/nativeBuild/engines.ts` | Archive build logs after job completes (success + failure paths) |
| `src/server/lib/nativeHelm/helm.ts` | Archive deploy logs after job completes |
| `src/server/lib/kubernetes/getNativeBuildJobs.ts` | Merge archived build jobs; upgrade live jobs with missing pods to `source='archived'`; add `source` field to `BuildJobInfo` |
| `src/server/lib/kubernetes/getDeploymentJobs.ts` | Same pattern for deploy jobs; add `source` field to `DeploymentJobInfo` |
| `src/server/services/logStreaming.ts` | Fall back to archived log lookup when k8s returns NotFound |
| `src/shared/openApiSpec.ts` | Add `'Archived'` status, `archivedLogs` field, `source` field to all relevant schemas |
| `src/server/db/migrations/001_seed.ts` | Seed `logArchival` config row (disabled by default) |
| `src/pages/api/v1/.../buildLogs.ts` | Add `podName`, `error`, `source` fields to inline OpenAPI schema |
| `src/pages/api/v1/.../deployLogs.ts` | Add `Pending` to status enum, add `source` field to inline OpenAPI schema |
| `src/pages/api/v1/.../logs/[jobName].ts` | Add `'Archived'` status and `archivedLogs` to inline OpenAPI schema |
| `helm/environments/local/lifecycle.yaml` | Add `OBJECT_STORE_*` env vars for local dev MinIO |
| `Tiltfile` | Deploy MinIO via `helm_resource` for local dev |
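
The `logStreaming.ts` fallback listed above could look roughly like the following decision function. It is a simplified sketch, not the actual implementation: `resolveLogStream` is a hypothetical name, the `'Active'` status is an assumption standing in for whatever the live-pod statuses are, and the archive lookup is assumed to have already run (yielding `null` when nothing is archived or archival is disabled).

```typescript
// Reduced response shape; only 'Archived', 'NotFound' and archivedLogs come from the PR.
interface LogStreamInfo {
  status: "Active" | "NotFound" | "Archived"; // "Active" is an assumed placeholder
  podName?: string;
  archivedLogs?: string;
}

// Decides what to return once the k8s lookup and the archive lookup are both done.
function resolveLogStream(
  podName: string | undefined,
  archivedLogs: string | null
): LogStreamInfo {
  // A live pod always wins: logs keep streaming over WebSocket.
  if (podName) {
    return { status: "Active", podName };
  }
  // No pod (evicted, TTL-expired, or cleaned up): serve the archive if one exists.
  if (archivedLogs !== null) {
    return { status: "Archived", archivedLogs };
  }
  // Nothing live and nothing archived: same result as before this PR.
  return { status: "NotFound" };
}
```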

Related PRs

Key design decisions

Feature-gated: all object store calls check `globalConfig.logArchival?.enabled`. Seeded as `false` — enabling requires an explicit DB update. Deploying the infra (MinIO pod) is safe before enabling the flag.

Non-blocking: archival failures are caught and logged as warnings — they never fail the build/deploy flow.
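
A minimal sketch of how the feature gate and the non-blocking behavior can combine in one wrapper. `archiveIfEnabled` and its callback parameters are hypothetical names used for illustration; only the `logArchival?: { enabled: boolean }` shape comes from this PR.

```typescript
interface GlobalConfig {
  logArchival?: { enabled: boolean };
}

// Runs `archive` only when the flag is on, and never lets a failure escape.
// Returns true when the archive call completed, false when skipped or failed.
async function archiveIfEnabled(
  config: GlobalConfig,
  archive: () => Promise<void>,
  warn: (msg: string) => void = console.warn
): Promise<boolean> {
  // Feature gate: with the flag off or missing, the object store is never touched.
  if (!config.logArchival?.enabled) {
    return false;
  }
  try {
    await archive();
    return true;
  } catch (err) {
    // Non-blocking: log a warning and let the build/deploy flow continue.
    const reason = err instanceof Error ? err.message : String(err);
    warn(`log archival failed: ${reason}`);
    return false;
  }
}
```

A caller in the build or deploy path would await this after the job finishes and ignore the boolean, so an unreachable MinIO degrades to a warning rather than a failed build.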

Deduplication: merged archived jobs are deduplicated by `jobName` against live k8s results, so a completing job never appears twice.
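
The deduplication rule can be illustrated with a small merge function. This is a sketch under assumptions: `JobInfo` is a reduced stand-in for `BuildJobInfo` / `DeploymentJobInfo`, `mergeArchivedJobs` is a hypothetical name, and it does not cover the separate step that upgrades podless live jobs to `source='archived'`.

```typescript
type JobSource = "live" | "archived";

// Reduced stand-in for the real job info types; only jobName and source matter here.
interface JobInfo {
  jobName: string;
  status: string;
  source?: JobSource;
}

// Live k8s results take precedence; archived entries only fill in missing jobs.
function mergeArchivedJobs(live: JobInfo[], archived: JobInfo[]): JobInfo[] {
  const liveNames = new Set(live.map((job) => job.jobName));
  const liveTagged = live.map((job): JobInfo => ({ ...job, source: job.source ?? "live" }));
  const archivedOnly = archived
    .filter((job) => !liveNames.has(job.jobName))
    .map((job): JobInfo => ({ ...job, source: "archived" }));
  return [...liveTagged, ...archivedOnly];
}
```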

S3 support: set `OBJECT_STORE_TYPE=s3` to use AWS S3 with IRSA — no credentials in config. Bucket must be pre-provisioned.
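
As a rough illustration of the MinIO versus S3 split, the function below derives client options from environment values. Only `OBJECT_STORE_TYPE` is named in this PR; the other variable names, the MinIO default, and the fallback region are assumptions. The returned fields (`region`, `endpoint`, `forcePathStyle`, `credentials`) mirror options accepted by the `S3Client` constructor in `@aws-sdk/client-s3`, but the SDK is deliberately not imported so the sketch stays self-contained.

```typescript
// Only OBJECT_STORE_TYPE appears in the PR; the remaining names are hypothetical.
interface ObjectStoreEnv {
  OBJECT_STORE_TYPE?: string;
  OBJECT_STORE_ENDPOINT?: string;
  OBJECT_STORE_ACCESS_KEY?: string;
  OBJECT_STORE_SECRET_KEY?: string;
  OBJECT_STORE_REGION?: string;
}

interface ClientOptions {
  region: string;
  endpoint?: string;
  forcePathStyle?: boolean;
  credentials?: { accessKeyId: string; secretAccessKey: string };
}

function buildClientOptions(env: ObjectStoreEnv): ClientOptions {
  const region = env.OBJECT_STORE_REGION ?? "us-east-1"; // assumed default
  if (env.OBJECT_STORE_TYPE === "s3") {
    // AWS S3 with IRSA: omit credentials so the SDK's default provider chain resolves them.
    return { region };
  }
  // MinIO (assumed default mode): explicit endpoint, path-style URLs, static credentials.
  if (!env.OBJECT_STORE_ENDPOINT || !env.OBJECT_STORE_ACCESS_KEY || !env.OBJECT_STORE_SECRET_KEY) {
    throw new Error("MinIO mode needs an endpoint and explicit credentials");
  }
  return {
    region,
    endpoint: env.OBJECT_STORE_ENDPOINT,
    forcePathStyle: true,
    credentials: {
      accessKeyId: env.OBJECT_STORE_ACCESS_KEY,
      secretAccessKey: env.OBJECT_STORE_SECRET_KEY,
    },
  };
}
```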

Enabling

  1. Deploy MinIO via helm-charts (v0.8.0+) with `minio.enabled=true` and `secrets.objectStore.enabled=true`
  2. Insert into `global_config`:
    ```json
    { "logArchival": { "enabled": true } }
    ```

Test plan

  • `pnpm lint` passes ✅
  • `pnpm ts-check` — no new errors (pre-existing errors in scripts/ and engines.ts unrelated to this PR) ✅
  • `pnpm test` — 951/951 pass ✅
  • With `logArchival.enabled=false` (default): system behavior identical to before, no object store calls
  • With `logArchival.enabled=true`: trigger a build, verify `logs.txt` + `metadata.json` appear in MinIO bucket
  • Delete the job pod manually, verify it still appears in the build job list with `source='archived'`
  • Click the archived job in the UI — logs render via `staticContent` (not WebSocket)

🤖 Generated with Claude Code

vigneshrajsb and others added 3 commits March 1, 2026 14:38
Adds a pino formatters.level option so logs include string severity
labels (e.g. "level":"info") rather than numeric codes (e.g. "level":30).
This fixes log severity mapping in Groundcover.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Build and deploy job logs are permanently lost once k8s Job pods are
evicted or TTL-expired (~24h). This adds MinIO as an optional in-cluster
object store to archive logs at completion time, serving them back to the
UI even after the live pods are gone.

## New files
- src/server/lib/objectStore/s3Client.ts
  MinIO client singleton configured via MINIO_* env vars

- src/server/services/logArchival.ts
  LogArchivalService with archiveLogs, getArchivedLogs, getArchivedMetadata,
  listArchivedJobs, ensureBucket, configureRetention

- src/server/services/types/logArchival.ts
  ArchivedJobMetadata interface

## Modified files
- src/shared/config.ts / next.config.js
  Export MINIO_ENDPOINT, MINIO_PORT, MINIO_ACCESS_KEY, MINIO_SECRET_KEY,
  MINIO_BUCKET, MINIO_USE_SSL (all with safe defaults)

- src/server/services/types/globalConfig.ts
  Add logArchival?: { enabled: boolean; retentionDays: number } to GlobalConfig

- src/server/services/types/logStreaming.ts
  Add 'Archived' to status union; add archivedLogs?: string field

- src/server/lib/nativeBuild/engines.ts
  After waitForJobAndGetLogs(), archive logs when logArchival.enabled=true
  Both success and error paths are covered

- src/server/lib/nativeHelm/helm.ts
  Same pattern for native Helm deploy jobs

- src/server/lib/kubernetes/getNativeBuildJobs.ts
  Merge archived build jobs (not present in live k8s) into the listing
  Add source?: 'live' | 'archived' field to BuildJobInfo

- src/server/lib/kubernetes/getDeploymentJobs.ts
  Same for deploy jobs / DeploymentJobInfo

- src/server/services/logStreaming.ts
  When k8s returns NotFound, attempt archived log lookup before returning
  NotFound. Returns status='Archived' with archivedLogs when found.

- helm/web-app/Chart.yaml + helm/environments/local/lifecycle.yaml
  Add minio subchart dependency (disabled by default in local values)

## Storage schema
  lifecycle-logs/
    {namespace}/{jobType}/{serviceName}/{jobName}/
      logs.txt       - full log content
      metadata.json  - job info (status, duration, sha, engine, timestamps)

## Enabling
All archival ops are gated on globalConfig.logArchival.enabled.
Insert into global_config to activate:
  { "logArchival": { "enabled": true, "retentionDays": 14 } }

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@vigneshrajsb vigneshrajsb requested a review from a team as a code owner March 1, 2026 23:14
vigneshrajsb and others added 4 commits March 2, 2026 00:11
- Fix JobMonitor log ordering: wait for job completion before fetching
  logs so the full output is captured rather than a mid-run snapshot
- Add startedAt/completedAt/duration to JobMonitor.getJobStatus via
  kubectl job JSON, thread timing through engines.ts and helm.ts so
  archived metadata has accurate timestamps
- Upgrade live k8s jobs with no pod to source='archived' when an
  archive exists in MinIO, so they remain selectable in the UI
- Extend logStreaming archived fallback to also trigger when the k8s
  job exists but its pod has been cleaned up (!podInfo.podName)
- Add source field to NativeBuildJobInfo OpenAPI schema
- Add MinIO helm_resource to Tiltfile; remove erroneous minio subchart
  dependency from helm/web-app/Chart.yaml
- Add ALLOWED_ORIGINS to local lifecycle.yaml

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- logArchival: lazy ensureBucket on first write, null-safe S3 Body
  reads, paginated ListObjectsV2 for archived job listing
- getDeploymentJobs: port podless-to-archived source upgrade logic
  matching build job behaviour
- openApiSpec + API route schemas: add source and archivedLogs fields,
  fix LogStreamResponse required field list, correct status enum values
- 001_seed: add logArchival feature-flag row to globalConfig seed
- globalConfig types: drop unused retentionDays field
- logs/[jobName] API: cast type query param to LogType to satisfy TS

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add @aws-sdk/client-s3 dependency and s3Client singleton
- Wire OBJECT_STORE_* env vars through config.ts and next.config.js serverRuntimeConfig
- Configure local dev MinIO env vars in helm/environments/local/lifecycle.yaml

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Prevents warning noise on fresh installs where MinIO is not yet
configured.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@vigneshrajsb vigneshrajsb merged commit 8ef6df5 into main Mar 3, 2026
1 check passed