Skip to content

fix(rr): decouple agentSandbox postgres from externalSecret (use postgres.urlSecretName); require encryption key#331

Open
JatinNanda wants to merge 5 commits into
mainfrom
fix-rr-agent-sandbox-secret-overload
Open

fix(rr): decouple agentSandbox postgres from externalSecret (use postgres.urlSecretName); require encryption key#331
JatinNanda wants to merge 5 commits into
mainfrom
fix-rr-agent-sandbox-secret-overload

Conversation

@JatinNanda

@JatinNanda JatinNanda commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

what

setting rr.agentSandbox.externalSecret.name (intended just for the JWT keypair) silently overloaded two unrelated things, breaking enablement on a deployment that only wanted to reuse its existing backend postgres:

  • postgres hijackpostgresUrlEnv routed AGENT_SANDBOX_POSTGRES_URL to the external secret's postgres-url key. missing key → getaddrinfo ENOTFOUND at runtime. the inherit-the-backend path was unreachable whenever externalSecret.name was set, and validateSecrets counted it as an explicit pg source so nothing failed at template time.
  • encryption key was treated as optional — but it isn't. the proxy derives the sandbox-iframe asset-token HMAC key from AGENT_SANDBOX_ENCRYPTION_KEY (agent_executor/proxy/src/config.ts getAssetTokenHmacSecret) and throws the first time it serves a sandbox if it's missing. the pod comes up green and then fails at first use.

separately, the snapshot blob-storage env never reached the workflow worker, even though the agent/snapshot temporal activities run there.

changes

  • externalSecret.name now covers only the JWT/encryption keys — it never sources postgres. to read a DSN from that same secret, point postgres.urlSecretName at it (Option 3; urlSecretKey defaults to postgres-url). otherwise postgres inherits config.postgresql — an existing deployment needs no DB config.
  • require the agent sandbox encryption key in validateSecrets (mirrors the JWT-key checks) — fail loudly at render instead of shipping a pod that breaks at first sandbox use; both fail hints point at postgres.urlSecretName
  • render gitServer.commonEnv (RR_DEFAULT_*) onto the workflow worker when rr.gitServer.enabled. the agentExecutor/snapshotRetention temporal activities run on WORKFLOW_TEMPORAL_WORKER and read snapshot blob storage (getBlobStoreForSnapshots, RR_SNAPSHOTS_*RR_DEFAULT_* fallback), but commonEnv was only injected onto the backend + standalone git-server pod — so the worker had no creds and snapshot access failed there.
  • values.yaml (both copies): drop postgres-url from the externalSecret.name key list, mark encryptionKey required (64 hex, openssl rand -hex 32)
  • bump chart 6.11.1 → 6.11.2; update ci/ fixtures to supply the now-required encryption key

incorporates #332

pulled in @ryanartecona's passwordSecretName support so the two don't collide. in the final shape it lives in the postgres.urlSecretName branch: the blueprints case (DSN from the agent-sandbox secret, password from the auto-rotated main secret) is postgres.urlSecretName: agent-sandbox-env + postgres.passwordSecretName: retool-config.

note: an earlier revision used a postgres.fromExternalSecret boolean — redundant with postgres.urlSecretName, since removed in favor of it.

testing

helm template verified: urlSecretName routes to the secret's postgres-url (+ PGPASSWORD when passwordSecretName set); JWT-only externalSecret inherits the backend DSN; render fails with a clear error when no encryption-key source is set; the workflow worker now renders RR_DEFAULT_* when gitServer.enabled + blobStorage is set. all ci/ fixtures render; helm lint clean.

note: same-origin-on-HTTPS (a separate reported issue) is a backend gap, not chart-fixable, and is out of scope here.

🤖 Generated with Claude Code

@greptile-apps

greptile-apps Bot commented Jun 12, 2026

Copy link
Copy Markdown

Greptile Summary

This PR decouples agentSandbox.externalSecret.name from Postgres sourcing (it now covers only JWT/encryption keys), promotes encryptionKey from optional to required in validateSecrets, and adds gitServer.commonEnv (blob-storage vars) to the workflow worker so snapshot activities can reach blob storage.

  • Postgres decoupling: $explicitPg no longer includes $ext; the old externalSecret.name → postgres-url branch in postgresUrlEnv is removed and replaced by the explicit postgres.urlSecretName path (Option 3), which now also supports a separate passwordSecretName for auto-rotated passwords.
  • Encryption key enforcement: a new validateSecrets guard fails at render time if neither encryptionKey nor externalSecret.name is set, preventing the silent pod-starts-but-throws-at-first-sandbox-use failure.
  • Worker blob-storage env: retool.gitServer.commonEnv is now injected into all worker deployments when rr.gitServer.enabled, surfacing RR_DEFAULT_* so agentExecutor/snapshotRetention temporal activities can resolve blob storage on the worker node.

Confidence Score: 5/5

Safe to merge — the three independent fixes (postgres decoupling, encryption key enforcement, worker blob-storage env) each address a clear breakage with no new footguns introduced.

All three changes are narrowly scoped to the agentSandbox / gitServer path, the validation logic is sound (fail messages are accurate, the $explicitPg simplification is correct), the CI fixtures now supply the required encryption key, and the postgresUrlEnv template correctly handles the new urlSecretName + passwordSecretName combination. No pre-existing behaviour outside the changed paths is affected.

No files require special attention.

Important Files Changed

Filename Overview
charts/retool/templates/_helpers.tpl Core fix: removes externalSecret.name from $explicitPg check, adds encryptionKey require-guard, removes the postgres-url branch from postgresUrlEnv, adds passwordSecretName support in the urlSecretName path — logic is consistent and validation messages are accurate.
charts/retool/templates/_workers.tpl Adds retool.gitServer.commonEnv (blob-storage vars) to the worker deployment template when gitServer.enabled; applied to all worker types — harmless for non-workflow workers and consistent with how backendEnvVars is already injected.
charts/retool/values.yaml Removes Option 4 (externalSecret postgres-url), marks encryptionKey as REQUIRED, adds clarifying comments for urlSecretName/passwordSecretName; accurate and consistent with template changes.
charts/retool/ci/test-agent-sandbox-enabled-option.yaml Updates the CI fixture from the old Option 4 path to Option 3 (urlSecretName pointing at the same secret as externalSecret.name); correctly exercises the new behavior.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[agentSandbox.enabled] --> V[validateSecrets]
    V --> PG{postgres explicit?}
    PG -- "url / host / urlSecretName set" --> PG_OK[explicit path OK]
    PG -- none set --> INHERIT{config.postgresql has host?}
    INHERIT -- yes --> PASS_OK[inherit backend PG]
    INHERIT -- no --> FAIL1[fail: set postgres.url / .host / .urlSecretName]
    V --> ENC{encryptionKey or externalSecret.name?}
    ENC -- yes --> ENC_OK[encryption key present]
    ENC -- no --> FAIL2[fail: set encryptionKey or externalSecret.name]
    V --> JWT{jwtPublicKey or externalSecret.name?}
    JWT -- yes --> JWT_OK[JWT keys present]
    JWT -- no --> FAIL3[fail: set jwtPublicKey]
    PG_OK --> ENV[postgresUrlEnv]
    PASS_OK --> ENV
    ENV --> OPT1[postgres.url - value literal]
    ENV --> OPT2[postgres.host - assembled URL + PGPASSWORD]
    ENV --> OPT3[postgres.urlSecretName - secretKeyRef + optional PGPASSWORD]
    ENV --> OPT4[default - inherit backend config.postgresql]
    GS{rr.gitServer.enabled?} -- yes --> BLOBENV[gitServer.commonEnv RR_DEFAULT_* blob-storage vars]
    BLOBENV --> W[all worker pods]
    GS -- no --> W
Loading

Reviews (4): Last reviewed commit: "chore(rr): bump chart version to 6.11.3 ..." | Re-trigger Greptile

@JatinNanda JatinNanda force-pushed the fix-rr-agent-sandbox-secret-overload branch from 7fb7343 to 78fc2ad Compare June 12, 2026 23:10
@JatinNanda JatinNanda changed the title fix(rr): stop agentSandbox externalSecret.name from hijacking postgres fix(rr): stop agentSandbox externalSecret.name from hijacking postgres; require its encryption key Jun 12, 2026
@JatinNanda JatinNanda force-pushed the fix-rr-agent-sandbox-secret-overload branch 3 times, most recently from e70c5cf to bd0f3a3 Compare June 12, 2026 23:44
@JatinNanda JatinNanda marked this pull request as ready for review June 13, 2026 00:02
@JatinNanda JatinNanda requested a review from ryanartecona June 13, 2026 00:02
@JatinNanda JatinNanda changed the title fix(rr): stop agentSandbox externalSecret.name from hijacking postgres; require its encryption key fix(rr): agentsandbox secret & postgres sourcing — opt-in external postgres, required encryption key, passwordSecretName support Jun 13, 2026
@ryanartecona

Copy link
Copy Markdown
Contributor

uhhh what do you think about instead of rr.agentSandbox.postgres.fromExternalSecret: true, we just mirror the pattern we have for other stuff already and introduce rr.agentSandbox.postgres.urlSecret{Name,Key}? that is IMO more flexible & straightforward with the slight tradeoff of being slightly more verbose.

@JatinNanda JatinNanda changed the title fix(rr): agentsandbox secret & postgres sourcing — opt-in external postgres, required encryption key, passwordSecretName support fix(rr): decouple agentSandbox postgres from externalSecret (use postgres.urlSecretName); require encryption key Jun 13, 2026
JatinNanda and others added 5 commits June 12, 2026 20:41
…s; require its encryption key

setting rr.agentSandbox.externalSecret.name (intended for the JWT keypair) silently
(a) routed the agent sandbox postgres connection to that secret's postgres-url key
and (b) coupled the encryption key to it. an existing deployment that just wanted to
reuse its backend postgres crashed at runtime with getaddrinfo ENOTFOUND.

- postgres now only reads from externalSecret when postgres.fromExternalSecret: true
  (default false); otherwise it inherits the backend's config.postgresql connection,
  even when externalSecret.name is set for the JWT
- require the agent sandbox encryption key (validateSecrets, mirroring the JWT keys):
  the proxy derives the sandbox-iframe asset-token HMAC key from AGENT_SANDBOX_ENCRYPTION_KEY
  and throws when serving a sandbox without it (agent_executor/proxy/src/config.ts
  getAssetTokenHmacSecret), so 'optional' would surface as a green pod that fails at
  first sandbox use. fail loudly at render instead.
- updated both validateSecrets fail hints to mention postgres.fromExternalSecret: true
- values.yaml (both copies): add postgres.fromExternalSecret, mark encryptionKey required, note 64 hex
- bump chart 6.11.1 -> 6.11.2; update ci fixtures to supply the now-required encryption key
  and add fromExternalSecret: true to the Option-4 sandbox fixture
…cretName

the fromExternalSecret boolean was redundant with the existing Option-3
postgres.urlSecretName/urlSecretKey: 'read postgres-url from externalSecret.name'
is identical to pointing urlSecretName at that same secret (urlSecretKey already
defaults to postgres-url). drop the bespoke flag and the externalSecret postgres
branch entirely, so externalSecret.name covers ONLY the JWT/encryption keys and
never sources postgres.

- remove rr.agentSandbox.postgres.fromExternalSecret (both values.yaml copies)
- postgresUrlEnv: delete the externalSecret postgres branch; move the
  passwordSecretName PGPASSWORD support (#332) into the urlSecretName branch, so
  'DSN from a secret + auto-rotated password from another secret' uses Option 3
- validateSecrets: drop the externalSecret term from $explicitPg; repoint both
  fail hints to postgres.urlSecretName
- ci: test-agent-sandbox-enabled-option uses postgres.urlSecretName (Option 3)
  pointed at its JWT secret instead of the removed flag
the agentExecutor / snapshotRetention temporal activities run on the
WORKFLOW_TEMPORAL_WORKER (registered in workflowsExecutor/onpremWorker) and read
snapshot blob storage via getBlobStoreForSnapshots (RR_SNAPSHOTS_* with an
RR_DEFAULT_* fallback). but gitServer.commonEnv was only injected onto the main
backend and the standalone git-server pod, so the worker had no RR_DEFAULT_*
and snapshot blob access failed there -- forcing operators to hand-plumb
RR_SNAPSHOTS_* via env.

include gitServer.commonEnv on the worker when rr.gitServer.enabled (no
git-server host/port split -- the worker is a blob-storage client, not the git
server). self-guards on rr.blobStorage being set, so it no-ops otherwise.
@JatinNanda JatinNanda force-pushed the fix-rr-agent-sandbox-secret-overload branch from a5890ea to f3e4341 Compare June 13, 2026 00:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants