fix(rr): decouple agentSandbox postgres from externalSecret (use postgres.urlSecretName); require encryption key#331
Conversation
|
| Filename | Overview |
|---|---|
| charts/retool/templates/_helpers.tpl | Core fix: removes externalSecret.name from $explicitPg check, adds encryptionKey require-guard, removes the postgres-url branch from postgresUrlEnv, adds passwordSecretName support in the urlSecretName path — logic is consistent and validation messages are accurate. |
| charts/retool/templates/_workers.tpl | Adds retool.gitServer.commonEnv (blob-storage vars) to the worker deployment template when gitServer.enabled; applied to all worker types — harmless for non-workflow workers and consistent with how backendEnvVars is already injected. |
| charts/retool/values.yaml | Removes Option 4 (externalSecret postgres-url), marks encryptionKey as REQUIRED, adds clarifying comments for urlSecretName/passwordSecretName; accurate and consistent with template changes. |
| charts/retool/ci/test-agent-sandbox-enabled-option.yaml | Updates the CI fixture from the old Option 4 path to Option 3 (urlSecretName pointing at the same secret as externalSecret.name); correctly exercises the new behavior. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[agentSandbox.enabled] --> V[validateSecrets]
V --> PG{postgres explicit?}
PG -- "url / host / urlSecretName set" --> PG_OK[explicit path OK]
PG -- none set --> INHERIT{config.postgresql has host?}
INHERIT -- yes --> PASS_OK[inherit backend PG]
INHERIT -- no --> FAIL1[fail: set postgres.url / .host / .urlSecretName]
V --> ENC{encryptionKey or externalSecret.name?}
ENC -- yes --> ENC_OK[encryption key present]
ENC -- no --> FAIL2[fail: set encryptionKey or externalSecret.name]
V --> JWT{jwtPublicKey or externalSecret.name?}
JWT -- yes --> JWT_OK[JWT keys present]
JWT -- no --> FAIL3[fail: set jwtPublicKey]
PG_OK --> ENV[postgresUrlEnv]
PASS_OK --> ENV
ENV --> OPT1[postgres.url - value literal]
ENV --> OPT2[postgres.host - assembled URL + PGPASSWORD]
ENV --> OPT3[postgres.urlSecretName - secretKeyRef + optional PGPASSWORD]
ENV --> OPT4[default - inherit backend config.postgresql]
GS{rr.gitServer.enabled?} -- yes --> BLOBENV[gitServer.commonEnv RR_DEFAULT_* blob-storage vars]
BLOBENV --> W[all worker pods]
GS -- no --> W
Reviews (4): Last reviewed commit: "chore(rr): bump chart version to 6.11.3 ..." | Re-trigger Greptile
7fb7343 to
78fc2ad
Compare
e70c5cf to
bd0f3a3
Compare
|
uhhh what do you think about instead of |
…s; require its encryption key setting rr.agentSandbox.externalSecret.name (intended for the JWT keypair) silently (a) routed the agent sandbox postgres connection to that secret's postgres-url key and (b) coupled the encryption key to it. an existing deployment that just wanted to reuse its backend postgres crashed at runtime with getaddrinfo ENOTFOUND. - postgres now only reads from externalSecret when postgres.fromExternalSecret: true (default false); otherwise it inherits the backend's config.postgresql connection, even when externalSecret.name is set for the JWT - require the agent sandbox encryption key (validateSecrets, mirroring the JWT keys): the proxy derives the sandbox-iframe asset-token HMAC key from AGENT_SANDBOX_ENCRYPTION_KEY and throws when serving a sandbox without it (agent_executor/proxy/src/config.ts getAssetTokenHmacSecret), so 'optional' would surface as a green pod that fails at first sandbox use. fail loudly at render instead. - updated both validateSecrets fail hints to mention postgres.fromExternalSecret: true - values.yaml (both copies): add postgres.fromExternalSecret, mark encryptionKey required, note 64 hex - bump chart 6.11.1 -> 6.11.2; update ci fixtures to supply the now-required encryption key and add fromExternalSecret: true to the Option-4 sandbox fixture
…ternalSecret (cherry picked from commit 38304c0)
…cretName the fromExternalSecret boolean was redundant with the existing Option-3 postgres.urlSecretName/urlSecretKey: 'read postgres-url from externalSecret.name' is identical to pointing urlSecretName at that same secret (urlSecretKey already defaults to postgres-url). drop the bespoke flag and the externalSecret postgres branch entirely, so externalSecret.name covers ONLY the JWT/encryption keys and never sources postgres. - remove rr.agentSandbox.postgres.fromExternalSecret (both values.yaml copies) - postgresUrlEnv: delete the externalSecret postgres branch; move the passwordSecretName PGPASSWORD support (#332) into the urlSecretName branch, so 'DSN from a secret + auto-rotated password from another secret' uses Option 3 - validateSecrets: drop the externalSecret term from $explicitPg; repoint both fail hints to postgres.urlSecretName - ci: test-agent-sandbox-enabled-option uses postgres.urlSecretName (Option 3) pointed at its JWT secret instead of the removed flag
the agentExecutor / snapshotRetention temporal activities run on the WORKFLOW_TEMPORAL_WORKER (registered in workflowsExecutor/onpremWorker) and read snapshot blob storage via getBlobStoreForSnapshots (RR_SNAPSHOTS_* with an RR_DEFAULT_* fallback). but gitServer.commonEnv was only injected onto the main backend and the standalone git-server pod, so the worker had no RR_DEFAULT_* and snapshot blob access failed there -- forcing operators to hand-plumb RR_SNAPSHOTS_* via env. include gitServer.commonEnv on the worker when rr.gitServer.enabled (no git-server host/port split -- the worker is a blob-storage client, not the git server). self-guards on rr.blobStorage being set, so it no-ops otherwise.
a5890ea to
f3e4341
Compare
what
setting
rr.agentSandbox.externalSecret.name(intended just for the JWT keypair) silently overloaded two unrelated things, breaking enablement on a deployment that only wanted to reuse its existing backend postgres:postgresUrlEnvroutedAGENT_SANDBOX_POSTGRES_URLto the external secret'spostgres-urlkey. missing key →getaddrinfo ENOTFOUNDat runtime. the inherit-the-backend path was unreachable wheneverexternalSecret.namewas set, andvalidateSecretscounted it as an explicit pg source so nothing failed at template time.AGENT_SANDBOX_ENCRYPTION_KEY(agent_executor/proxy/src/config.tsgetAssetTokenHmacSecret) and throws the first time it serves a sandbox if it's missing. the pod comes up green and then fails at first use.separately, the snapshot blob-storage env never reached the workflow worker, even though the agent/snapshot temporal activities run there.
changes
externalSecret.namenow covers only the JWT/encryption keys — it never sources postgres. to read a DSN from that same secret, pointpostgres.urlSecretNameat it (Option 3;urlSecretKeydefaults topostgres-url). otherwise postgres inheritsconfig.postgresql— an existing deployment needs no DB config.validateSecrets(mirrors the JWT-key checks) — fail loudly at render instead of shipping a pod that breaks at first sandbox use; both fail hints point atpostgres.urlSecretNamegitServer.commonEnv(RR_DEFAULT_*) onto the workflow worker whenrr.gitServer.enabled. theagentExecutor/snapshotRetentiontemporal activities run onWORKFLOW_TEMPORAL_WORKERand read snapshot blob storage (getBlobStoreForSnapshots,RR_SNAPSHOTS_*→RR_DEFAULT_*fallback), butcommonEnvwas only injected onto the backend + standalone git-server pod — so the worker had no creds and snapshot access failed there.values.yaml(both copies): droppostgres-urlfrom theexternalSecret.namekey list, markencryptionKeyrequired (64 hex,openssl rand -hex 32)ci/fixtures to supply the now-required encryption keyincorporates #332
pulled in @ryanartecona's
passwordSecretNamesupport so the two don't collide. in the final shape it lives in thepostgres.urlSecretNamebranch: the blueprints case (DSN from the agent-sandbox secret, password from the auto-rotated main secret) ispostgres.urlSecretName: agent-sandbox-env+postgres.passwordSecretName: retool-config.testing
helm templateverified:urlSecretNameroutes to the secret'spostgres-url(+PGPASSWORDwhenpasswordSecretNameset); JWT-onlyexternalSecretinherits the backend DSN; render fails with a clear error when no encryption-key source is set; the workflow worker now rendersRR_DEFAULT_*whengitServer.enabled+blobStorageis set. allci/fixtures render;helm lintclean.note: same-origin-on-HTTPS (a separate reported issue) is a backend gap, not chart-fixable, and is out of scope here.
🤖 Generated with Claude Code