Merge r2 → main: R² (React Retool) self-hosted support (chart 6.12.0) by JatinNanda · Pull Request #324 · tryretool/retool-helm

JatinNanda · 2026-06-11T17:27:13Z

What

Lands the full R² (React Retool) self-hosted feature set from the long-lived r2 branch onto the published main chart line. Chart version 6.10.6 → 6.12.0.

This is a real merge commit (not a squash) so the 33-commit R² history is preserved — see the Commits tab.

What's included

rr.* values layout ([rr] Restructure into top-level rr.enabled master switch + rename r2 → rr #321) — one top-level rr.enabled master switch over rr.jsExecutor / rr.agent / rr.agentSandbox, with null-inherit per-component overrides. Fail-loud guard (retool.rr.validateLegacyValues) on the old r2.* / top-level keys.
js-executor workload ([INF-6675] add js executor #275), self-contained env, Guaranteed QoS.
rr.agent server-side agent worker.
agent-sandbox service (support agent-sandbox service #281, agent sandbox secret overrides #290, agent-sandbox: validate required secrets + existing-secret DSN ref #308): controller + proxy + ephemeral Job sandboxes, flexible Postgres sourcing (inherit backend by default), flexible secret sourcing (JWT required, encryption/api optional), same-origin proxy by default (no extra ingress), NetworkPolicy, device plugin.
git server ([feat][r2] enable git_server in-process with rrGitServer.enabled + blobStorage config #296) in-process by default; optional split deployment ([feat][r2] optionally split rrGitServer into its own deployment #309).
shared blob storage block (s3/gcs/azure) + raw-env escape hatch.
MCP server (Add MCP server support to Retool Helm chart #285), top-level and opt-in (not part of the master switch, [fix][r2] drop mcp from the r2.enabled master switch #316).
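As a rough illustration of the rr.* layout listed above, a values override could look like the sketch below. The key names are taken from this PR's description and commit messages. The particular mix of overrides is only an example, and the gitServer / blobStorage blocks (marked as required for rr in #321) are left out because their fields are not spelled out here.

```yaml
# Sketch only, not a complete values file.
rr:
  enabled: true        # master switch over jsExecutor / agent / agentSandbox
  jsExecutor:
    enabled: null      # null (the default) inherits rr.enabled
  agent:
    enabled: null      # inherits rr.enabled
  agentSandbox:
    enabled: false     # explicit value overrides the master switch for this component only
  # gitServer / blobStorage omitted here; #321 describes them as required for rr.

# MCP is top-level and opt-in; rr.enabled does not turn it on.
mcp:
  enabled: false
```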

Conflict resolution

charts/retool/Chart.yaml version → 6.12.0 (the R² release; subsumes main's 6.10.6 from [fix] update startupProbe.successThreshold example to be 1 #305).
main's startupProbe.successThreshold example ([fix] update startupProbe.successThreshold example to be 1 #305) and r2's temporal dependency-condition change are both preserved (auto-merged).

Prerequisites (done)

Consumer migrations to the (non-backwards-compatible) rr.* layout landed before this merge:

retool-k8s #18177 — internal-onprem + admin + managed-self-hosted
retool-k8s #18178 — nuon

Validation (on the merged tree)

helm lint clean
All 19 ci/*option.yaml permutations render
Legacy-values guard fires on stale r2.* keys

Not in this PR

Publishing to charts.retool.com (6.12.0) — deliberate separate rollout step.
retool-self-hosted-blueprints rr.* migration — still to do before that repo consumes the published chart (it currently pins ref=r2, so unaffected by this merge).

⚠️ Merge note: repo is configured squash-only, which would flatten the 33 commits. To preserve history on main, enable merge commits before merging this PR.

🤖 Generated with Claude Code

Made-with: Cursor

Adds optional MCP server support to the Retool Helm chart, disabled by default.

Main changes:
- Adds a new mcp values block in charts/retool/values.yaml and root values.yaml.
- Adds a standalone MCP Service, Deployment, and optional PodDisruptionBudget.
- Runs MCP using the backend image with SERVICE_TYPE=MCP_SERVER.
- Supports MCP configuration for replicas, resources, env vars, toolsets, transport/session limits, service ports, affinity, node selectors, and tolerations.
- Routes /mcp and /.well-known/oauth-protected-resource to the MCP service through both Ingress and HTTPRoute.
- Adds MCP helper labels/naming in _helpers.tpl.
- Adds CI render coverage via test-mcp-enabled-option.yaml.

Validation performed:
- Helm template render with MCP disabled
- Helm template render with MCP enabled
- Helm lint with MCP enabled
- kubeconform validation during earlier verification

* increase mem
* update file

This reverts commit 98ecae0.

This reverts commit f741066.

* make agentSandbox.image.tag non-required
* Make agentSandbox.devicePlugin.priorityClassName configurable for GKE support
* try adding ingress support for agentsandbox proxy url
* disable apparmor in sandbox jobs for gke/aks support
* try adding httproute support for r2 agent-proxy
* trim whitespace

…obStorage config (#296)

* [chore][r2] add RR_GIT_SERVER to main backend's default SERVICE_TYPE

  Pairs with retool_development's RR_GIT_SERVER scaffold (commit 68162710ee0 on jatin/git-server-scaffold). The git-server runs in-process alongside MAIN_BACKEND rather than as a split-out deployment.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [feat][r2] gate RR_GIT_SERVER on rrGitServer.enabled and add blobStorage config

  git_server needs an object store for repo blobs/packs (and snapshots use the same backend abstraction). The earlier commit unconditionally appended RR_GIT_SERVER to SERVICE_TYPE, which would have main backend crash at runtime on the first git op when blob storage isn't configured.

  Adds:
  - rrGitServer.enabled (default false) — gates the SERVICE_TYPE append
  - blobStorage block with s3 / gcs / azure sub-blocks (set exactly one)
  - {{ fail }} guard requiring exactly one provider when rrGitServer.enabled
  - Renders RR_BLOB_STORAGE_PROVIDER + RR_DEFAULT_<PROVIDER>_* env vars on the main backend deployment, with secretKeyRef support for the secret (S3 secret access key, Azure connection string, GCS credentials)
  - Optional rrGitServer.repackThreshold -> RR_GIT_REPACK_THRESHOLD

  blobStorage is a top-level block (not nested under rrGitServer) because the backend's RR_DEFAULT_* vars are shared with snapshots; this same config will feed them once they get wired up.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [refactor][r2] extract rrGitServer blob storage provider check to a helper

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [chore][r2] allow blobStorage opt-out via direct env vars

  The rrGitServer.enabled fail-fast was blocking customers who'd rather plumb RR_BLOB_STORAGE_PROVIDER / RR_DEFAULT_*_* in directly via environmentVariables / environmentSecrets. Mirror the mcp pattern of detecting the env var and skipping the guard when present.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [chore] sync top-level values.yaml with charts/retool/values.yaml

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
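To make the "exactly one provider" rule concrete, here is a minimal sketch of the values shape #296 describes, written with the pre-rename keys that commit uses (later moved to rr.gitServer / rr.blobStorage by #321). The fields inside the provider sub-block are not listed in the commit message, so they are deliberately left empty rather than guessed; whether an empty block satisfies the guard is not stated in the source. The repackThreshold value is an arbitrary placeholder.

```yaml
# Sketch only; provider fields are intentionally not filled in.
rrGitServer:
  enabled: true          # default false; gates appending RR_GIT_SERVER to SERVICE_TYPE
  repackThreshold: 100   # optional; placeholder value, rendered as RR_GIT_REPACK_THRESHOLD

blobStorage:
  # Set exactly one of s3 / gcs / azure. With rrGitServer.enabled and no
  # provider (or more than one), the {{ fail }} guard aborts rendering.
  s3: {}                 # see charts/retool/values.yaml for the real fields
  # gcs: {}
  # azure: {}
```

Per the last commit in the list, supplying RR_BLOB_STORAGE_PROVIDER directly through environmentVariables / environmentSecrets skips the guard instead.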

…CP metadata (#298)

Adds `mcp.config.oauthMainDomain`, which renders `OAUTH_MAIN_DOMAIN` into the MCP deployment for OAuth metadata base URL configuration. Documents the new MCP OAuth domain configuration in both chart values files. Updates the MCP render fixture so Helm rendering exercises the new environment variable. Validated with Helm rendering and linting.
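A minimal sketch of how this setting might appear in a values override. Only `mcp.enabled` and `mcp.config.oauthMainDomain` come from the source; the hostname is a placeholder, and the expected value format (bare domain versus full URL) is an assumption.

```yaml
# Sketch only: other MCP settings (including required secrets) are omitted.
mcp:
  enabled: true
  config:
    # Rendered into the MCP deployment as OAUTH_MAIN_DOMAIN.
    # Placeholder hostname; value format is assumed, not confirmed by the source.
    oauthMainDomain: retool.example.com
```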

* Rename sandbox env vars

  Also remove stale unused env vars & update job resource requests

* fix sandbox job template commas

---------

Co-authored-by: Ryan Artecona <ryanartecona@gmail.com>

…l-k8s (#310)

* [fix][R2] Increase the AE proxy timeout to be in line with fix in retool-k8s
* Update charts/retool/values.yaml

  Co-authored-by: Ryan Artecona <ryanartecona@gmail.com>

* Update values.yaml

  Co-authored-by: Ryan Artecona <ryanartecona@gmail.com>

* lint fix

---------

Co-authored-by: Ryan Artecona <ryanartecona@gmail.com>

…xy, secrets, git-server split) (#315)

* js-executor: drop backend-shared env inheritance + resize resources (#304)

  * js-executor: stop inheriting backend-shared env

    The js-executor deployment looped over the backend-shared .Values.env and .Values.environmentSecrets (and .Values.environmentVariables) unfiltered, injecting db creds, auth/encryption secrets, license key, and other backend config into a pod that needs none of it. This pollutes the workload and widens the blast radius of any change to shared env.

    js-executor is a standalone nsjail JS sandbox that reads none of the backend-shared env vars. Replace the inheritance with per-workload overrides: jsExecutor.env / jsExecutor.environmentSecrets / jsExecutor.environmentVariables (all default empty), matching the self-contained pattern already used by the mcp and agent_sandbox workloads.

    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

  * js-executor: bump CPU to 6000m, set memory 6Gi

    Bump js-executor CPU rather than shrinking memory. Set requests == limits at cpu: 6000m / memory: 6Gi (Guaranteed QoS). The memory request is kept equal to the limit because JSE reads its memory limit and rejects requests at 80% of it, so the request must reserve the full amount.

    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

  ---------

  Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* rrGitServer: accept blob-storage env vars from .Values.env (#307)

  * rrGitServer: also accept blob-storage env vars from .Values.env

    validateBlobStorage only scanned environmentVariables and environmentSecrets for RR_BLOB_STORAGE_PROVIDER / RR_DEFAULT_*, so deployments that configure those via the .Values.env map had to duplicate them into environmentVariables to satisfy the check. Range over .Values.env (keyed by var name) as well, and mention env in the doc comment and failure message.

    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

  * rrGitServer: add skipBlobStorageValidation escape hatch

    The blob-storage guard can only inspect blobStorage / env / environmentVariables / environmentSecrets at template time. Env vars injected via envFrom (a Secret/ConfigMap splat) are invisible to it, so a valid configuration that supplies RR_BLOB_STORAGE_PROVIDER / RR_DEFAULT_* that way would fail the check with no way out. Add rrGitServer.skipBlobStorageValidation (default false) to bypass the check entirely, and point at it from the failure message.

    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

  ---------

  Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Document self-hosted same-origin agent-sandbox proxy (no extra ingress) (#302)

  Clarify that leaving agentSandbox.frontendWsProxyDomain empty makes the backend serve the sandbox proxy same-origin via the main ingress, so no dedicated proxy domain or ingress object is required for self-hosted.

  Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* re-sync values.yaml with chart copy after #302

  PR #302 updated the agentSandbox.frontendWsProxyDomain comment in charts/retool/values.yaml but not the mirrored root values.yaml, leaving the two out of sync (and failing the values-yaml-synced check on PRs targeting this branch). Copy the richer comment into the root values.yaml.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* [feat][r2] optionally split rrGitServer into its own deployment (#309)

  Adds rrGitServer.separate.enabled to run the git server as a dedicated deployment + service instead of in-process on the main backend, mirroring how the workload is split in Retool Cloud (reached via normal k8s service discovery).

  When enabled:
  - a dedicated <release>-git-server Deployment runs SERVICE_TYPE=RR_GIT_SERVER on RR_GIT_SERVER_PORT, with the Postgres connection, bootstrap secrets, blob-storage env, and telemetry
  - the main backend drops RR_GIT_SERVER from its SERVICE_TYPE and proxies git traffic to the service via RR_GIT_SERVER_HOST / RR_GIT_SERVER_PORT
  - the MCP server (if enabled) is auto-pointed at the service unless mcp.config.retoolGitServerUrl is set explicitly

  The blob-storage env block is extracted into a shared helper (retool.rrGitServer.commonEnv) so the in-process backend and the standalone deployment stay in sync. In-process mode (rrGitServer.enabled without separate) is unchanged.

  Adds ci/test-rr-git-server-separate-option.yaml exercising the split + S3 blob storage + MCP auto-wiring.

  Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* agent-sandbox: validate required secrets + existing-secret DSN ref (#308)

  * agent-sandbox: validate required secrets, flexible Postgres DSN sourcing

    The agent-sandbox secret story was under-validated and rigid:
    - An empty postgres.url silently base64-encoded to nothing ({{ $as.postgres.url | default "" | b64enc }}), so a misconfigured deploy installed cleanly and the controller/proxy crash-looped at runtime.
    - jwtPublicKey / jwtPrivateKey (required for the controller/proxy to boot and for the backend to sign sandbox tokens) had no guard when absent.
    - Postgres could only be supplied as a plaintext DSN; operators could not reuse an existing password-only secret (e.g. the backend's Postgres password).

    The agent-sandbox app consumes a single connection string (no split-field code path), so the chart now offers four ways to supply it, validated at install:
    1. postgres.url -- plaintext DSN.
    2. postgres.host (+ user + database) -- the chart assembles postgres://user@host:port/database and supplies the password out-of-band via the PGPASSWORD env var, from postgres.password or postgres.passwordSecretName. node-postgres reads PGPASSWORD when the DSN omits the password, so the password needs no URL escaping -- any characters are safe. This is what lets a password-only secret be reused.
    3. postgres.urlSecretName -- existing secret holding the full DSN.
    4. externalSecret.name -- catch-all secret, postgres-url key.

    user/database are embedded in the assembled DSN verbatim. Percent-encoding does not round-trip here (pg-connection-string decodes userinfo before splitting on ':' and runs the path through decodeURI), so validateSecrets instead rejects the characters that would break parsing -- ':' '/' '?' '#' / whitespace in user and '?' '#' / whitespace in database. '@' is allowed (Azure-style user@servername parses correctly, splitting on the last '@'); for other characters use options 1 or 3.

    Other changes:
    - Add retool.agentSandbox.validateSecrets: fail at install time when an enabled workload is missing a Postgres source, user/database for the assemble path, a JWT public key, or a JWT private key, or has unsafe characters in user/database.
    - Promote the controller/proxy URL block to retool.agentSandbox.postgresUrlEnv.
    - Only write postgres-url into the chart-managed secret when a plaintext url is set, so empty keys are never emitted.
    - Document the canonical shapes and the password-secret reuse path.

    Audit: mcp already fails on its missing required secret; js_executor has no secrets, so neither needs changes.

    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

  * agent-sandbox: inherit backend Postgres connection by default

    Enabling the agent sandbox on an existing deployment previously meant re-entering the Postgres host/database/user (and pointing at the password) under agentSandbox.postgres, even though the sandbox lives in the same database as the backend, just under a separate schema.

    Add inheritance as the default: when none of agentSandbox.postgres.url / .host / .urlSecretName / agentSandbox.externalSecret.name is set, the chart assembles the DSN from the backend's connection (config.postgresql or the postgresql subchart, via the retool.postgresql.* helpers) and sources PGPASSWORD from the same secret the backend uses (mirrors POSTGRES_PASSWORD in deployment_backend.yaml). So enabling r2 against the existing database needs no new Postgres values; the schema stays separate (postgres.schema, default agent_executor). Any explicit option still overrides.

    validateSecrets gates the one combination inheritance can't reach: when the backend password is supplied via external secrets (envFrom) with no discrete key, it fails with guidance to set an explicit option. The assembled URL defaults the port to 5432 when config.postgresql.port is unset.

    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

  * agent-sandbox: fix stale Option 4 postgres comment

    After adding default inheritance, "leave options 1-3 blank" no longer selects Option 4 -- it selects the default (inherit config.postgresql). Clarify that Option 4 is chosen by setting externalSecret.name (in the Secrets section), and that leaving options 1-4 all unset falls through to inheritance.

    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

  * agent-sandbox: guard host-assembly path with no password source

    When postgres.host was set without postgres.password or postgres.passwordSecretName, postgresUrlEnv emitted a DSN with no password and no PGPASSWORD, so the misconfiguration only surfaced at runtime. validateSecrets now fails at install in that case, pointing to postgres.url / urlSecretName for intentionally passwordless setups (IAM/trust auth).

    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

  ---------

  Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* ci: test coverage for r2 workloads (js-executor, agent-sandbox, r2Agent) (#312)

  * ci: add test values for agent-sandbox and js-executor workloads

    The R2 js-executor and agent-sandbox workloads had no CI test values, so a values change could break their templates silently. Only agents and mcp were covered under charts/retool/ci/.

    Add test-js-executor-enabled-option.yaml and test-agent-sandbox-enabled-option.yaml enabling each workload with realistic config. These are auto-discovered by .github/kubeconform.sh (find -name '*option.yaml') and overlaid on every base values file across the kubeconform matrix — no workflow change needed.

    Both pass helm template + kubeconform against all base values files on k8s 1.27.16 through 1.31.6.

    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

  * ci: expand r2 workload coverage (secret/postgres matrix, ingress modes, r2Agent)

    Rebased onto the latest r2-cleanup, which merged #308 (agent-sandbox validateSecrets + flexible Postgres sourcing) and #309 (split rrGitServer). Adds test values exercising the full new surface:

    agent-sandbox — one option file per secret/Postgres precedence path so every branch of postgresUrlEnv/validateSecrets is templated:
    - existing externalSecret.name file → Postgres option 4 + dedicated proxy domain WITH ingress + TLS + networkPolicy + device plugin + both PDBs
    - inline secrets (chart-rendered Secret) + plaintext DSN (option 1) + same-origin proxy / NO ingress + hostPath /dev/net/tun (devicePlugin off)
    - assemble DSN from fields + PGPASSWORD secretKeyRef (option 2), Azure-style user@server username, external device-manager (deployDaemonSet off)
    - full DSN from an existing Secret via urlSecretName (option 3)
    - zero-config inherit of the backend Postgres connection (option 5)

    r2Agent — new worker (R2_AGENT_TEMPORAL_WORKER, port 3016) Deployment/Service/PDB.

    js-executor — add environmentSecrets to cover the per-workload secretKeyRef branch.

    All ci/*option.yaml validate via helm template + kubeconform against all three base values files on k8s 1.27.16 and 1.31.6 (108 combinations).

    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

  * fix: honor jsExecutor.image.pullPolicy in js-executor deployment

    deployment_js_executor.yaml read the global .Values.image.pullPolicy, so the per-workload jsExecutor.image.pullPolicy knob (present in values.yaml) was dead. This was inconsistent with the js-executor image *tag* (per-workload via the retool.jsExecutor.image.tag helper) and with agent-sandbox (reads $as.image.pullPolicy).

    Read jsExecutor.image.pullPolicy with a fallback to the global value. The js-executor CI test now sets pullPolicy: Always (differs from the global IfNotPresent) so a regression back to the global value is caught.

    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

  ---------

  Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* [feat][r2] add single r2.enabled master switch for R2 components (#313)

  * [feat][r2] add single r2.enabled master switch for R2 components

    Turning on the R2 stack previously meant flipping four independent flags (r2Agent, jsExecutor, agentSandbox, mcp). Add a top-level `r2.enabled` master switch that toggles all four collectively, with room for shared R2 config later.

    Semantics: inherit + override. Each component's `enabled` default changes from false to null; when null it inherits `r2.enabled`, and an explicit true/false on the component overrides the master for that component only. Backward compatible: existing configs that set the per-component flags explicitly behave identically.

    Add generic helper `retool.r2.componentEnabled`; `retool.r2Agent.enabled` delegates to it. Every read of these flags is routed through the helper -- not just the deployment guards but the cross-component env wiring in backend/workflows/jobs/_workers and the agentSandbox validate/backendEnv/ httproute helpers -- so an inherited (null) flag still drives JS_EXECUTOR and AGENT_SANDBOX env injection instead of reading as false.

    Add ci/test-r2-enabled-option.yaml covering the master-switch inherit path.

    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

  * [r2] update MCP oauth-token fail message for inherited enablement

    The error still said "when .Values.mcp.enabled is true", which misleads operators who enable MCP via the new master switch (r2.enabled: true) and leave mcp.enabled null. Reword to cover both the explicit flag and inheritance.

    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

  ---------

  Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* agent-sandbox: reject ':' and '/' in postgres.database DSN assembly

  The host-fields DSN path assembles postgres://user@host:port/database via printf, and validateSecrets guards the embedded user/database against characters that break URL parsing. The user check rejected [\s:/?#] but the database check only rejected [\s?#], so a database name containing '/' (e.g. 'my/db') silently produced postgres://user@host:5432/my/db -- which pg URL parsers read as database 'my' with a trailing path, connecting to the wrong database.

  Align the database check with the user check ([\s:/?#]); affected names must instead supply a full DSN via postgres.url / postgres.urlSecretName.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
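The Postgres sourcing options from #308 can be sketched as a values fragment. The key names (postgres.host, user, database, passwordSecretName, url, urlSecretName, schema, externalSecret.name) are the ones named in the commit messages; nesting them under rr.agentSandbox assumes the post-#321 layout, and every hostname, user, and secret name below is a placeholder. The required JWT keys are omitted because their exact placement is not shown in the source.

```yaml
# Sketch only: option 2 (chart assembles the DSN, password supplied via PGPASSWORD).
rr:
  agentSandbox:
    enabled: true
    postgres:
      host: db.example.internal             # placeholder
      user: retool                          # placeholder; ':' '/' '?' '#' and whitespace are rejected
      database: retool                      # placeholder; same character restrictions after the final fix
      passwordSecretName: retool-postgres   # placeholder; existing Secret holding only the password
      schema: agent_executor                # the stated default
      # Option 1 instead:  url: <plaintext DSN>
      # Option 3 instead:  urlSecretName: <existing Secret holding the full DSN>
    # Option 4 instead:    externalSecret.name (catch-all Secret with a postgres-url key)
    # Leaving options 1-4 all unset inherits the backend's Postgres connection.
```

Setting host without either postgres.password or postgres.passwordSecretName is the case the last #308 commit makes fail at install time.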

mcp requires an OAuth introspection token to template (oauthIntrospectionAuthToken / secret / env), unlike the other R2 components. Having mcp inherit the r2.enabled master switch meant `r2.enabled: true` hard-failed out of the box ("Please set ...oauthIntrospectionAuthToken... when the MCP server is enabled") unless the user also configured mcp — defeating the one-line enable. Make mcp independent: mcp.enabled defaults to false and is read directly (deployment_mcp.yaml gates on .Values.mcp.enabled), so the master switch governs only r2Agent/jsExecutor/agentSandbox. mcp stays opt-in via mcp.enabled: true. Update the componentEnabled doc, the OAuth fail message, and the test-r2-enabled-option fixture (mcp must no longer render from r2.enabled alone). Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ifetimeMs (#317)

Wire up controller.scaling.perUserSandboxLimit config option (default 5) and sandbox.sandboxGlobalLifetimeMs (default 2.5 hrs). Remove environment variables that are no longer used: SLOTS_PER_POD, EXECUTOR_{MIN,MAX}_REPLICAS, SCALE_{UP,DOWN}_THRESHOLD, SCALE_DOWN_GRACE_PERIOD_MS.

retool-k8s (helm/retool-workflow-jail/files/nsjail-seccomp.json) is the source of truth for the nsjail seccomp profile. The public chart copy had drifted in its `socket` syscall family rules; this re-syncs it verbatim so the public jsExecutor/codeExecutor sandbox matches what we run internally. Co-authored-by: Cursor <cursoragent@cursor.com>

* Set appArmorProfile Unconfined for js-executor

  nsjail (used by js-executor to sandbox user code) remounts the rootfs and sets up its mount namespace at startup. On nodes where the container runtime attaches an AppArmor profile to non-privileged containers — e.g. GKE Container-Optimized OS, where containerd applies cri-containerd.apparmor.d with `deny mount` — that mount is rejected with EPERM and the sandbox fails to launch. EKS (Amazon Linux 2023) uses SELinux and attaches no AppArmor profile, so this never surfaced there.

  Run js-executor with appArmorProfile Unconfined so nsjail can set up its sandbox, mirroring the existing agent-sandbox container. The Localhost seccomp profile continues to provide syscall-level isolation.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* Remove explanatory comment from js-executor appArmorProfile

  Co-authored-by: Cursor <cursoragent@cursor.com>

* Use AppArmor annotation instead of securityContext field for js-executor

  The appArmorProfile securityContext field only exists in the Kubernetes API from v1.30+, so strict kubeconform validation against v1.27-v1.29 rejected it with "additionalProperties 'appArmorProfile' not allowed". Switch to the container.apparmor.security.beta.kubernetes.io/<container> pod annotation, which is honored across all supported Kubernetes versions and is not subject to schema validation.

  Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>

…deExecutor.useSeccompProfile) (#311)

* Run code-executor unprivileged with seccomp on k8s >= 1.33

  On Kubernetes 1.33+ (where the ProcMountType and UserNamespacesSupport feature gates are on by default), the code-executor now runs unprivileged using a localhost seccomp profile, NET_ADMIN, an unmasked /proc, and user namespaces - mirroring how the JS executor sandboxes itself. The nsjail seccomp profile is installed onto the node by an install-seccomp init container.

  On older clusters it falls back to the existing privileged mode, so the chart still installs without requiring 1.33+. Setting codeExecutor.securityContext explicitly continues to override this behavior for either mode.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* Keep root values.yaml in sync with charts/retool/values.yaml

  Co-authored-by: Cursor <cursoragent@cursor.com>

* Drop codeExecutor securityContext comments

  Co-authored-by: Cursor <cursoragent@cursor.com>

* Document why code-executor uses seccomp on k8s 1.33+

  Co-authored-by: Cursor <cursoragent@cursor.com>

* Note 1.33+ upgrade for fine-grained privileges

  Co-authored-by: Cursor <cursoragent@cursor.com>

* Gate code-executor seccomp behind codeExecutor.useSeccompProfile flag

  Replace the automatic k8s >= 1.33 version detection with an explicit opt-in flag (codeExecutor.useSeccompProfile, default false). The chart defaults to the existing privileged mode and only renders the unprivileged seccomp path (seccomp profile + NET_ADMIN + unmasked /proc + user namespaces + install-seccomp init container) when the operator sets the flag. An explicitly pinned codeExecutor.securityContext still wins.

  Enabling the flag requires Kubernetes 1.33+ (ProcMountType and UserNamespacesSupport feature gates); this is now the operator's responsibility rather than auto-detected.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* Consolidate seccomp docs into values.yaml comment

  Move the detailed rationale for codeExecutor.useSeccompProfile into the values.yaml comment (operator-facing) and reduce the template comment to a short pointer explaining the $useSecComp local.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* Set AppArmor unconfined for code-executor seccomp path

  When codeExecutor.useSeccompProfile drops the privileged securityContext, the container is run under the container runtime's default AppArmor profile on AppArmor-enabled nodes (e.g. GKE Container-Optimized OS, where containerd attaches cri-containerd.apparmor.d with `deny mount`). nsjail remounts the rootfs and /proc to build its sandbox, so that profile rejects the mounts with EPERM and code-executor crash-loops. Privileged containers were unaffected because AppArmor confinement is not applied to them.

  Add the container.apparmor.security.beta.kubernetes.io/code-executor: unconfined pod annotation, gated to the same $useSecComp path as the seccomp profile, hostUsers and procMount changes. The annotation is honored across all supported Kubernetes versions (unlike the appArmorProfile field, which is v1.30+) and is not subject to strict schema validation. The Localhost seccomp profile continues to provide syscall isolation.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* Reword useSeccompProfile gate comment to drop operator phrasing

  Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
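For orientation, the opt-in flag and the kind of pod spec it is described as producing might look roughly like the sketch below. The values key and the AppArmor annotation are quoted from the commit messages, and the remaining fields are standard Kubernetes pod/container fields matching the listed ingredients (Localhost seccomp, NET_ADMIN, unmasked /proc, user namespaces). The seccomp profile path is hypothetical, and the chart's actual rendered output may differ.

```yaml
# Values override (flag name from #311; default is false, requires Kubernetes 1.33+).
codeExecutor:
  useSeccompProfile: true
---
# Approximate shape of the resulting pod template (illustrative, not the chart's exact output).
metadata:
  annotations:
    container.apparmor.security.beta.kubernetes.io/code-executor: unconfined
spec:
  hostUsers: false                  # run the pod in a user namespace
  containers:
    - name: code-executor
      securityContext:
        procMount: Unmasked         # unmasked /proc, needed by nsjail
        capabilities:
          add: ["NET_ADMIN"]
        seccompProfile:
          type: Localhost
          localhostProfile: profiles/nsjail-seccomp.json   # hypothetical path
```

Per the commit notes, an explicitly set codeExecutor.securityContext still takes precedence over this path.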

Consolidate the RR (formerly "r2") stack into a single top-level `rr:` block whose `rr.enabled` is the master switch, with every component RR needs nested directly under it:

    rr:
      enabled: false
      jsExecutor:   {...}   # inherits rr.enabled
      agent:        {...}   # RR server-side agent worker — inherits rr.enabled
      agentSandbox: {...}   # inherits rr.enabled
      gitServer:    {...}   # required for rr
      blobStorage:  {...}   # required for rr

The vocabulary is renamed r2 -> rr to match the RR_ env vars, and the nested keys carry no redundant prefix (the `rr:` namespace scopes them) — the full path composes to the env var (rr.gitServer -> RR_GIT_SERVER, rr.blobStorage -> RR_BLOB_STORAGE), and rr.gitServer matches the rendered `-git-server` resource name. mcp and the separate AI-`agents` feature stay top-level (mcp is intentionally independent of the master switch). Helpers are retool.rr.* (componentEnabled, validateLegacyValues) and retool.gitServer.* / retool.agent.enabled.

Intentionally NOT renamed, so this is a no-op for running pods (no resource recreation / no backend contract break):
- SERVICE_TYPE=R2_AGENT_TEMPORAL_WORKER, temporal taskqueue r2-agent, and the r2-agent-worker resource + telemetry name.
- the agent's internal worker identity: worker `type: rrAgent` and the retool.rrAgentWorker.* helpers, kept distinct from the AI-`agents` worker's retool.agentWorker.* to avoid a collision. Only the user-facing value key (rr.agent) and its enable helper are de-prefixed.
- the unrelated "Cloudflare R2" mention in the blob-storage example.

Robustness:
- retool.rr.componentEnabled is kind-aware: an absent/null component block is disabled (no config to render); a map uses its `enabled` (inheriting the master switch when unset); a non-mapping value (e.g. a bare bool) fails loudly with guidance. Fixes the nil-dereference on an explicitly-nulled component and avoids relocating the crash into the deployment templates.
- retool.rr.validateLegacyValues catches BOTH old top-level keys (the `r2:` master switch and the un-nested components) AND old leaf names left under the new `rr:` block (rr.r2Agent/rrAgent/rrGitServer/rrBlobStorage), mapping each to its new path. helm template/upgrade fails loudly rather than silently disabling RR.
- the nested worker's values owner is resolved from a declarative `nested: rr` field on the worker descriptor instead of a hardcoded parent-name match.

Verified: rendered manifests are byte-identical to the original r2 branch across all six scenarios (only the random postgres-password differs); helm lint clean; all 10 RR CI overlays render; both values.yaml copies kept byte-identical. Renamed test overlays test-r2-*-option.yaml -> test-rr-*-option.yaml.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
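A before/after sketch of the key moves that retool.rr.validateLegacyValues is described as enforcing. The old and new key names come from this commit message and the earlier r2 commits; the `enabled: true` values are illustrative only, and component contents are elided.

```yaml
# Before (legacy shape; the guard is described as failing helm template/upgrade on these keys):
#
#   r2:
#     enabled: true        # old master switch
#   jsExecutor:            # un-nested component
#     enabled: true
#   r2Agent:
#     enabled: true
#   rrGitServer:
#     enabled: true
#   blobStorage: {...}
#
# After (rr.* layout):
rr:
  enabled: true
  jsExecutor:
    enabled: true
  agent:                   # was r2Agent (also rr.r2Agent / rr.rrAgent)
    enabled: true
  gitServer:               # was rrGitServer (also rr.rrGitServer)
    enabled: true
  # blobStorage: moved from top-level blobStorage (also rr.rrBlobStorage)
```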

…_MS (#322)

The agent-executor sandbox connect timeout (config.ts readyTimeoutMs) is now env-configurable via SANDBOX_READY_TIMEOUT_MS, but the chart never set it, so the job-template fell back to the image default (20s). Interactive sandbox boot (gVisor + bundle load) can exceed that, surfacing "did not connect within 20000ms".

Add an agentSandbox.sandbox.sandboxReadyTimeoutMs knob (default 20000, matching the code default) and emit SANDBOX_READY_TIMEOUT_MS in the job-template env next to SANDBOX_IDLE_TIMEOUT_MS / SANDBOX_GLOBAL_LIFETIME_MS, so operators can raise it (e.g. 45000) without manual job-template patching.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
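A minimal sketch of raising the timeout from values. The knob name and the 45000 example come from the commit message; placing agentSandbox under `rr:` assumes the layout introduced in #321.

```yaml
# Sketch only: raise the sandbox connect timeout from the 20000 ms default.
rr:
  agentSandbox:
    sandbox:
      sandboxReadyTimeoutMs: 45000   # emitted as SANDBOX_READY_TIMEOUT_MS in the job-template env
```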

The pre-rename (`r2.*` / top-level component) values guard added in #321 already fails loud, but the message buried the call to action. Lead with "ACTION REQUIRED: update your Helm values file", state the deploy is blocked, and give an explicit "edit your values file and rename these keys" instruction before the key-move list. Message-only change inside the existing fail string: no values.yaml or CI changes. Verified the guard still fires (now with the clearer text) on a legacy key and stays silent on a clean rr.* render. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
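The shape of such a fail-loud guard can be sketched in Python. This is an illustrative model only: the leaf-name mapping is inferred from the names listed in the #321 commit message, the un-nested top-level component keys are omitted because their names are not given here, and everything in the error text beyond the "ACTION REQUIRED" lead is an assumption.

```python
from typing import Any, Dict, List

# Old leaf names under `rr:` (from the #321 commit message) mapped to the
# de-prefixed keys. The pairing is an inference, not copied from the chart.
LEGACY_RR_LEAVES = {
    "r2Agent": "agent",
    "rrAgent": "agent",
    "rrGitServer": "gitServer",
    "rrBlobStorage": "blobStorage",
}


def find_legacy_keys(values: Dict[str, Any]) -> List[str]:
    """Return 'old -> new' moves for every legacy key present in the values."""
    moves = []
    if "r2" in values:
        moves.append("r2 -> rr")
    rr = values.get("rr")
    if isinstance(rr, dict):
        for old, new in LEGACY_RR_LEAVES.items():
            if old in rr:
                moves.append(f"rr.{old} -> rr.{new}")
    return moves


def validate_legacy_values(values: Dict[str, Any]) -> None:
    """Block the render with an actionable message when legacy keys are found."""
    moves = find_legacy_keys(values)
    if moves:
        raise ValueError(
            "ACTION REQUIRED: update your Helm values file. This deploy is blocked.\n"
            "Edit your values file and rename these keys:\n  " + "\n  ".join(moves)
        )
```

A clean `rr.*` values file passes silently, while a file that still carries `r2:` or `rr.rrGitServer` stops with the call to action first and the key-move list after it.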

Brings the full R² stack to the published chart line (6.10.6 → 6.12.0): the rr.* values layout (#321 master switch), js-executor, rr.agent, agent-sandbox (controller/proxy/sandbox + flexible Postgres & secrets), in-process / split git server, shared blob storage, and the MCP server. Conflict: charts/retool/Chart.yaml version — resolved to 6.12.0 (the R² release; subsumes main's 6.10.6). main's #305 startupProbe.successThreshold example and r2's temporal dependency-condition change both preserved. Prerequisite consumer migrations landed first: retool-k8s #18177 (internal-onprem + admin + MSH) and #18178 (nuon). The rr.* layout is fail-loud / non-backwards-compatible by design (retool.rr.validateLegacyValues). Validated on the merged tree: helm lint clean, all 19 ci/*option.yaml permutations render, legacy-values guard fires. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

greptile-apps · 2026-06-11T17:33:30Z

Greptile Summary

This PR lands the full R² (React Retool) self-hosted feature set from the long-lived r2 branch, bumping the chart from 6.10.6 to 6.12.0. It adds five new workloads (JS executor, RR agent worker, agent-sandbox controller + proxy + Job-based sandboxes, optional split git server, MCP server) under a unified rr.* top-level values hierarchy with a single master switch and per-component overrides, plus fail-loud guards for stale r2.* / legacy key names.

New deployments: deployment_js_executor.yaml, deployment_mcp.yaml, deployment_git_server.yaml, and deployment_agent_sandbox.yaml (controller + proxy + Job template ConfigMap + RBAC + Services + optional Ingress + PDB).
Cross-cutting wiring: deployment_backend.yaml, deployment_workflows.yaml, deployment_jobs.yaml, and _workers.tpl all gain conditional JS-executor domain, agent-sandbox backend env vars, and git-server SERVICE_TYPE injection; the main Service gains an optional second port for MCP OAuth metadata paths.
Supporting resources: seccomp DaemonSet, image-prepuller DaemonSet, device-plugin DaemonSet, NetworkPolicy set, and shared blob-storage env rendering for the git server.

Confidence Score: 4/5

Safe to merge; all new workloads are gated behind opt-in switches so existing deployments are unaffected.

The core logic is carefully crafted and well-tested across 19 CI permutations. The two MCP template oversights (missing standard pod labels, wrong extraConfigMapMounts placement) only affect MCP adopters who rely on standard chart labels or use extraContainers alongside extraConfigMapMounts. No previously-working paths are regressed.

charts/retool/templates/deployment_mcp.yaml — pod template labels and extraConfigMapMounts ordering should be reviewed before MCP is widely adopted.

Important Files Changed

Filename	Overview
charts/retool/templates/deployment_mcp.yaml	New MCP deployment template with oauthIntrospectionAuthToken guard. Two issues: pod template labels omit the standard retool.labels include present in every other template, and extraConfigMapMounts volume mounts are rendered after extraContainers, which mis-parents them when extra containers are used.
charts/retool/templates/deployment_agent_sandbox.yaml	New 714-line file rendering the full agent-sandbox stack (Secret, RBAC, job-template ConfigMap, controller/proxy Deployments and Services, proxy Ingress, headless sandbox Service, PDB). JWT key in the job-template JSON now correctly uses toJson.
charts/retool/templates/deployment_js_executor.yaml	New JS executor deployment with seccomp init-container installer and NET_ADMIN capability. AppArmor is set via the beta annotation only; the agent-sandbox job template in the same PR already uses the native appArmorProfile field.
charts/retool/templates/_helpers.tpl	Large expansion adding RR helper templates: componentEnabled/validateLegacyValues guards, agentSandbox secret/postgres/backend-env helpers, gitServer blob-storage validation, MCP routing helpers, and updated retool.env to support map-style valueFrom entries.
charts/retool/templates/deployment_git_server.yaml	New optional split git-server deployment sharing the main backend image and credentials. Correctly waits for Postgres, carries standard retool.labels in pod template, and auto-points MCP at it when separate mode is enabled.
charts/retool/templates/agent_sandbox_networkpolicy.yaml	New NetworkPolicy set for sandbox pods, controller, and proxy. The proxy egress IPv4 rule always emits except: without a guard for an empty blockedRanges list (flagged in a previous thread).
charts/retool/values.yaml	Adds mcp, rr (jsExecutor/agent/agentSandbox/gitServer/blobStorage) top-level blocks with detailed inline documentation. The jsExecutor.volumes/volumeMounts defaults of {} (previously flagged) remain maps instead of lists.
charts/retool/templates/deployment_backend.yaml	Adds legacy-values guard, git-server SERVICE_TYPE injection, agentSandbox backend env vars, JS_EXECUTOR_INGRESS_DOMAIN, and git-server host/port split-mode env vars. All conditional and safe.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    BE["Backend (main)\nSERVICE_TYPE: MAIN,JOBS_RUNNER\n[+RR_GIT_SERVER if inline]"]
    WF["Workflows worker"]
    JOB["Jobs worker"]
    RRA["RR Agent worker\n(R2_AGENT_TEMPORAL_WORKER)"]
    JSE["JS Executor\n(tryretool/js-executor-service)"]
    MCP["MCP Server\n(SERVICE_TYPE: MCP_SERVER)"]
    GIT["Git Server (optional split)\n(SERVICE_TYPE: RR_GIT_SERVER)"]
    SC["AgentSandbox Controller"]
    SP["AgentSandbox Proxy"]
    SJ["Sandbox Jobs\n(ephemeral K8s Jobs)"]
    PG[("Postgres")]

    BE -->|"JS_EXECUTOR_INGRESS_DOMAIN"| JSE
    BE -->|"AGENT_SANDBOX_CONTROLLER_INGRESS_DOMAIN"| SC
    BE -->|"AGENT_SANDBOX_PROXY_INGRESS_DOMAIN"| SP
    BE -->|"RR_GIT_SERVER_HOST/PORT (split mode)"| GIT
    WF -->|"JS_EXECUTOR_INGRESS_DOMAIN"| JSE
    WF -->|agentSandbox env| SC
    JOB -->|agentSandbox env| SC
    RRA -->|agentSandbox env| SC
    MCP -->|"RETOOL_BACKEND_URL"| BE
    MCP -->|"RETOOL_GIT_SERVER_URL (split mode)"| GIT
    SC -->|"manages"| SJ
    SC -->|"state"| PG
    SP -->|"proxy egress"| SJ
    SP -->|"state"| PG
    GIT -->|"blob storage\n(S3/GCS/Azure)"| BS[("Blob Storage")]

Reviews (2): Last reviewed commit: "[agent-sandbox] fix jwtPublicKey breakin..."

greptile-apps · 2026-06-11T17:33:34Z

+                  {{- if $as.jwtPublicKey }}
+                  ,{"name": "AGENT_SANDBOX_JWT_PUBLIC_KEY", "value": "{{ $as.jwtPublicKey }}"}


JWT public key interpolated into JSON without escaping

On line 181, jwtPublicKey is embedded directly into a JSON string literal: "value": "{{ $as.jwtPublicKey }}". ES256 keys are ECDSA/P-256, commonly stored in PEM format with newline characters (\n) and -----BEGIN/END PUBLIC KEY----- headers. An unescaped newline inside a JSON string literal produces invalid JSON, causing the controller to fail when reading the job-template ConfigMap to spawn sandbox Jobs. Even a compact JWK ({"kty":"EC",...}) would embed unescaped double-quotes and break the JSON. The fix is to replace the surrounding "..." with {{ $as.jwtPublicKey | toJson }} so newlines and quotes are properly JSON-escaped.
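The failure mode is easy to reproduce outside Helm. In the Python sketch below, `json.dumps` stands in for Helm's `toJson`, and the PEM body and JWK are placeholder strings rather than real keys; the helper names are hypothetical.

```python
import json

# Placeholder multi-line PEM and compact JWK; not real key material.
PEM = "-----BEGIN PUBLIC KEY-----\nEXAMPLEKEYDATA\n-----END PUBLIC KEY-----\n"
JWK = '{"kty":"EC","crv":"P-256"}'


def render_raw(key: str) -> str:
    """Old template behaviour: the key is pasted between literal quotes."""
    return '{"name": "AGENT_SANDBOX_JWT_PUBLIC_KEY", "value": "' + key + '"}'


def render_escaped(key: str) -> str:
    """Fixed behaviour: the key is emitted as a quoted, escaped JSON string."""
    return '{"name": "AGENT_SANDBOX_JWT_PUBLIC_KEY", "value": ' + json.dumps(key) + "}"


def parses(text: str) -> bool:
    """Report whether the text is valid JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False
```

Here `parses(render_raw(PEM))` and `parses(render_raw(JWK))` are both false, since a raw newline or an unescaped double-quote ends up inside the string literal, while `render_escaped` produces valid JSON that decodes back to the original key.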

greptile-apps · 2026-06-11T17:33:36Z

+              {{- end }}
+    {{- if $as.networkPolicy.blockedRanges6 }}
+    - to:
+        - ipBlock:
+            cidr: ::/0
+            except:
+              {{- range $as.networkPolicy.blockedRanges6 }}


except: rendered with no items when blockedRanges is cleared

The proxy egress rule always emits except: regardless of whether blockedRanges is populated. If a user explicitly sets networkPolicy.blockedRanges: [], the template renders except: with a null value, which Kubernetes rejects because the field expects a list of CIDR strings. The default values include a populated blockedRanges list so this doesn't affect typical usage, but it would silently break a user who tries to allow all egress by clearing the list. Wrapping the except: key in {{- if $as.networkPolicy.blockedRanges }} would guard against the empty case.
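The suggested guard amounts to omitting the `except` key when there is nothing to exclude. A minimal Python model of that logic follows; the function name `ip_block` and the CIDR values used to exercise it are illustrative assumptions, not the chart's defaults.

```python
from typing import Dict, List, Sequence, Union

IpBlock = Dict[str, Union[str, List[str]]]


def ip_block(cidr: str, blocked_ranges: Sequence[str]) -> IpBlock:
    """Build an ipBlock peer, adding `except` only when ranges are configured."""
    block: IpBlock = {"cidr": cidr}
    if blocked_ranges:
        # Mirrors wrapping `except:` in an `if` on the blocked ranges list.
        block["except"] = list(blocked_ranges)
    return block
```

With an empty list, `ip_block("0.0.0.0/0", [])` yields only the `cidr` key, so the rendered rule allows all egress without emitting a null `except`.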

The sandbox job-template ConfigMap embedded jwtPublicKey into a JSON string literal as `"value": "{{ $as.jwtPublicKey }}"`. ES256 keys are normally multi-line PEM (BEGIN/END headers + newlines); a real newline inside a JSON string literal is invalid JSON, so the controller failed to read the job-template and could not spawn sandbox Jobs. (A compact JWK would break it too — embedded double-quotes.) Fix: `"value": {{ $as.jwtPublicKey | toJson }}` — toJson emits the quoted, fully-escaped JSON string (newlines -> \n, quotes -> \"). This also makes the JSON path consistent with the env-var paths, which already use `| quote`. Until now this only worked if the operator pre-flattened the key to a single `\n`-escaped line (the workaround the inline-secrets CI fixture relied on). Updated that fixture to a genuine multi-line PEM block scalar so it exercises the escaping, and corrected its comment. Verified: rendered the inline-secrets fixture and parsed the embedded job-template.json — VALID with the fix, JSONDecodeError without it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

ryanartecona · 2026-06-11T18:15:15Z

 description: A Helm chart for Kubernetes
 type: application
-version: 6.10.6
+version: 6.12.0


should be 6.11.0 no?

ryanartecona · 2026-06-11T18:16:13Z

  - name: retool-temporal-services-helm
    version: 1.1.5
-    condition: retool-temporal-services-helm.enabled,workflows.enabled
+    condition: retool-temporal-services-helm.enabled


this is a breaking change, which we were otherwise able to avoid. I think @lukefoster11 you may have introduced this IIRC? is this load bearing for r2 stuff or can we undo this part?

lukefoster11 and others added 30 commits May 8, 2026 10:51

support agent-sandbox service (#281)

dfba485

Made-with: Cursor

[INF-6675] add js executor (#275)

bdf1854

fix jsExecutor image lookup

44f3be5

fix gvisor seccomp errno ret

6efb7a3

revert accidental version bump (#286)

ab69c02

[INF-6865] increase js executor mem (#289)

519a1c3

* increase mem
* update file

optional agentsandbox postgres secret

98ecae0

Revert "optional agentsandbox postgres secret"

d45fefd

This reverts commit 98ecae0.

new agent sandbox secret modularity (#290)

fcb40cb

disable (#292)

cde3897

make retool.fullname prefixed (#291)

f741066

Revert "make retool.fullname prefixed (#291)" (#293)

b73e070

This reverts commit f741066.

make retool.fullname prefixed (#294)

89c465e

separate deviceplugin use and deployment

67f8b7f

tune (#297)

0f9d24b

Add optional MCP git server URL (#299)

9d9b4a2

rr_agent_pubsub_backend (#300)

9118ff7

Rename sandbox env vars (#295)

6f9283c

* Rename sandbox env vars

  Also remove stale unused env vars & update job resource requests

* fix sandbox job template commas

---------

Co-authored-by: Ryan Artecona <ryanartecona@gmail.com>

add new env vars (#301)

d8cd136

JatinNanda and others added 4 commits June 11, 2026 09:37

greptile-apps Bot reviewed Jun 11, 2026

JatinNanda marked this pull request as ready for review June 11, 2026 17:41

JatinNanda mentioned this pull request Jun 11, 2026

[agent-sandbox] fix jwtPublicKey breaking job-template JSON (use toJson) #325

Merged

JatinNanda and others added 2 commits June 11, 2026 13:58

Update charts/retool/values.yaml

f44c63e

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

ryanartecona reviewed Jun 11, 2026

JatinNanda mentioned this pull request Jun 11, 2026

Merge r2 → main: R² (React Retool) self-hosted support (chart 6.11.0) #326

Merged

JatinNanda closed this Jun 11, 2026

Merge r2 → main: R² (React Retool) self-hosted support (chart 6.12.0) #324

JatinNanda wants to merge 36 commits into main from jatin/r2-to-main

8 participants
