
Merge r2 → main: R² (React Retool) self-hosted support (chart 6.12.0) #326

Merged: JatinNanda merged 37 commits into main from r2 on Jun 11, 2026

@JatinNanda

Contributor

Merges the full R² (React Retool) self-hosted feature set from the long-lived r2 branch into main. Opening the PR from r2 directly so all 37 commits are preserved in history.

What's included

Conflict

Prerequisites (done)

Consumer migrations to the rr.* layout landed first: retool-k8s #18177 (internal-onprem + admin + MSH) and #18178 (nuon).

Notes

  • Chart stays a minor release (6.12.0): additive for published consumers, all R² switches default off; the rename guard can't fire on the published line (those keys never existed there).
  • Repo is currently squash-only — enable merge commits before merging if history on main is to be preserved.
  • Supersedes Merge r2 → main: R² (React Retool) self-hosted support (chart 6.12.0) #324 (the jatin/r2-to-main intermediate-branch version).

🤖 Generated with Claude Code

@JatinNanda JatinNanda requested a review from ryanartecona June 11, 2026 18:27
@greptile-apps

greptile-apps Bot commented Jun 11, 2026


Greptile Summary

This PR merges the full R² (React Retool) self-hosted feature set from the long-lived r2 branch into main as chart 6.12.0, adding six new services (js-executor, rr-agent worker, agent-sandbox controller+proxy+Jobs, standalone git-server, MCP server) all gated behind rr.enabled: false / mcp.enabled: false defaults.

  • New deployments: JS executor, RR agent Temporal worker, agent-sandbox controller/proxy (with ephemeral Job-based sandboxes, RBAC, PDB, headless service), optional standalone git-server, and MCP server; each with seccomp DaemonSets, prepuller DaemonSet, and optional NetworkPolicy.
  • Values restructuring: Flat r2.* layout replaced by a rr.* hierarchy with a master switch, per-component enabled: null inheritance, and a fail-loud guard on any old keys at render time.
  • Routing additions: MCP OAuth metadata paths are prepended to both Ingress and HTTPRoute before the catch-all Retool route; an optional second backend-API service port is exposed for backendApi-targeted MCP paths.

Confidence Score: 4/5

Safe to merge for any deployment that leaves all R² switches at their default-off state; the one defect only surfaces when networkPolicy.enabled: true and an operator explicitly empties dnsSelector.

All new components default to disabled, so existing deployments are unaffected. The NetworkPolicy template contains a mismatch between its documented escape hatch and its implementation: setting dnsSelector: {} to allow DNS anywhere actually removes the DNS egress rule, which with policyTypes: Egress silently blocks all DNS for sandbox pods. An operator enabling the NetworkPolicy and following the documented guidance would find sandboxes unable to reach the proxy or any external host.

charts/retool/templates/agent_sandbox_networkpolicy.yaml — the dnsSelector conditional needs a complementary unrestricted-DNS egress rule for the empty case.
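
A minimal Python sketch of the behavior this finding asks for, not the chart's actual template: an empty dnsSelector should still produce a port-53 egress rule, just without any `to` peers (a NetworkPolicy egress rule with no `to` matches every destination). The function name, the dict shape standing in for the rendered manifest, and the `namespaceSelector: {}` peer are assumptions for illustration.

```python
def dns_egress_rules(dns_selector):
    """Return the DNS egress rules for the sandbox NetworkPolicy (illustrative only)."""
    ports = [
        {"protocol": "UDP", "port": 53},
        {"protocol": "TCP", "port": 53},
    ]
    if dns_selector:
        # Restrict DNS to pods matching the selector (any namespace is an assumption).
        peer = {"namespaceSelector": {}, "podSelector": dns_selector}
        return [{"to": [peer], "ports": ports}]
    # Empty selector: keep the DNS rule but omit "to", so port 53 is allowed
    # to any destination instead of being dropped from the policy.
    return [{"ports": ports}]
```

With this shape, `dns_egress_rules({})` still allows DNS, which matches the documented escape hatch; dropping the rule entirely is what blocks resolution once `policyTypes: Egress` is in effect.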

Important Files Changed

Filename Overview
charts/retool/templates/agent_sandbox_networkpolicy.yaml New NetworkPolicy for sandbox/controller/proxy pods; the dnsSelector empty case removes the DNS egress rule instead of adding an unrestricted one, blocking DNS contrary to documentation.
charts/retool/templates/deployment_agent_sandbox.yaml Large new file deploying controller/proxy Deployments, RBAC, ConfigMap job-template, headless Service, and PDBs for the agentSandbox feature; structure and secret handling look well-guarded.
charts/retool/templates/_helpers.tpl Large set of new helpers for RR components; validateSecrets, postgresUrlEnv, componentEnabled, and legacy-values guard all look correct; retool.env refactoring preserves existing behavior.
charts/retool/templates/deployment_mcp.yaml New MCP server deployment with OAuth introspection token validation; lacks externalSecrets/externalSecretsOperator envFrom blocks (already flagged in previous review thread).
charts/retool/templates/deployment_js_executor.yaml New JS executor deployment with seccomp init container; uses hardcoded busybox image (already noted in previous thread); probe settings inherited from top-level values.
charts/retool/templates/deployment_git_server.yaml New standalone git server deployment mirroring the backend pattern with blob-storage env vars; externalSecrets envFrom blocks are present and consistent with other deployments.
charts/retool/templates/_workers.tpl Adds the rrAgent worker descriptor with nested: rr support; cleanly extends existing worker loop without breaking existing agent/agentEval workers.
charts/retool/values.yaml Adds rr.*, mcp.*, and agentSandbox.* value blocks; all new switches default off; well-documented with examples and validation notes.
charts/retool/templates/service.yaml Conditionally exposes a second backend-API port when MCP backend-metadata routing is enabled; port-name collision guard prevents invalid service specs.
charts/retool/templates/ingress.yaml MCP paths are correctly prepended before the catch-all Retool route when both mcp.enabled and mcpIngress.enabled are true.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    FE["Frontend / Ingress"] --> BE["Backend (main)"]
    FE --> MCP["MCP Server\n(mcp.enabled)"]
    FE --> AGP["Agent-Sandbox Proxy\n(rr.agentSandbox)"]

    BE -- "RR_GIT_SERVER\n(in-process or separate)" --> GIT["Git Server\n(rr.gitServer.separate)"]
    BE -- "JS_EXECUTOR_INGRESS_DOMAIN" --> JSE["JS Executor\n(rr.jsExecutor)"]
    BE -- "AGENT_SANDBOX_CONTROLLER_INGRESS_DOMAIN" --> AGC["Agent-Sandbox Controller"]
    BE -- "AGENT_SANDBOX_PROXY_INGRESS_DOMAIN" --> AGP

    MCP -- "RETOOL_BACKEND_URL" --> BE
    MCP -- "RETOOL_GIT_SERVER_URL" --> GIT

    AGC -- "creates/deletes" --> JOBS["Sandbox Jobs\n(ephemeral pods)"]
    AGP -- "routes to" --> JOBS

    WKRR["RR Agent Worker\n(rr.agent)"] -- "Temporal" --> BE

    JOBS -- "blob storage" --> BLOB[(S3/GCS/Azure)]
    GIT -- "blob storage" --> BLOB

    subgraph RR ["rr.* stack (rr.enabled master switch)"]
        JSE
        WKRR
        AGC
        AGP
        JOBS
        GIT
    end

Reviews (2): Last reviewed commit: "reduce JSE default CPU request/limit fro..."

{{- end }}
spec:
automountServiceAccountToken: false
priorityClassName: {{ $as.devicePlugin.priorityClassName }}

P1 Unconditional priorityClassName render will produce an invalid manifest when a user follows the documented guidance and sets priorityClassName: null to opt out of the priority class (e.g., GKE environments that don't support system-node-critical in user namespaces). In Helm, rendering a nil value via bare {{ ... }} outputs the literal string <no value>, so the DaemonSet is submitted with priorityClassName: <no value>, a class that does not exist, and creation of its pods is rejected at admission by the API server.

Suggested change
- priorityClassName: {{ $as.devicePlugin.priorityClassName }}
+ {{- if $as.devicePlugin.priorityClassName }}
+ priorityClassName: {{ $as.devicePlugin.priorityClassName }}
+ {{- end }}

Comment on lines +1 to +12
{{- if .Values.mcp.enabled }}
{{- $mcpConfig := .Values.mcp.config | default dict }}
{{- $hasOAuthIntrospectionAuthTokenEnv := false }}
{{- range .Values.mcp.environmentVariables }}
{{- if eq .name "OAUTH_INTROSPECTION_AUTH_TOKEN" }}
{{- $hasOAuthIntrospectionAuthTokenEnv = true }}
{{- end }}
{{- end }}
{{- if not (or $mcpConfig.oauthIntrospectionAuthTokenSecretName $mcpConfig.oauthIntrospectionAuthToken $hasOAuthIntrospectionAuthTokenEnv) }}
{{- fail "Please set .Values.mcp.config.oauthIntrospectionAuthTokenSecretName, .Values.mcp.config.oauthIntrospectionAuthToken, or an OAUTH_INTROSPECTION_AUTH_TOKEN entry in .Values.mcp.environmentVariables when the MCP server is enabled (.Values.mcp.enabled)" }}
{{- end }}
{{- $mcpInternalPort := .Values.mcp.service.internalPort | default 4010 }}
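
The guard in the snippet above can be read as the following Python sketch. It mirrors the three accepted sources named in the fail message; the function names and the plain-dict stand-in for `.Values.mcp` are hypothetical, and this is a model of the logic rather than the chart's code.

```python
def mcp_token_configured(mcp_values):
    """True when any accepted OAuth introspection token source is present."""
    config = mcp_values.get("config") or {}
    env_entries = mcp_values.get("environmentVariables") or []
    env_names = {entry.get("name") for entry in env_entries}
    return bool(
        config.get("oauthIntrospectionAuthTokenSecretName")
        or config.get("oauthIntrospectionAuthToken")
        or "OAUTH_INTROSPECTION_AUTH_TOKEN" in env_names
    )


def validate_mcp(mcp_values):
    """Fail at render time (modeled here as an exception) when MCP is enabled without a token."""
    if mcp_values.get("enabled") and not mcp_token_configured(mcp_values):
        raise ValueError(
            "set mcp.config.oauthIntrospectionAuthTokenSecretName, "
            "mcp.config.oauthIntrospectionAuthToken, or an "
            "OAUTH_INTROSPECTION_AUTH_TOKEN entry in mcp.environmentVariables"
        )
```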

P2 MCP deployment missing externalSecrets / externalSecretsOperator envFrom blocks

Every other service deployment in this chart (backend, git-server, jobs, workers) includes an envFrom block that splats externally-managed secrets into the pod when externalSecrets.enabled or externalSecrets.externalSecretsOperator.enabled is set. The MCP deployment does not. Users who configure a COOKIE_INSECURE, LICENSE_KEY, or other global Retool env var via their external secret store will find those variables absent from MCP pods — potentially causing silent auth failures if MCP relies on any of them.

Comment on lines +54 to +55
readOnlyRootFilesystem: true
capabilities:

P2 Hardcoded busybox image inconsistent with other seccomp DaemonSets

The install-seccomp init container here and in deployment_code_executor.yaml pin busybox:1.37.0@sha256:... directly in the template. The agent-sandbox seccomp DaemonSet (agent_sandbox_seccomp.yaml) and prepuller use the configurable rr.agentSandbox.initImage block. Air-gapped deployments can override the agent-sandbox busybox but cannot override the js-executor or code-executor one without patching the chart. Consider factoring out a rr.jsExecutor.initImage for consistency.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

lukefoster11 and others added 26 commits June 11, 2026 14:43
Adds optional MCP server support to the Retool Helm chart, disabled by default.

Main changes:

- Adds a new mcp values block in charts/retool/values.yaml and root values.yaml.
- Adds a standalone MCP Service, Deployment, and optional PodDisruptionBudget.
- Runs MCP using the backend image with SERVICE_TYPE=MCP_SERVER.
- Supports MCP configuration for replicas, resources, env vars, toolsets, transport/session limits, service ports, affinity, node selectors, and tolerations.
- Routes /mcp and /.well-known/oauth-protected-resource to the MCP service through both Ingress and HTTPRoute.
- Adds MCP helper labels/naming in _helpers.tpl.
- Adds CI render coverage via test-mcp-enabled-option.yaml.

Validation performed:

- Helm template render with MCP disabled
- Helm template render with MCP enabled
- Helm lint with MCP enabled
- kubeconform validation during earlier verification
* increase mem

* update file
* make agentSandbox.image.tag non-required

* Make agentSandbox.devicePlugin.priorityClassName configurable for GKE support

* try adding ingress support for agentsandbox proxy url

* disable apparmor in sandbox jobs for gke/aks support

* try adding httproute support for r2 agent-proxy

* trim whitespace
…obStorage config (#296)

* [chore][r2] add RR_GIT_SERVER to main backend's default SERVICE_TYPE

Pairs with retool_development's RR_GIT_SERVER scaffold (commit
68162710ee0 on jatin/git-server-scaffold). The git-server runs
in-process alongside MAIN_BACKEND rather than as a split-out
deployment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [feat][r2] gate RR_GIT_SERVER on rrGitServer.enabled and add blobStorage config

git_server needs an object store for repo blobs/packs (and snapshots use
the same backend abstraction). The earlier commit unconditionally
appended RR_GIT_SERVER to SERVICE_TYPE, which would have main backend
crash at runtime on the first git op when blob storage isn't configured.

Adds:
- rrGitServer.enabled (default false) — gates the SERVICE_TYPE append
- blobStorage block with s3 / gcs / azure sub-blocks (set exactly one)
- {{ fail }} guard requiring exactly one provider when rrGitServer.enabled
- Renders RR_BLOB_STORAGE_PROVIDER + RR_DEFAULT_<PROVIDER>_* env vars on
  the main backend deployment, with secretKeyRef support for the secret
  (S3 secret access key, Azure connection string, GCS credentials)
- Optional rrGitServer.repackThreshold -> RR_GIT_REPACK_THRESHOLD

blobStorage is a top-level block (not nested under rrGitServer) because
the backend's RR_DEFAULT_* vars are shared with snapshots; this same
config will feed them once they get wired up.
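
As a rough illustration of the "exactly one provider" guard described above, here is a Python sketch. It assumes a provider counts as configured when its sub-block is non-empty, which may not match how the chart's helper actually decides; the function name is hypothetical.

```python
PROVIDERS = ("s3", "gcs", "azure")


def select_blob_provider(blob_storage):
    """Return the single configured provider key, or raise if zero or several are set."""
    # Assumption: a non-empty sub-block means the provider is configured.
    configured = [name for name in PROVIDERS if blob_storage.get(name)]
    if len(configured) != 1:
        raise ValueError(
            "blobStorage must configure exactly one of s3, gcs, azure; "
            f"found {len(configured)}"
        )
    return configured[0]
```

The returned key is what would drive RR_BLOB_STORAGE_PROVIDER and the matching RR_DEFAULT_<PROVIDER>_* variables on the backend.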

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [refactor][r2] extract rrGitServer blob storage provider check to a helper

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [chore][r2] allow blobStorage opt-out via direct env vars

The rrGitServer.enabled fail-fast was blocking customers who'd rather
plumb RR_BLOB_STORAGE_PROVIDER / RR_DEFAULT_*_* in directly via
environmentVariables / environmentSecrets. Mirror the mcp pattern of
detecting the env var and skipping the guard when present.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [chore] sync top-level values.yaml with charts/retool/values.yaml

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…CP metadata (#298)

Adds `mcp.config.oauthMainDomain`, which renders `OAUTH_MAIN_DOMAIN` into the MCP deployment for OAuth metadata base URL configuration. Documents the new MCP OAuth domain configuration in both chart values files. Updates the MCP render fixture so Helm rendering exercises the new environment variable. Validated with Helm rendering and linting.
* Rename sandbox env vars

Also remove stale unused env vars & update job resource requests

* fix sandbox job template commas

---------

Co-authored-by: Ryan Artecona <ryanartecona@gmail.com>
…l-k8s (#310)

* [fix][R2] Increase the AE proxy timeout to be in line with fix in retool-k8s

* Update charts/retool/values.yaml

Co-authored-by: Ryan Artecona <ryanartecona@gmail.com>

* Update values.yaml

Co-authored-by: Ryan Artecona <ryanartecona@gmail.com>

* lint fix

---------

Co-authored-by: Ryan Artecona <ryanartecona@gmail.com>
…xy, secrets, git-server split) (#315)

* js-executor: drop backend-shared env inheritance + resize resources (#304)

* js-executor: stop inheriting backend-shared env

The js-executor deployment looped over the backend-shared .Values.env and
.Values.environmentSecrets (and .Values.environmentVariables) unfiltered,
injecting db creds, auth/encryption secrets, license key, and other backend
config into a pod that needs none of it. This pollutes the workload and
widens the blast radius of any change to shared env.

js-executor is a standalone nsjail JS sandbox that reads none of the
backend-shared env vars. Replace the inheritance with per-workload overrides:
jsExecutor.env / jsExecutor.environmentSecrets / jsExecutor.environmentVariables
(all default empty), matching the self-contained pattern already used by the
mcp and agent_sandbox workloads.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* js-executor: bump CPU to 6000m, set memory 6Gi

Bump js-executor CPU rather than shrinking memory. Set requests == limits at
cpu: 6000m / memory: 6Gi (Guaranteed QoS). The memory request is kept equal
to the limit because JSE reads its memory limit and rejects requests at 80%
of it, so the request must reserve the full amount.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* rrGitServer: accept blob-storage env vars from .Values.env (#307)

* rrGitServer: also accept blob-storage env vars from .Values.env

validateBlobStorage only scanned environmentVariables and environmentSecrets
for RR_BLOB_STORAGE_PROVIDER / RR_DEFAULT_*, so deployments that configure
those via the .Values.env map had to duplicate them into environmentVariables
to satisfy the check. Range over .Values.env (keyed by var name) as well, and
mention env in the doc comment and failure message.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* rrGitServer: add skipBlobStorageValidation escape hatch

The blob-storage guard can only inspect blobStorage / env /
environmentVariables / environmentSecrets at template time. Env vars injected
via envFrom (a Secret/ConfigMap splat) are invisible to it, so a valid
configuration that supplies RR_BLOB_STORAGE_PROVIDER / RR_DEFAULT_* that way
would fail the check with no way out.

Add rrGitServer.skipBlobStorageValidation (default false) to bypass the check
entirely, and point at it from the failure message.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Document self-hosted same-origin agent-sandbox proxy (no extra ingress) (#302)

Clarify that leaving agentSandbox.frontendWsProxyDomain empty makes the
backend serve the sandbox proxy same-origin via the main ingress, so no
dedicated proxy domain or ingress object is required for self-hosted.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* re-sync values.yaml with chart copy after #302

PR #302 updated the agentSandbox.frontendWsProxyDomain comment in
charts/retool/values.yaml but not the mirrored root values.yaml, leaving the
two out of sync (and failing the values-yaml-synced check on PRs targeting
this branch). Copy the richer comment into the root values.yaml.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* [feat][r2] optionally split rrGitServer into its own deployment (#309)

Adds rrGitServer.separate.enabled to run the git server as a dedicated
deployment + service instead of in-process on the main backend, mirroring
how the workload is split in Retool Cloud (reached via normal k8s service
discovery).

When enabled:
- a dedicated <release>-git-server Deployment runs SERVICE_TYPE=RR_GIT_SERVER
  on RR_GIT_SERVER_PORT, with the Postgres connection, bootstrap secrets,
  blob-storage env, and telemetry
- the main backend drops RR_GIT_SERVER from its SERVICE_TYPE and proxies git
  traffic to the service via RR_GIT_SERVER_HOST / RR_GIT_SERVER_PORT
- the MCP server (if enabled) is auto-pointed at the service unless
  mcp.config.retoolGitServerUrl is set explicitly

The blob-storage env block is extracted into a shared helper
(retool.rrGitServer.commonEnv) so the in-process backend and the standalone
deployment stay in sync. In-process mode (rrGitServer.enabled without
separate) is unchanged.

Adds ci/test-rr-git-server-separate-option.yaml exercising the split + S3
blob storage + MCP auto-wiring.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* agent-sandbox: validate required secrets + existing-secret DSN ref (#308)

* agent-sandbox: validate required secrets, flexible Postgres DSN sourcing

The agent-sandbox secret story was under-validated and rigid:

- An empty postgres.url silently base64-encoded to nothing
  ({{ $as.postgres.url | default "" | b64enc }}), so a misconfigured deploy
  installed cleanly and the controller/proxy crash-looped at runtime.
- jwtPublicKey / jwtPrivateKey (required for the controller/proxy to boot and
  for the backend to sign sandbox tokens) had no guard when absent.
- Postgres could only be supplied as a plaintext DSN; operators could not reuse
  an existing password-only secret (e.g. the backend's Postgres password).

The agent-sandbox app consumes a single connection string (no split-field code
path), so the chart now offers four ways to supply it, validated at install:

  1. postgres.url            -- plaintext DSN.
  2. postgres.host (+ user + database) -- the chart assembles
     postgres://user@host:port/database and supplies the password out-of-band
     via the PGPASSWORD env var, from postgres.password or
     postgres.passwordSecretName. node-postgres reads PGPASSWORD when the DSN
     omits the password, so the password needs no URL escaping -- any
     characters are safe. This is what lets a password-only secret be reused.
  3. postgres.urlSecretName  -- existing secret holding the full DSN.
  4. externalSecret.name     -- catch-all secret, postgres-url key.

user/database are embedded in the assembled DSN verbatim. Percent-encoding does
not round-trip here (pg-connection-string decodes userinfo before splitting on
':' and runs the path through decodeURI), so validateSecrets instead rejects the
characters that would break parsing -- ':' '/' '?' '#' / whitespace in user and
'?' '#' / whitespace in database. '@' is allowed (Azure-style user@servername
parses correctly, splitting on the last '@'); for other characters use options
1 or 3.

Other changes:
- Add retool.agentSandbox.validateSecrets: fail at install time when an enabled
  workload is missing a Postgres source, user/database for the assemble path, a
  JWT public key, or a JWT private key, or has unsafe characters in user/database.
- Promote the controller/proxy URL block to retool.agentSandbox.postgresUrlEnv.
- Only write postgres-url into the chart-managed secret when a plaintext url is
  set, so empty keys are never emitted.
- Document the canonical shapes and the password-secret reuse path.

Audit: mcp already fails on its missing required secret; js_executor has no
secrets, so neither needs changes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* agent-sandbox: inherit backend Postgres connection by default

Enabling the agent sandbox on an existing deployment previously meant
re-entering the Postgres host/database/user (and pointing at the password)
under agentSandbox.postgres, even though the sandbox lives in the same database
as the backend, just under a separate schema.

Add inheritance as the default: when none of agentSandbox.postgres.url /
.host / .urlSecretName / agentSandbox.externalSecret.name is set, the chart
assembles the DSN from the backend's connection (config.postgresql or the
postgresql subchart, via the retool.postgresql.* helpers) and sources PGPASSWORD
from the same secret the backend uses (mirrors POSTGRES_PASSWORD in
deployment_backend.yaml). So enabling r2 against the existing database needs no
new Postgres values; the schema stays separate (postgres.schema, default
agent_executor). Any explicit option still overrides.

validateSecrets gates the one combination inheritance can't reach: when the
backend password is supplied via external secrets (envFrom) with no discrete
key, it fails with guidance to set an explicit option. The assembled URL
defaults the port to 5432 when config.postgresql.port is unset.
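
The source selection described in these commits can be sketched in Python as below. The commit text only states that inheritance applies when none of the four explicit options is set; the relative order among options 1 to 4 shown here follows their numbering and is an assumption, as are the function name and return labels.

```python
def postgres_source(postgres, external_secret_name=None):
    """Pick which Postgres source applies for the agent sandbox (illustrative)."""
    # Order among the explicit options is assumed, not confirmed by the chart.
    if postgres.get("url"):
        return "url"             # option 1: plaintext DSN
    if postgres.get("host"):
        return "host"            # option 2: assembled DSN + PGPASSWORD
    if postgres.get("urlSecretName"):
        return "urlSecretName"   # option 3: existing secret holding the DSN
    if external_secret_name:
        return "externalSecret"  # option 4: catch-all secret, postgres-url key
    return "inherit"             # default: reuse the backend's connection
```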

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* agent-sandbox: fix stale Option 4 postgres comment

After adding default inheritance, "leave options 1-3 blank" no longer selects
Option 4 -- it selects the default (inherit config.postgresql). Clarify that
Option 4 is chosen by setting externalSecret.name (in the Secrets section), and
that leaving options 1-4 all unset falls through to inheritance.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* agent-sandbox: guard host-assembly path with no password source

When postgres.host was set without postgres.password or
postgres.passwordSecretName, postgresUrlEnv emitted a DSN with no password and
no PGPASSWORD, so the misconfiguration only surfaced at runtime. validateSecrets
now fails at install in that case, pointing to postgres.url / urlSecretName for
intentionally passwordless setups (IAM/trust auth).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* ci: test coverage for r2 workloads (js-executor, agent-sandbox, r2Agent) (#312)

* ci: add test values for agent-sandbox and js-executor workloads

The R2 js-executor and agent-sandbox workloads had no CI test values, so a
values change could break their templates silently. Only agents and mcp were
covered under charts/retool/ci/.

Add test-js-executor-enabled-option.yaml and test-agent-sandbox-enabled-option.yaml
enabling each workload with realistic config. These are auto-discovered by
.github/kubeconform.sh (find -name '*option.yaml') and overlaid on every base
values file across the kubeconform matrix — no workflow change needed.

Both pass helm template + kubeconform against all base values files on k8s
1.27.16 through 1.31.6.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* ci: expand r2 workload coverage (secret/postgres matrix, ingress modes, r2Agent)

Rebased onto the latest r2-cleanup, which merged #308 (agent-sandbox
validateSecrets + flexible Postgres sourcing) and #309 (split rrGitServer).
Adds test values exercising the full new surface:

agent-sandbox — one option file per secret/Postgres precedence path so every
branch of postgresUrlEnv/validateSecrets is templated:
  - existing externalSecret.name file → Postgres option 4 + dedicated proxy
    domain WITH ingress + TLS + networkPolicy + device plugin + both PDBs
  - inline secrets (chart-rendered Secret) + plaintext DSN (option 1) +
    same-origin proxy / NO ingress + hostPath /dev/net/tun (devicePlugin off)
  - assemble DSN from fields + PGPASSWORD secretKeyRef (option 2), Azure-style
    user@server username, external device-manager (deployDaemonSet off)
  - full DSN from an existing Secret via urlSecretName (option 3)
  - zero-config inherit of the backend Postgres connection (option 5)

r2Agent — new worker (R2_AGENT_TEMPORAL_WORKER, port 3016) Deployment/Service/PDB.

js-executor — add environmentSecrets to cover the per-workload secretKeyRef branch.

All ci/*option.yaml validate via helm template + kubeconform against all three
base values files on k8s 1.27.16 and 1.31.6 (108 combinations).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix: honor jsExecutor.image.pullPolicy in js-executor deployment

deployment_js_executor.yaml read the global .Values.image.pullPolicy, so the
per-workload jsExecutor.image.pullPolicy knob (present in values.yaml) was dead.
This was inconsistent with the js-executor image *tag* (per-workload via the
retool.jsExecutor.image.tag helper) and with agent-sandbox (reads
$as.image.pullPolicy). Read jsExecutor.image.pullPolicy with a fallback to the
global value.

The js-executor CI test now sets pullPolicy: Always (differs from the global
IfNotPresent) so a regression back to the global value is caught.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* [feat][r2] add single r2.enabled master switch for R2 components (#313)

* [feat][r2] add single r2.enabled master switch for R2 components

Turning on the R2 stack previously meant flipping four independent flags
(r2Agent, jsExecutor, agentSandbox, mcp). Add a top-level `r2.enabled`
master switch that toggles all four collectively, with room for shared R2
config later.

Semantics: inherit + override. Each component's `enabled` default changes
from false to null; when null it inherits `r2.enabled`, and an explicit
true/false on the component overrides the master for that component only.
Backward compatible: existing configs that set the per-component flags
explicitly behave identically.

Add generic helper `retool.r2.componentEnabled`; `retool.r2Agent.enabled`
delegates to it. Every read of these flags is routed through the helper --
not just the deployment guards but the cross-component env wiring in
backend/workflows/jobs/_workers and the agentSandbox validate/backendEnv/
httproute helpers -- so an inherited (null) flag still drives JS_EXECUTOR
and AGENT_SANDBOX env injection instead of reading as false.
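
The inherit-plus-override semantics can be modeled in a few lines of Python. This is a sketch of the rule as the commit message states it, not the body of `retool.r2.componentEnabled`; the function name is hypothetical.

```python
def component_enabled(component_flag, master_enabled):
    """Resolve a component's effective state from its own flag and the master switch."""
    if component_flag is None:
        # Unset (null) on the component: inherit the master switch.
        return bool(master_enabled)
    # Explicit true/false on the component overrides the master.
    return bool(component_flag)
```

Routing every read through one resolver is what keeps a null flag from being treated as false in the cross-component env wiring.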

Add ci/test-r2-enabled-option.yaml covering the master-switch inherit path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* [r2] update MCP oauth-token fail message for inherited enablement

The error still said "when .Values.mcp.enabled is true", which misleads
operators who enable MCP via the new master switch (r2.enabled: true) and
leave mcp.enabled null. Reword to cover both the explicit flag and inheritance.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* agent-sandbox: reject ':' and '/' in postgres.database DSN assembly

The host-fields DSN path assembles postgres://user@host:port/database via
printf, and validateSecrets guards the embedded user/database against
characters that break URL parsing. The user check rejected [\s:/?#] but the
database check only rejected [\s?#], so a database name containing '/' (e.g.
'my/db') silently produced postgres://user@host:5432/my/db -- which pg URL
parsers read as database 'my' with a trailing path, connecting to the wrong
database. Align the database check with the user check ([\s:/?#]); affected
names must instead supply a full DSN via postgres.url / postgres.urlSecretName.
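
A small Python sketch of the host-fields path after this fix: both user and database are checked against the same character class before the DSN is assembled. The function name is hypothetical, and defaulting the port to 5432 on this path is an assumption taken from the example URL in the commit message.

```python
import re

# Characters that would change how the assembled URL is parsed.
UNSAFE = re.compile(r"[\s:/?#]")


def assemble_dsn(user, host, database, port=5432):
    """Build postgres://user@host:port/database, rejecting unsafe user/database values."""
    for field, value in (("user", user), ("database", database)):
        if UNSAFE.search(value):
            raise ValueError(
                f"postgres.{field} contains an unsafe character; "
                "supply a full DSN via postgres.url or postgres.urlSecretName"
            )
    # The password is deliberately absent; it is supplied separately via PGPASSWORD.
    return f"postgres://{user}@{host}:{port}/{database}"
```

For example, `assemble_dsn("retool@myserver", "db.internal", "retool")` is accepted because '@' is allowed, while a database named `my/db` is rejected instead of silently resolving to `my`.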

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
mcp requires an OAuth introspection token to template (oauthIntrospectionAuthToken
/ secret / env), unlike the other R2 components. Having mcp inherit the
r2.enabled master switch meant `r2.enabled: true` hard-failed out of the box
("Please set ...oauthIntrospectionAuthToken... when the MCP server is enabled")
unless the user also configured mcp — defeating the one-line enable.

Make mcp independent: mcp.enabled defaults to false and is read directly
(deployment_mcp.yaml gates on .Values.mcp.enabled), so the master switch governs
only r2Agent/jsExecutor/agentSandbox. mcp stays opt-in via mcp.enabled: true.
Update the componentEnabled doc, the OAuth fail message, and the
test-r2-enabled-option fixture (mcp must no longer render from r2.enabled alone).

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
westrik and others added 9 commits June 11, 2026 14:43
…ifetimeMs (#317)

Wire up controller.scaling.perUserSandboxLimit config option (default 5) and sandbox.sandboxGlobalLifetimeMs (default 2.5 hrs).

Remove environment variables that are no longer used: SLOTS_PER_POD, EXECUTOR_{MIN,MAX}_REPLICAS, SCALE_{UP,DOWN}_THRESHOLD, SCALE_DOWN_GRACE_PERIOD_MS.
retool-k8s (helm/retool-workflow-jail/files/nsjail-seccomp.json) is the
source of truth for the nsjail seccomp profile. The public chart copy had
drifted in its `socket` syscall family rules; this re-syncs it verbatim so
the public jsExecutor/codeExecutor sandbox matches what we run internally.

Co-authored-by: Cursor <cursoragent@cursor.com>
* Set appArmorProfile Unconfined for js-executor

nsjail (used by js-executor to sandbox user code) remounts the rootfs and
sets up its mount namespace at startup. On nodes where the container runtime
attaches an AppArmor profile to non-privileged containers — e.g. GKE
Container-Optimized OS, where containerd applies cri-containerd.apparmor.d
with `deny mount` — that mount is rejected with EPERM and the sandbox fails
to launch. EKS (Amazon Linux 2023) uses SELinux and attaches no AppArmor
profile, so this never surfaced there.

Run js-executor with appArmorProfile Unconfined so nsjail can set up its
sandbox, mirroring the existing agent-sandbox container. The Localhost
seccomp profile continues to provide syscall-level isolation.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Remove explanatory comment from js-executor appArmorProfile

Co-authored-by: Cursor <cursoragent@cursor.com>

* Use AppArmor annotation instead of securityContext field for js-executor

The appArmorProfile securityContext field only exists in the Kubernetes API
from v1.30+, so strict kubeconform validation against v1.27-v1.29 rejected it
with "additionalProperties 'appArmorProfile' not allowed". Switch to the
container.apparmor.security.beta.kubernetes.io/<container> pod annotation,
which is honored across all supported Kubernetes versions and is not subject
to schema validation.
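
For comparison, the two forms look roughly like this. Both are standard Kubernetes syntax; the container name `js-executor` in the annotation key is an assumption about how the chart names the container.

```yaml
# Field form: only valid on Kubernetes v1.30+, so strict schema validation
# against older versions rejects it.
securityContext:
  appArmorProfile:
    type: Unconfined
---
# Annotation form: set on the pod template metadata.
# "js-executor" is an assumed container name.
metadata:
  annotations:
    container.apparmor.security.beta.kubernetes.io/js-executor: unconfined
```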

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
…deExecutor.useSeccompProfile) (#311)

* Run code-executor unprivileged with seccomp on k8s >= 1.33

On Kubernetes 1.33+ (where the ProcMountType and UserNamespacesSupport
feature gates are on by default), the code-executor now runs unprivileged
using a localhost seccomp profile, NET_ADMIN, an unmasked /proc, and user
namespaces - mirroring how the JS executor sandboxes itself. The nsjail
seccomp profile is installed onto the node by an install-seccomp init
container. On older clusters it falls back to the existing privileged mode,
so the chart still installs without requiring 1.33+.

Setting codeExecutor.securityContext explicitly continues to override this
behavior for either mode.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Keep root values.yaml in sync with charts/retool/values.yaml

Co-authored-by: Cursor <cursoragent@cursor.com>

* Drop codeExecutor securityContext comments

Co-authored-by: Cursor <cursoragent@cursor.com>

* Document why code-executor uses seccomp on k8s 1.33+

Co-authored-by: Cursor <cursoragent@cursor.com>

* Note 1.33+ upgrade for fine-grained privileges

Co-authored-by: Cursor <cursoragent@cursor.com>

* Gate code-executor seccomp behind codeExecutor.useSeccompProfile flag

Replace the automatic k8s >= 1.33 version detection with an explicit
opt-in flag (codeExecutor.useSeccompProfile, default false). The chart
defaults to the existing privileged mode and only renders the
unprivileged seccomp path (seccomp profile + NET_ADMIN + unmasked /proc +
user namespaces + install-seccomp init container) when the operator sets
the flag. An explicitly pinned codeExecutor.securityContext still wins.

Enabling the flag requires Kubernetes 1.33+ (ProcMountType and
UserNamespacesSupport feature gates); this is now the operator's
responsibility rather than auto-detected.
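
A minimal sketch of the opt-in, using only the key named in this commit:

```yaml
codeExecutor:
  # Opt in to the unprivileged seccomp path. Requires Kubernetes 1.33+;
  # the chart does not check the cluster version.
  useSeccompProfile: true
  # Leave codeExecutor.securityContext unset here: an explicitly set
  # securityContext takes precedence over this flag.
```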

Co-authored-by: Cursor <cursoragent@cursor.com>

* Consolidate seccomp docs into values.yaml comment

Move the detailed rationale for codeExecutor.useSeccompProfile into the
values.yaml comment (operator-facing) and reduce the template comment to a
short pointer explaining the $useSecComp local.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Set AppArmor unconfined for code-executor seccomp path

When codeExecutor.useSeccompProfile drops the privileged securityContext,
the container is run under the container runtime's default AppArmor profile
on AppArmor-enabled nodes (e.g. GKE Container-Optimized OS, where containerd
attaches cri-containerd.apparmor.d with `deny mount`). nsjail remounts the
rootfs and /proc to build its sandbox, so that profile rejects the mounts
with EPERM and code-executor crash-loops. Privileged containers were unaffected
because AppArmor confinement is not applied to them.

Add the container.apparmor.security.beta.kubernetes.io/code-executor:
unconfined pod annotation, gated to the same $useSecComp path as the seccomp
profile, hostUsers and procMount changes. The annotation is honored across
all supported Kubernetes versions (unlike the appArmorProfile field, which is
v1.30+) and is not subject to strict schema validation. The Localhost seccomp
profile continues to provide syscall isolation.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Reword useSeccompProfile gate comment to drop operator phrasing

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Consolidate the RR (formerly "r2") stack into a single top-level `rr:` block
whose `rr.enabled` is the master switch, with every component RR needs nested
directly under it:

  rr:
    enabled: false
    jsExecutor: {...}      # inherits rr.enabled
    agent: {...}           # RR server-side agent worker — inherits rr.enabled
    agentSandbox: {...}    # inherits rr.enabled
    gitServer: {...}       # required for rr
    blobStorage: {...}     # required for rr

The vocabulary is renamed r2 -> rr to match the RR_ env vars, and the nested
keys carry no redundant prefix (the `rr:` namespace scopes them) — the full path
composes to the env var (rr.gitServer -> RR_GIT_SERVER, rr.blobStorage ->
RR_BLOB_STORAGE), and rr.gitServer matches the rendered `-git-server` resource
name. mcp and the separate AI-`agents` feature stay top-level (mcp is
intentionally independent of the master switch).

Helpers are retool.rr.* (componentEnabled, validateLegacyValues) and
retool.gitServer.* / retool.agent.enabled.

Intentionally NOT renamed, so this is a no-op for running pods (no resource
recreation / no backend contract break):
  - SERVICE_TYPE=R2_AGENT_TEMPORAL_WORKER, the Temporal task queue r2-agent, and the
    r2-agent-worker resource + telemetry name.
  - the agent's internal worker identity: worker `type: rrAgent` and the
    retool.rrAgentWorker.* helpers, kept distinct from the AI-`agents` worker's
    retool.agentWorker.* to avoid a collision. Only the user-facing value key
    (rr.agent) and its enable helper are de-prefixed.
  - the unrelated "Cloudflare R2" mention in the blob-storage example.

Robustness:
  - retool.rr.componentEnabled is kind-aware: an absent/null component block is
    disabled (no config to render); a map uses its `enabled` (inheriting the
    master switch when unset); a non-mapping value (e.g. a bare bool) fails
    loudly with guidance. Fixes the nil-dereference on an explicitly-nulled
    component and avoids relocating the crash into the deployment templates.
  - retool.rr.validateLegacyValues catches BOTH old top-level keys (the `r2:`
    master switch and the un-nested components) AND old leaf names left under
    the new `rr:` block (rr.r2Agent/rrAgent/rrGitServer/rrBlobStorage), mapping
    each to its new path. helm template/upgrade fails loudly rather than
    silently disabling RR.
  - the nested worker's values owner is resolved from a declarative `nested: rr`
    field on the worker descriptor instead of a hardcoded parent-name match.
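
The componentEnabled rules above can be sketched as a values file. This is an illustration of the described behavior, not a recommended configuration; the gitServer and blobStorage blocks that RR requires are omitted because their fields are not listed here.

```yaml
rr:
  enabled: true        # master switch
  jsExecutor:
    enabled: null      # unset: inherits rr.enabled, so enabled here
  agentSandbox:
    enabled: false     # explicit value wins over the master switch
  agent: null          # null block: treated as disabled
  # agent: true        # a bare bool instead of a map fails the render with guidance
```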

Verified: rendered manifests are byte-identical to the original r2 branch across
all six scenarios (only the random postgres-password differs); helm lint clean;
all 10 RR CI overlays render; both values.yaml copies kept byte-identical.
Renamed test overlays test-r2-*-option.yaml -> test-rr-*-option.yaml.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…_MS (#322)

The agent-executor sandbox connect timeout (config.ts readyTimeoutMs) is now
env-configurable via SANDBOX_READY_TIMEOUT_MS, but the chart never set it, so
the job-template fell back to the image default (20s). Interactive sandbox boot
(gVisor + bundle load) can exceed that, surfacing "did not connect within
20000ms". Add an agentSandbox.sandbox.sandboxReadyTimeoutMs knob (default 20000,
matching the code default) and emit SANDBOX_READY_TIMEOUT_MS in the job-template
env next to SANDBOX_IDLE_TIMEOUT_MS / SANDBOX_GLOBAL_LIFETIME_MS, so operators
can raise it (e.g. 45000) without manual job-template patching.
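
A sketch of raising the timeout, assuming the knob sits under the `rr.agentSandbox` block used elsewhere in this chart:

```yaml
rr:
  agentSandbox:
    sandbox:
      # Emitted as SANDBOX_READY_TIMEOUT_MS in the job-template env.
      # Default is 20000; 45000 is the example value from this commit.
      sandboxReadyTimeoutMs: 45000
```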

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The pre-rename (`r2.*` / top-level component) values guard added in #321
already fails loud, but the message buried the call to action. Lead with
"ACTION REQUIRED: update your Helm values file", state the deploy is
blocked, and give an explicit "edit your values file and rename these
keys" instruction before the key-move list.

Message-only change inside the existing fail string: no values.yaml or
CI changes. Verified the guard still fires (now with the clearer text)
on a legacy key and stays silent on a clean rr.* render.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…on) (#325)

The sandbox job-template ConfigMap embedded jwtPublicKey into a JSON
string literal as `"value": "{{ $as.jwtPublicKey }}"`. ES256 keys are
normally multi-line PEM (BEGIN/END headers + newlines); a real newline
inside a JSON string literal is invalid JSON, so the controller failed
to read the job-template and could not spawn sandbox Jobs. (A compact
JWK would break it too — embedded double-quotes.)

Fix: `"value": {{ $as.jwtPublicKey | toJson }}` — toJson emits the
quoted, fully-escaped JSON string (newlines -> \n, quotes -> \"). This
also makes the JSON path consistent with the env-var paths, which
already use `| quote`.

Until now this only worked if the operator pre-flattened the key to a
single `\n`-escaped line (the workaround the inline-secrets CI fixture
relied on). Updated that fixture to a genuine multi-line PEM block
scalar so it exercises the escaping, and corrected its comment.

Verified: rendered the inline-secrets fixture and parsed the embedded
job-template.json — VALID with the fix, JSONDecodeError without it.
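
A sketch of the kind of value the fix now accepts: a multi-line PEM passed as a YAML block scalar. The `rr.agentSandbox.jwtPublicKey` path is inferred from `$as.jwtPublicKey` and is an assumption; the key body is a placeholder, not real key material.

```yaml
rr:
  agentSandbox:
    # Block scalar keeps the real newlines; toJson escapes them to \n
    # when the value is embedded in job-template.json.
    jwtPublicKey: |
      -----BEGIN PUBLIC KEY-----
      <base64 key material, line 1>
      <base64 key material, line 2>
      -----END PUBLIC KEY-----
```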

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ut) (#327)

agent_sandbox_device_plugin.yaml rendered `priorityClassName:
{{ $as.devicePlugin.priorityClassName }}` unconditionally. The
documented GKE opt-out sets `rr.agentSandbox.devicePlugin.priorityClassName:
null` (GKE rejects system-node-critical in user namespaces). A bare
{{ ... }} on a nil value emits the literal string `<no value>`, so the
DaemonSet was submitted with `priorityClassName: <no value>`, naming a
nonexistent priority class. Pods that reference one are rejected at admission,
so the whole agent sandbox was blocked from scheduling.

Wrap it in `{{- if $as.devicePlugin.priorityClassName }}` so the field
is omitted when null, matching how every other workload guards
.Values.priorityClassName.
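
The GKE opt-out that triggered this, as it would appear in a values file (the path is the one quoted above):

```yaml
rr:
  agentSandbox:
    devicePlugin:
      # With the guard in place, null omits the priorityClassName field
      # from the DaemonSet instead of rendering "<no value>".
      priorityClassName: null
```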

Adds ci/test-agent-sandbox-deviceplugin-no-priorityclass-option.yaml —
device-plugin DaemonSet with priorityClassName: null. kubeconform
rejects `<no value>`, so this guards the regression.

Verified: null -> field omitted (no <no value>); default ->
priorityClassName: system-node-critical still renders; lint clean.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
JatinNanda and others added 2 commits June 11, 2026 15:01
One minor bump after the latest public release on charts.retool.com
(6.10.5 stable; 6.11.0-rc1 pre-release). The R² feature set ships as
6.11.0, graduating the existing 6.11.0-rc1 to final. (The 6.12.0 carried
on the branch was an internal number that was never published.)

Minor release: additive for existing consumers, all R² switches default off.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@JatinNanda JatinNanda marked this pull request as ready for review June 11, 2026 19:25

@ryanartecona ryanartecona left a comment

🚀 🚀 🚀 🚀 🚀

@JatinNanda JatinNanda merged commit 9da5c4e into main Jun 11, 2026
13 checks passed
@JatinNanda JatinNanda changed the title Merge r2 → main: R² (React Retool) self-hosted support (chart 6.12.0) Merge r2 → main: R² (React Retool) self-hosted support (chart 6.11.0) Jun 11, 2026