Skip to content

fix(onboard): propagate rotated messaging credentials to sandbox L7 proxy#1967

Open
ericksoa wants to merge 8 commits intomainfrom
fix/credential-rotation-propagation
Open

fix(onboard): propagate rotated messaging credentials to sandbox L7 proxy#1967
ericksoa wants to merge 8 commits intomainfrom
fix/credential-rotation-propagation

Conversation

@ericksoa
Copy link
Copy Markdown
Contributor

@ericksoa ericksoa commented Apr 16, 2026

Summary

  • Fixes credential rotation bug where re-running nemoclaw onboard --non-interactive with a new messaging token (e.g. TELEGRAM_BOT_TOKEN) reports success but the sandbox continues using the old credential
  • Stores SHA-256 hashes of messaging credentials in the sandbox registry at creation time; on reuse, compares current tokens against stored hashes to detect rotation
  • When rotation is detected, automatically rebuilds the sandbox (backup → delete → create with new providers → restore), preserving workspace state

Closes #1903

Changes

File Change
src/lib/registry.ts Add providerCredentialHashes field to SandboxEntry
src/lib/onboard.ts Add hashCredential(), detectMessagingCredentialRotation(), modify reuse path to detect rotation and trigger rebuild with backup/restore, store hashes at registration
test/credential-rotation.test.ts 11 unit tests for hash and rotation detection logic
test/e2e/test-token-rotation.sh E2E test: onboard with token A → rotate to token B (verify rebuild + workspace preservation) → re-onboard with same token B (verify reuse)

Design Notes

  • Legacy sandboxes (created before this fix) have no stored hashes → detectMessagingCredentialRotation returns changed: false (conservative, no false rebuilds)
  • Workspace preservation uses existing sandbox-state.ts backup/restore machinery (same as sandboxRebuild)

Test plan

  • npx vitest run test/credential-rotation.test.ts — 11/11 pass
  • npx vitest run test/onboard.test.ts — 117/117 existing tests pass (no regressions)
  • Full test suite — 1583/1583 tests pass
  • tsc -p tsconfig.cli.json --noEmit — clean
  • All pre-commit and pre-push hooks pass
  • E2E: test/e2e/test-token-rotation.sh with real Telegram tokens

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Detects messaging credential rotation (e.g., Telegram tokens) and triggers sandbox rebuilds when needed
    • Attempts workspace backup before rebuild and restore after to preserve state
    • Persists per-credential hashes in the local sandbox registry for rotation tracking
  • Tests

    • Unit tests for credential-rotation detection and hashing
    • End-to-end test validating token rotation, rebuild behavior, and workspace preservation
  • Chores

    • Nightly E2E workflow updated to run the token-rotation job

…roxy

When a user re-runs `nemoclaw onboard` with a rotated Telegram bot token,
the provider update succeeds at the gateway level but the running sandbox's
L7 proxy continues using the old credential. OpenShell v0.0.26 only attaches
providers at `sandbox create` time and has no hot-reload mechanism.

Fix: store SHA-256 hashes of messaging credentials in the sandbox registry
at creation time. On reuse, compare current tokens against stored hashes.
When rotation is detected, automatically rebuild the sandbox (backup →
delete → create with new providers → restore) to propagate the new
credential through the L7 proxy.

Closes #1903
Ref: nvbug 6083165

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 16, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds SHA‑256 credential hashing and rotation detection; integrates rotation checks into sandbox reuse/recreate flow; persists per‑credential hashes in the sandbox registry; performs workspace backup/restore around sandbox rebuilds when rotation is detected. Adds unit and E2E tests and a nightly CI job for token-rotation E2E.

Changes

Cohort / File(s) Summary
Core logic & exports
src/lib/onboard.ts
Added and exported hashCredential() and detectMessagingCredentialRotation(); integrated rotation detection into sandbox creation to block reuse when messaging credentials changed; added workspace backup/restore around sandbox rebuilds; store per-credential hashes on registration.
Registry type & persistence
src/lib/registry.ts
Extended exported SandboxEntry with optional providerCredentialHashes?: Record<string,string> and updated registerSandbox() to persist the field when non-empty.
Unit tests
test/credential-rotation.test.ts
New Vitest suite validating hashCredential() and detectMessagingCredentialRotation() across legacy, matching, differing, multi-provider, null-token, and missing-sandbox scenarios; uses dynamic require and mocks.
E2E test script
test/e2e/test-token-rotation.sh
New Bash E2E that runs three onboarding phases (initial token A, rotate to token B expecting rebuild + state restore, re-onboard expecting reuse); validates registry providerCredentialHashes, sandbox/provider presence, and workspace persistence.
CI workflow
.github/workflows/nightly-e2e.yaml
Added token-rotation-e2e job to nightly E2E, wired it into notify-on-failure.needs and updated failure condition; uploads install log on failure.

Sequence Diagram

sequenceDiagram
    participant User
    participant Onboard as Onboard Process
    participant Registry
    participant State as WorkspaceState
    participant Sandbox

    User->>Onboard: run onboard --non-interactive (with messaging tokens)
    Onboard->>Registry: get sandbox entry
    Registry-->>Onboard: sandbox entry (may include providerCredentialHashes)
    Onboard->>Onboard: hash current tokens, compare to stored hashes
    alt rotation detected
        Onboard->>State: backup workspace
        State-->>Onboard: backup artifact
        Onboard->>Sandbox: delete existing sandbox
        Onboard->>Sandbox: create new sandbox
        Onboard->>Registry: register sandbox + new providerCredentialHashes
        Registry-->>Onboard: confirm registration
        Onboard->>State: restore workspace into new sandbox
        State-->>Sandbox: workspace restored
    else no rotation
        Onboard->>Sandbox: reuse existing sandbox
    end
    Onboard-->>User: onboarding complete
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I nibble tokens, trim and hash,

when old ones change I make a splash,
I tuck your files into a chest,
rebuild the burrow, do my best,
then hop back in — your work's at rest.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 45.45% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: fixing credential rotation propagation to the sandbox L7 proxy by detecting rotation and rebuilding the sandbox with new credentials.
Linked Issues check ✅ Passed The PR implements all core coding requirements from #1903: credential hash storage, rotation detection, automatic sandbox rebuild on rotation, and unit/E2E tests covering token rotation scenarios.
Out of Scope Changes check ✅ Passed All changes align with the linked issue: credential hashing and rotation detection in onboard.ts, registry field for storing hashes, unit tests for detection logic, E2E script for token rotation validation, and CI workflow updates.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/credential-rotation-propagation

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/lib/registry.ts (1)

153-169: ⚠️ Potential issue | 🔴 Critical

Persist providerCredentialHashes when registering a sandbox.

createSandbox() now passes providerCredentialHashes, but this constructor drops the field. Every new sandbox is therefore recorded as a legacy entry, so detectMessagingCredentialRotation() never sees stored hashes and the rotation-rebuild path won't fire on the next onboard run.

Suggested fix
     data.sandboxes[entry.name] = {
       name: entry.name,
       createdAt: entry.createdAt || new Date().toISOString(),
       model: entry.model || null,
       nimContainer: entry.nimContainer || null,
       provider: entry.provider || null,
       gpuEnabled: entry.gpuEnabled || false,
       policies: entry.policies || [],
       policyTier: entry.policyTier || null,
       agent: entry.agent || null,
       dangerouslySkipPermissions:
         entry.dangerouslySkipPermissions === true ? true : undefined,
       agentVersion: entry.agentVersion || null,
+      providerCredentialHashes:
+        entry.providerCredentialHashes &&
+        Object.keys(entry.providerCredentialHashes).length > 0
+          ? { ...entry.providerCredentialHashes }
+          : undefined,
     };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/registry.ts` around lines 153 - 169, registerSandbox is not
persisting providerCredentialHashes from the incoming SandboxEntry so new
sandboxes are recorded without hashes and break
detectMessagingCredentialRotation; update the object assigned to
data.sandboxes[entry.name] inside registerSandbox to include
providerCredentialHashes: entry.providerCredentialHashes || null (or undefined
as appropriate) so the stored entry retains the hashes passed by createSandbox;
ensure the property name matches SandboxEntry and any downstream checks that
read providerCredentialHashes from load().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/lib/onboard.ts`:
- Around line 2699-2715: The code calls
sandboxState.backupSandboxState(sandboxName) but only treats thrown errors; if
backup.success === false the code should abort the automatic rebuild to avoid
destroying the workspace. After calling sandboxState.backupSandboxState inside
the try, check backup.success and if false log the failure (include
backup.message or details), do not set pendingStateRestore, and abort the
rebuild flow (for example return early or throw an error) so that
credentialRotation.changed does not trigger sandbox destruction; reference the
symbols credentialRotation.changed, sandboxState.backupSandboxState,
pendingStateRestore, existingSandboxState, and sandboxName when making the
change.

In `@test/e2e/test-token-rotation.sh`:
- Around line 156-163: The exported env vars in the script (OPENAI_BASE_URL,
NEMOCLAW_INFERENCE_PROVIDER, NEMOCLAW_INFERENCE_MODEL) don't match the
non-interactive env var names consumed by the onboarding logic in
src/lib/onboard.ts, so onboarding can ignore the fake endpoint; update the
test/e2e/test-token-rotation.sh exports to set the exact non-interactive env var
names that onboard reads (replace the three ignored vars with the corresponding
env var names used in src/lib/onboard.ts) so the fake OPENAI endpoint and
provider/model are actually bound during onboarding, and remove the stale/unused
exports.
- Around line 190-223: The probe writes the marker to /tmp which is outside the
agent-backed stateDirs handled by backupSandboxState()/restoreSandboxState(),
and the test currently only logs info if the marker is missing; change the test
to write the marker into one of the backed-up state dirs (use the agent's
writable/state directory used by restoreSandboxState(), e.g. the same dir
referenced by stateDirs) instead of /tmp and then assert failure if the marker
is absent after rotation (replace the info branch that logs missing marker with
a fail call); update references to the marker variable (MARKER) and the
openshell sandbox exec calls that read the marker so they point at the backed-up
path rather than /tmp/rotation-test-marker.
- Around line 196-235: After detecting rotation in ONBOARD_OUTPUT and confirming
sandbox rebuild, add an active verification inside the sandbox using openshell
sandbox exec "$SANDBOX_NAME" to call the Telegram getMe endpoint with the old
token (token A) and the new token (token B): run getMe (or an equivalent HTTP
call) from within the sandbox to assert token A now fails and token B succeeds;
place these checks after the "Sandbox running after rotation" block and before
the Phase 3 re-onboard assertions, reusing NEMOCLAW_CMD, SANDBOX_NAME and the
existing openshell helper to locate where to run the in-sandbox commands and
perform pass/fail on their responses.

---

Outside diff comments:
In `@src/lib/registry.ts`:
- Around line 153-169: registerSandbox is not persisting
providerCredentialHashes from the incoming SandboxEntry so new sandboxes are
recorded without hashes and break detectMessagingCredentialRotation; update the
object assigned to data.sandboxes[entry.name] inside registerSandbox to include
providerCredentialHashes: entry.providerCredentialHashes || null (or undefined
as appropriate) so the stored entry retains the hashes passed by createSandbox;
ensure the property name matches SandboxEntry and any downstream checks that
read providerCredentialHashes from load().
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: da46481e-75fe-4c94-b643-43bcde0d5782

📥 Commits

Reviewing files that changed from the base of the PR and between 121148c and 8114d31.

📒 Files selected for processing (4)
  • src/lib/onboard.ts
  • src/lib/registry.ts
  • test/credential-rotation.test.ts
  • test/e2e/test-token-rotation.sh

Comment thread src/lib/onboard.ts
Comment thread test/e2e/test-token-rotation.sh Outdated
Comment on lines +156 to +163
export TELEGRAM_BOT_TOKEN="$TELEGRAM_BOT_TOKEN_A"
export NEMOCLAW_SANDBOX_NAME="$SANDBOX_NAME"
export NEMOCLAW_POLICY_TIER="open"
export NEMOCLAW_NON_INTERACTIVE=1
export OPENAI_API_KEY="fake-key"
export OPENAI_BASE_URL="$FAKE_BASE_URL"
export NEMOCLAW_INFERENCE_PROVIDER="openai-compatible"
export NEMOCLAW_INFERENCE_MODEL="fake-model"
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Use the non-interactive env vars that onboard actually reads.

These assignments don't match the CLI contract in src/lib/onboard.ts: OPENAI_BASE_URL, NEMOCLAW_INFERENCE_PROVIDER, and NEMOCLAW_INFERENCE_MODEL are ignored here. As written, this script won't bind onboarding to the fake endpoint, so it can silently exercise the wrong provider path.

Suggested fix
-export OPENAI_API_KEY="fake-key"
-export OPENAI_BASE_URL="$FAKE_BASE_URL"
-export NEMOCLAW_INFERENCE_PROVIDER="openai-compatible"
-export NEMOCLAW_INFERENCE_MODEL="fake-model"
+export NEMOCLAW_PROVIDER="custom"
+export NEMOCLAW_MODEL="fake-model"
+export NEMOCLAW_ENDPOINT_URL="$FAKE_BASE_URL"
+export COMPATIBLE_API_KEY="fake-key"
+export NEMOCLAW_PREFERRED_API="openai-completions"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
export TELEGRAM_BOT_TOKEN="$TELEGRAM_BOT_TOKEN_A"
export NEMOCLAW_SANDBOX_NAME="$SANDBOX_NAME"
export NEMOCLAW_POLICY_TIER="open"
export NEMOCLAW_NON_INTERACTIVE=1
export OPENAI_API_KEY="fake-key"
export OPENAI_BASE_URL="$FAKE_BASE_URL"
export NEMOCLAW_INFERENCE_PROVIDER="openai-compatible"
export NEMOCLAW_INFERENCE_MODEL="fake-model"
export TELEGRAM_BOT_TOKEN="$TELEGRAM_BOT_TOKEN_A"
export NEMOCLAW_SANDBOX_NAME="$SANDBOX_NAME"
export NEMOCLAW_POLICY_TIER="open"
export NEMOCLAW_NON_INTERACTIVE=1
export NEMOCLAW_PROVIDER="custom"
export NEMOCLAW_MODEL="fake-model"
export NEMOCLAW_ENDPOINT_URL="$FAKE_BASE_URL"
export COMPATIBLE_API_KEY="fake-key"
export NEMOCLAW_PREFERRED_API="openai-completions"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-token-rotation.sh` around lines 156 - 163, The exported env
vars in the script (OPENAI_BASE_URL, NEMOCLAW_INFERENCE_PROVIDER,
NEMOCLAW_INFERENCE_MODEL) don't match the non-interactive env var names consumed
by the onboarding logic in src/lib/onboard.ts, so onboarding can ignore the fake
endpoint; update the test/e2e/test-token-rotation.sh exports to set the exact
non-interactive env var names that onboard reads (replace the three ignored vars
with the corresponding env var names used in src/lib/onboard.ts) so the fake
OPENAI endpoint and provider/model are actually bound during onboarding, and
remove the stale/unused exports.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a dead code path — the test now uses install.sh --non-interactive which handles inference setup internally. The old fake OpenAI env vars were removed in the rewrite. The test validates rotation detection and rebuild behavior, not inference routing.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
cat test/e2e/test-token-rotation.sh

Length of output: 6619


@ericksoa, confirmed — the full file on the branch has been rewritten. The fake OpenAI server, start_fake_openai, and all the flagged env vars (OPENAI_API_KEY, OPENAI_BASE_URL, NEMOCLAW_INFERENCE_PROVIDER, NEMOCLAW_INFERENCE_MODEL) are gone entirely. Phase 0 now delegates to bash install.sh --non-interactive, and the test only validates rotation detection and rebuild behavior via nemoclaw onboard --non-interactive output. My original comment was based on an older snapshot in the diff and doesn't apply to the current code. Sorry for the noise!


🧠 Learnings used
Learnt from: ericksoa
Repo: NVIDIA/NemoClaw PR: 1849
File: src/nemoclaw.ts:2168-2176
Timestamp: 2026-04-16T17:11:07.305Z
Learning: In `src/nemoclaw.ts` (TypeScript, NemoClaw CLI), the `rotate-token` command's `--from-env` flag validation is intentionally delegated to `configRotateToken()` in `src/lib/sandbox-config.ts`. If `--from-env` is the last CLI argument, `tokenOpts.fromEnv` becomes `undefined`; `configRotateToken` then checks `process.env[opts.fromEnv]` and errors on empty/undefined. No guard at the CLI dispatch layer is needed or expected — do not flag the missing value-presence check for `--from-env` in the rotate-token flag parser.

Learnt from: ericksoa
Repo: NVIDIA/NemoClaw PR: 1849
File: test/shields-audit.test.ts:29-31
Timestamp: 2026-04-16T17:09:26.055Z
Learning: In `test/shields-audit.test.ts` (NemoClaw project), the local `appendAuditEntry()` reimplementation is intentional: (1) the test's purpose is to validate JSONL format and field structure only, (2) redaction is covered separately in `test/secret-redaction.test.ts`, and (3) the CJS/ESM boundary in `src/lib/shields-audit.ts` prevents calling the real logger directly from Vitest. Do not flag the local reimplementation as a test-coverage gap in future reviews.

Comment thread test/e2e/test-token-rotation.sh Outdated
Comment thread test/e2e/test-token-rotation.sh Outdated
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

Brev E2E (messaging-providers): FAILED on branch fix/credential-rotation-propagationSee logs

@wscurran wscurran added Integration: Telegram Use this label to identify Telegram bot integration issues with NemoClaw. fix labels Apr 16, 2026
Wires test/e2e/test-token-rotation.sh into the nightly-e2e workflow
with fake Telegram tokens. Validates that rotating a messaging credential
and re-running onboard triggers sandbox rebuild and propagates the new
token.

Ref: #1903

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/nightly-e2e.yaml:
- Around line 587-588: The notify-on-failure aggregation is missing the
shields-config-e2e job: add "shields-config-e2e" to the needs: [...] list and
include "needs.shields-config-e2e.result == 'failure' ||" into the long if:
conditional so any failure of shields-config-e2e will trigger the
notify-on-failure job; update the existing if expression where other
needs.*.result checks appear to include this new check in the same OR chain.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: faa12449-191f-4c3a-9f00-6fb3e047720a

📥 Commits

Reviewing files that changed from the base of the PR and between d2d8135 and c93c339.

📒 Files selected for processing (1)
  • .github/workflows/nightly-e2e.yaml

Comment on lines +587 to +588
needs: [cloud-e2e, cloud-experimental-e2e, messaging-providers-e2e, token-rotation-e2e, sandbox-survival-e2e, hermes-e2e, skip-permissions-e2e, sandbox-operations-e2e, inference-routing-e2e, snapshot-commands-e2e, rebuild-openclaw-e2e, upgrade-stale-sandbox-e2e, rebuild-hermes-e2e, gpu-e2e]
if: ${{ always() && (needs.cloud-e2e.result == 'failure' || needs.cloud-experimental-e2e.result == 'failure' || needs.messaging-providers-e2e.result == 'failure' || needs.token-rotation-e2e.result == 'failure' || needs.sandbox-survival-e2e.result == 'failure' || needs.hermes-e2e.result == 'failure' || needs.skip-permissions-e2e.result == 'failure' || needs.sandbox-operations-e2e.result == 'failure' || needs.inference-routing-e2e.result == 'failure' || needs.snapshot-commands-e2e.result == 'failure' || needs.rebuild-openclaw-e2e.result == 'failure' || needs.upgrade-stale-sandbox-e2e.result == 'failure' || needs.rebuild-hermes-e2e.result == 'failure' || needs.gpu-e2e.result == 'failure') }}
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Include shields-config-e2e in failure aggregation.

Line 587 and Line 588 still omit shields-config-e2e, so a failure there won’t trigger notify-on-failure, despite the workflow comment saying notification should happen when any E2E job fails.

🔧 Suggested patch
-    needs: [cloud-e2e, cloud-experimental-e2e, messaging-providers-e2e, token-rotation-e2e, sandbox-survival-e2e, hermes-e2e, skip-permissions-e2e, sandbox-operations-e2e, inference-routing-e2e, snapshot-commands-e2e, rebuild-openclaw-e2e, upgrade-stale-sandbox-e2e, rebuild-hermes-e2e, gpu-e2e]
-    if: ${{ always() && (needs.cloud-e2e.result == 'failure' || needs.cloud-experimental-e2e.result == 'failure' || needs.messaging-providers-e2e.result == 'failure' || needs.token-rotation-e2e.result == 'failure' || needs.sandbox-survival-e2e.result == 'failure' || needs.hermes-e2e.result == 'failure' || needs.skip-permissions-e2e.result == 'failure' || needs.sandbox-operations-e2e.result == 'failure' || needs.inference-routing-e2e.result == 'failure' || needs.snapshot-commands-e2e.result == 'failure' || needs.rebuild-openclaw-e2e.result == 'failure' || needs.upgrade-stale-sandbox-e2e.result == 'failure' || needs.rebuild-hermes-e2e.result == 'failure' || needs.gpu-e2e.result == 'failure') }}
+    needs: [cloud-e2e, cloud-experimental-e2e, messaging-providers-e2e, token-rotation-e2e, sandbox-survival-e2e, hermes-e2e, skip-permissions-e2e, sandbox-operations-e2e, inference-routing-e2e, snapshot-commands-e2e, shields-config-e2e, rebuild-openclaw-e2e, upgrade-stale-sandbox-e2e, rebuild-hermes-e2e, gpu-e2e]
+    if: ${{ always() && (needs.cloud-e2e.result == 'failure' || needs.cloud-experimental-e2e.result == 'failure' || needs.messaging-providers-e2e.result == 'failure' || needs.token-rotation-e2e.result == 'failure' || needs.sandbox-survival-e2e.result == 'failure' || needs.hermes-e2e.result == 'failure' || needs.skip-permissions-e2e.result == 'failure' || needs.sandbox-operations-e2e.result == 'failure' || needs.inference-routing-e2e.result == 'failure' || needs.snapshot-commands-e2e.result == 'failure' || needs.shields-config-e2e.result == 'failure' || needs.rebuild-openclaw-e2e.result == 'failure' || needs.upgrade-stale-sandbox-e2e.result == 'failure' || needs.rebuild-hermes-e2e.result == 'failure' || needs.gpu-e2e.result == 'failure') }}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/nightly-e2e.yaml around lines 587 - 588, The
notify-on-failure aggregation is missing the shields-config-e2e job: add
"shields-config-e2e" to the needs: [...] list and include
"needs.shields-config-e2e.result == 'failure' ||" into the long if: conditional
so any failure of shields-config-e2e will trigger the notify-on-failure job;
update the existing if expression where other needs.*.result checks appear to
include this new check in the same OR chain.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — shields-config-e2e was added in a separate PR and isn't part of this change, but I'll include it in the notify-on-failure aggregation since we're already touching that line.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ericksoa, sounds good! The patch in the review comment has the exact lines ready — just drop shields-config-e2e into the needs: list and add needs.shields-config-e2e.result == 'failure' || to the if: expression at the same position (after snapshot-commands-e2e), and you'll be all set.


🧠 Learnings used
Learnt from: ericksoa
Repo: NVIDIA/NemoClaw PR: 1967
File: test/e2e/test-token-rotation.sh:0-0
Timestamp: 2026-04-16T18:34:27.427Z
Learning: In `test/e2e/test-token-rotation.sh` (NemoClaw project), the E2E test intentionally focuses only on messaging credential rotation detection and sandbox rebuild triggering. Workspace/state preservation across rotation is not tested here; it is covered by the `sandboxRebuild` unit tests and `test/e2e/test-sandbox-rebuild.sh`. Do not flag the absence of a workspace-preservation assertion in this script as a test gap.

Learnt from: ericksoa
Repo: NVIDIA/NemoClaw PR: 1849
File: src/lib/sandbox-config.ts:354-361
Timestamp: 2026-04-16T17:09:43.616Z
Learning: In `src/lib/sandbox-config.ts` (TypeScript, NemoClaw CLI), the `appendAuditEntry` call inside `configSet` intentionally uses `action: "shields_down"` rather than a dedicated `"config_set"` action. This is by design: `config set` is only permitted while shields are down, so the audit entry records the shields-down authorization context rather than the config mutation itself. Do not flag this as a wrong action type in future reviews. A dedicated `config_set` audit action type is deferred to a follow-up.

Learnt from: ericksoa
Repo: NVIDIA/NemoClaw PR: 1849
File: test/shields-audit.test.ts:29-31
Timestamp: 2026-04-16T17:09:26.055Z
Learning: In `test/shields-audit.test.ts` (NemoClaw project), the local `appendAuditEntry()` reimplementation is intentional: (1) the test's purpose is to validate JSONL format and field structure only, (2) redaction is covered separately in `test/secret-redaction.test.ts`, and (3) the CJS/ESM boundary in `src/lib/shields-audit.ts` prevents calling the real logger directly from Vitest. Do not flag the local reimplementation as a test-coverage gap in future reviews.

Learnt from: jayavenkatesh19
Repo: NVIDIA/NemoClaw PR: 839
File: .github/workflows/nightly-e2e.yaml:97-101
Timestamp: 2026-03-25T17:41:58.209Z
Learning: In this repo’s GitHub Actions workflows, it is intentional and correct to set `NEMOCLAW_RECREATE_SANDBOX=1` in a job `env:` block when the job later runs `install.sh` in `--non-interactive` mode. `install.sh --non-interactive` orchestrates `nemoclaw onboard --non-interactive`, which reads `NEMOCLAW_RECREATE_SANDBOX` from the environment; the variable is documented in `install.sh` usage/help output, so it should not be flagged as an unknown or unused env var in workflows that call `install.sh`.

registerSandbox() explicitly constructs the stored object from known
fields rather than spreading the entry. The providerCredentialHashes
field was added to the SandboxEntry interface but not to the object
literal in registerSandbox(), so hashes were silently dropped.

Also fixes the e2e test to use install.sh for proper NemoClaw
bootstrapping on bare CI runners, and fixes shellcheck warning.

Verified locally: token A onboard stores hashes, token B onboard
detects rotation and rebuilds, token B re-onboard reuses sandbox.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
test/e2e/test-token-rotation.sh (2)

202-214: ⚠️ Potential issue | 🟠 Major

Workspace preservation is not asserted after rebuild.

The PR objective includes preserving workspace state across rebuild, but this script has no marker write/read assertion around the rotation-triggered rebuild path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-token-rotation.sh` around lines 202 - 214, The test lacks an
assertion that workspace state is preserved across the rebuild in "Phase 3:
Re-onboard with same token", so add a marker file write/read check around the
onboarding/re-onboarding path: before the first onboard create a unique marker
file (e.g., write a GUID to /tmp or the sandbox workspace) and commit its
expected value; after the re-onboard step in the block that checks
ONBOARD_OUTPUT (both the "reusing it" branch and the else branch where rebuild
may have occurred), read the marker and assert it matches the original value
(use the same helper functions pass/fail/info), failing the test if the marker
is missing or altered; reference the variables/functions ONBOARD_OUTPUT, pass,
fail, info and the "Phase 3" section to locate where to add the marker write
before initial onboarding and the marker read/assert after the re-onboard.

10-12: ⚠️ Potential issue | 🟠 Major

This E2E still misses the core in-sandbox Telegram verification for rotation.

The script explicitly avoids Telegram API validation (Lines 10-12), and Phase 2 only checks onboarding logs and sandbox presence. That does not prove the running sandbox’s L7 proxy uses token A before rotation and token B after rotation (issue #1903 objective).

Also applies to: 170-201

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-token-rotation.sh` around lines 10 - 12, The E2E script
test-token-rotation.sh omits an in-sandbox verification that the sandbox L7
proxy actually uses token A before rotation and token B after rotation; add a
Phase 2 check that runs a request from inside the running sandbox (e.g.,
exec/curl into the sandbox process or run the test helper that hits the Telegram
API path /bot<TOKEN>/getMe) and assert the fake Telegram server received
requests authenticated with TOKEN_A pre-rotation and TOKEN_B post-rotation; use
the existing TOKEN_A/TOKEN_B variables and the test’s sandbox identifier to
locate the running container/process and record the token observed by the fake
Telegram endpoint, failing the test if the observed token does not match
expected.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/test-token-rotation.sh`:
- Around line 161-166: The current test checks for "providerCredentialHashes"
anywhere in $REGISTRY which can false-pass; update the assertion to verify the
credential hash is stored specifically under the current sandbox key
($SANDBOX_NAME). Locate the registry check that uses REGISTRY and replace the
broad grep with a sandbox-scoped check (e.g., parse the registry JSON and assert
that the object for SANDBOX_NAME contains a providerCredentialHashes field or
grep for a pattern that ties providerCredentialHashes to the SANDBOX_NAME key)
so the test only passes if providerCredentialHashes exists for "$SANDBOX_NAME".
- Around line 175-179: The script captures ONBOARD_OUTPUT and onboard_exit but
never fails the test when onboarding returns non-zero; update the block that
sets ONBOARD_OUTPUT and onboard_exit (and the similar Phase 3 capture) to check
if onboard_exit != 0 and, if so, print the captured ONBOARD_OUTPUT for debugging
and exit the test with the same non-zero code (or call the test-failure helper
used elsewhere), ensuring a non-zero onboarding result fails the e2e test
immediately.

---

Duplicate comments:
In `@test/e2e/test-token-rotation.sh`:
- Around line 202-214: The test lacks an assertion that workspace state is
preserved across the rebuild in "Phase 3: Re-onboard with same token", so add a
marker file write/read check around the onboarding/re-onboarding path: before
the first onboard create a unique marker file (e.g., write a GUID to /tmp or the
sandbox workspace) and commit its expected value; after the re-onboard step in
the block that checks ONBOARD_OUTPUT (both the "reusing it" branch and the else
branch where rebuild may have occurred), read the marker and assert it matches
the original value (use the same helper functions pass/fail/info), failing the
test if the marker is missing or altered; reference the variables/functions
ONBOARD_OUTPUT, pass, fail, info and the "Phase 3" section to locate where to
add the marker write before initial onboarding and the marker read/assert after
the re-onboard.
- Around line 10-12: The E2E script test-token-rotation.sh omits an in-sandbox
verification that the sandbox L7 proxy actually uses token A before rotation and
token B after rotation; add a Phase 2 check that runs a request from inside the
running sandbox (e.g., exec/curl into the sandbox process or run the test helper
that hits the Telegram API path /bot<TOKEN>/getMe) and assert the fake Telegram
server received requests authenticated with TOKEN_A pre-rotation and TOKEN_B
post-rotation; use the existing TOKEN_A/TOKEN_B variables and the test’s sandbox
identifier to locate the running container/process and record the token observed
by the fake Telegram endpoint, failing the test if the observed token does not
match expected.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 383aa178-b277-429a-a854-b284f03a972e

📥 Commits

Reviewing files that changed from the base of the PR and between c93c339 and 5bf0af0.

📒 Files selected for processing (2)
  • src/lib/registry.ts
  • test/e2e/test-token-rotation.sh
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/lib/registry.ts

Comment thread test/e2e/test-token-rotation.sh Outdated
Comment thread test/e2e/test-token-rotation.sh
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (3)
test/e2e/test-token-rotation.sh (3)

175-179: ⚠️ Potential issue | 🟠 Major

Fail immediately when nemoclaw onboard returns non-zero.

Line 176 captures onboard_exit but never enforces it; Phase 3 has the same gap at Line 206. A failed onboard can still produce partial output and trigger misleading pass conditions.

Suggested fix
 ONBOARD_OUTPUT=$(nemoclaw onboard --non-interactive 2>&1)
 onboard_exit=$?
 
 info "onboard exit code: $onboard_exit"
+if [ "$onboard_exit" -ne 0 ]; then
+  fail "Phase 2 onboard failed (exit $onboard_exit)"
+  echo "$ONBOARD_OUTPUT" | tail -40
+  exit "$onboard_exit"
+fi
@@
 ONBOARD_OUTPUT=$(nemoclaw onboard --non-interactive 2>&1)
+onboard_exit=$?
+if [ "$onboard_exit" -ne 0 ]; then
+  fail "Phase 3 onboard failed (exit $onboard_exit)"
+  echo "$ONBOARD_OUTPUT" | tail -40
+  exit "$onboard_exit"
+fi

Also applies to: 206-207

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-token-rotation.sh` around lines 175 - 179, The script captures
the output and exit code of `nemoclaw onboard` into `ONBOARD_OUTPUT` and
`onboard_exit` but does not act on non-zero exits; update the blocks that run
`nemoclaw onboard` (the lines assigning `ONBOARD_OUTPUT`/`onboard_exit` and the
Phase 3 equivalent) to immediately fail the test when `onboard_exit` is non-zero
by printing the captured `ONBOARD_OUTPUT` to stderr and calling `exit
$onboard_exit` (or `fail`/appropriate test-failure helper) so a failed onboard
cannot produce a false-positive pass.

161-166: ⚠️ Potential issue | 🟡 Minor

Scope the registry-hash assertion to "$SANDBOX_NAME" only.

Line 162 currently matches providerCredentialHashes anywhere in $REGISTRY, which can false-pass due to unrelated entries. Assert the field on the current sandbox record specifically.

Suggested fix
-# Verify credential hashes are stored in registry
-if [ -f "$REGISTRY" ] && grep -q "providerCredentialHashes" "$REGISTRY"; then
+# Verify credential hashes are stored for this sandbox entry
+if [ -f "$REGISTRY" ] && node -e '
+const fs = require("fs");
+const [registryPath, sandboxName] = process.argv.slice(1);
+const data = JSON.parse(fs.readFileSync(registryPath, "utf8"));
+const entry =
+  (data && data[sandboxName]) ||
+  (data && data.sandboxes && data.sandboxes[sandboxName]) ||
+  (Array.isArray(data) ? data.find((x) => x && x.name === sandboxName) : null);
+process.exit(entry && entry.providerCredentialHashes ? 0 : 1);
+' "$REGISTRY" "$SANDBOX_NAME"; then
   pass "Credential hashes stored in registry"
 else
   fail "Credential hashes not found in registry"
 fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-token-rotation.sh` around lines 161 - 166, The current check
searches for "providerCredentialHashes" anywhere in $REGISTRY which can
false-pass; change it to assert the field exists for the current sandbox only by
scoping to "$SANDBOX_NAME". Update the test so it queries the registry entry for
SANDBOX_NAME (e.g., using jq to test
.sandboxes["$SANDBOX_NAME"].providerCredentialHashes or by grepping the
SANDBOX_NAME block and then checking for providerCredentialHashes) and keep the
pass/fail logic around that scoped check (reference: the variables REGISTRY and
SANDBOX_NAME and the key providerCredentialHashes).

10-12: ⚠️ Potential issue | 🟠 Major

Add in-sandbox Telegram getMe assertions to validate the real regression path.

Lines 10-11 explicitly skip Telegram-response validation, but the issue objective is about the running sandbox L7 proxy using the rotated token. Log greps alone do not prove that behavior. Add active in-sandbox checks: token A should fail (4xx) before/after rotation as applicable, and token B should return 200 after rotation.

Also applies to: 196-214

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-token-rotation.sh` around lines 10 - 12, Update the test script
to perform active in-sandbox Telegram getMe calls instead of only grepping logs:
call the sandbox L7 proxy (use the script's existing sandbox invocation helper
or curl against the sandbox endpoint) with TOKEN_A and assert it returns a 4xx
before rotation and still fails after rotation when appropriate, then call with
TOKEN_B after rotation and assert it returns HTTP 200; reference and modify the
token variables (e.g., TOKEN_A, TOKEN_B) and the sandbox invocation helper
(e.g., the existing curl/sandbox runner used elsewhere in
test-token-rotation.sh) to add these explicit assertions around the rotation
step so the test verifies the actual proxy behavior rather than only log output.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@test/e2e/test-token-rotation.sh`:
- Around line 175-179: The script captures the output and exit code of `nemoclaw
onboard` into `ONBOARD_OUTPUT` and `onboard_exit` but does not act on non-zero
exits; update the blocks that run `nemoclaw onboard` (the lines assigning
`ONBOARD_OUTPUT`/`onboard_exit` and the Phase 3 equivalent) to immediately fail
the test when `onboard_exit` is non-zero by printing the captured
`ONBOARD_OUTPUT` to stderr and calling `exit $onboard_exit` (or
`fail`/appropriate test-failure helper) so a failed onboard cannot produce a
false-positive pass.
- Around line 161-166: The current check searches for "providerCredentialHashes"
anywhere in $REGISTRY which can false-pass; change it to assert the field exists
for the current sandbox only by scoping to "$SANDBOX_NAME". Update the test so
it queries the registry entry for SANDBOX_NAME (e.g., using jq to test
.sandboxes["$SANDBOX_NAME"].providerCredentialHashes or by grepping the
SANDBOX_NAME block and then checking for providerCredentialHashes) and keep the
pass/fail logic around that scoped check (reference: the variables REGISTRY and
SANDBOX_NAME and the key providerCredentialHashes).
- Around line 10-12: Update the test script to perform active in-sandbox
Telegram getMe calls instead of only grepping logs: call the sandbox L7 proxy
(use the script's existing sandbox invocation helper or curl against the sandbox
endpoint) with TOKEN_A and assert it returns a 4xx before rotation and still
fails after rotation when appropriate, then call with TOKEN_B after rotation and
assert it returns HTTP 200; reference and modify the token variables (e.g.,
TOKEN_A, TOKEN_B) and the sandbox invocation helper (e.g., the existing
curl/sandbox runner used elsewhere in test-token-rotation.sh) to add these
explicit assertions around the rotation step so the test verifies the actual
proxy behavior rather than only log output.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 57afdba5-2a54-4576-b495-92ccd32c4433

📥 Commits

Reviewing files that changed from the base of the PR and between 5bf0af0 and a09848f.

📒 Files selected for processing (2)
  • .github/workflows/nightly-e2e.yaml
  • test/e2e/test-token-rotation.sh
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/nightly-e2e.yaml

ericksoa and others added 3 commits April 16, 2026 11:32
- Abort credential rotation rebuild when state backup fails (returns
  success: false or throws), falling back to reuse with a warning
- Tighten registry hash assertion in e2e test to verify the hash is
  stored under the specific sandbox name, not just anywhere in the file
- Enforce onboard exit codes in phases 2 and 3 of the e2e test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

fix Integration: Telegram Use this label to identify Telegram bot integration issues with NemoClaw.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] Validate bot token rotation

2 participants