fix(e2e): stabilize nightly — PONG retry + landlock PATH by cjagwani · Pull Request #1972 · NVIDIA/NemoClaw

cjagwani · 2026-04-16T15:28:19Z

Summary

The nightly E2E has been failing for 10+ consecutive runs. Two independent fixes:

PONG inference retry — sandbox inference assertions now retry 3x with 5s backoff. Live models through the gateway proxy are not deterministic; a single attempt was too brittle for CI.
Landlock PATH fix — 04-landlock-readonly.sh exited 127 because openshell was not on PATH in child bash processes. Exports PATH before Phase 5 check scripts and adds || true to sandbox_exec calls for expected non-zero exits.

Changes

test/e2e/test-full-e2e.sh — 3-attempt retry for sandbox inference PONG
test/e2e/test-sandbox-survival.sh — 3-attempt retry for baseline + post-restart PONG
test/e2e/test-e2e-cloud-experimental.sh — export PATH before Phase 5 checks
test/e2e/e2e-cloud-experimental/checks/04-landlock-readonly.sh — || true on sandbox_exec calls

Testing

All pre-commit hooks pass (shellcheck, shfmt, tests)
npm test — 1303 tests pass, no regressions
Retry logic simulated locally (fails-then-succeeds, all-fail cases verified)
Nightly run on main — the real test (can only verify in CI environment)

Fixes #1969

Signed-off-by: Charan Jagwani cjagwani@nvidia.com

Summary by CodeRabbit

Tests
- Made sandbox checks tolerant of command failures to avoid premature termination while preserving existing pass/fail detection.
- Updated test runner PATH so check scripts can locate newly installed tooling during test runs.
- Added up-to-3 retry attempts with short delays, regenerated per-attempt configs, and improved diagnostics (truncated response previews and clearer info logs) for sandbox inference and survival checks.

Two independent fixes for the nightly E2E which has been failing for 10+ consecutive runs: 1. Sandbox inference PONG assertions now retry 3 times with 5s backoff. Live models through the gateway proxy are not deterministic — a single attempt is too brittle for CI. Affects test-full-e2e.sh and test-sandbox-survival.sh (baseline + post-restart checks). 2. Landlock checks (04-landlock-readonly.sh) exited 127 because openshell was not on PATH in child bash processes spawned by Phase 5. Fix: export PATH including /usr/local/bin before running check scripts, and add || true to sandbox_exec calls so set -e does not abort on expected non-zero exits from Landlock-blocked operations. Fixes #1969

coderabbitai · 2026-04-16T15:28:35Z

📝 Walkthrough

Walkthrough

Tolerance for sandbox command failures was added, PATH is exported for child check scripts, and sandbox inference assertions now retry up to 3 times with 5s backoff to reduce flaky E2E failures.

Changes

Cohort / File(s)	Summary
Landlock check tolerance `test/e2e/e2e-cloud-experimental/checks/04-landlock-readonly.sh`	Appends `
PATH for child checks `test/e2e/test-e2e-cloud-experimental.sh`	Prepends `/usr/local/bin` and `${HOME}/.local/bin` to `PATH` before Phase 5 so child `bash` processes can find tools (e.g., `openshell`) installed earlier.
Sandbox inference retries `test/e2e/test-full-e2e.sh`, `test/e2e/test-sandbox-survival.sh`	Replace single-request assertions with up-to-3 attempt retry loops (5s backoff) that re-fetch and parse sandbox responses; tests pass if a `PONG` is observed on any attempt, otherwise they fail with a truncated diagnostic of the last response.

Sequence Diagram(s)

sequenceDiagram
  participant Test as Test Script
  participant Shell as openshell / sandbox
  participant Gateway as inference.local
  Test->>Shell: request ssh-config / exec sandbox command
  Shell-->>Test: ssh-config / command output (may fail)
  loop up to 3 attempts
    Test->>Gateway: request inference via sandbox proxy (curl)
    Gateway-->>Test: response (may be non-deterministic)
    alt response contains PONG (case-insensitive)
      Test->>Test: mark success, stop retries
    else
      Test->>Test: log short preview, sleep 5s, retry
      Test->>Shell: (recreate ssh config / re-init) before next fetch
    end
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Three hops, three tries under starlit beams,
PATH extended, sandbox calms its schemes,
PONG may wander, then bounce back bright,
Logs hum softly through the quiet night,
Tests nap content in the morning light.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes both main changes: PONG retry logic and landlock PATH fixes, matching the primary modifications across four shell scripts.
Linked Issues check	✅ Passed	All coding requirements from issue `#1969` are met: 3-attempt PONG retry with 5s backoff in test-full-e2e.sh and test-sandbox-survival.sh; PATH export in test-e2e-cloud-experimental.sh and \|\| true additions in landlock check script.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to the two identified test-reliability fixes; no unrelated modifications to production code or other systems detected.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/nightly-e2e-stability

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

test/e2e/test-e2e-cloud-experimental.sh (1)

518-521: Honor XDG_BIN_HOME in the Phase 5 PATH fix.

The installer can place openshell in ${XDG_BIN_HOME:-$HOME/.local/bin}. Hard-coding only $HOME/.local/bin here means child checks/*.sh can still hit exit 127 when XDG_BIN_HOME is customized.

♻️ Suggested tweak

-  export PATH="/usr/local/bin:${HOME}/.local/bin:${PATH}"
+  export PATH="/usr/local/bin:${XDG_BIN_HOME:-${HOME}/.local/bin}:${PATH}"

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-e2e-cloud-experimental.sh` around lines 518 - 521, The PATH
export currently hard-codes ${HOME}/.local/bin; update the Phase 5 PATH fix so
it honors XDG_BIN_HOME by using the XDG_BIN_HOME fallback pattern (e.g.,
${XDG_BIN_HOME:-${HOME}/.local/bin}) when constructing PATH; modify the export
PATH line referenced in test/e2e/test-e2e-cloud-experimental.sh so it adds
/usr/local/bin and the XDG-aware local bin directory before ${PATH} to ensure
child scripts (checks/*.sh) can find openshell when XDG_BIN_HOME is set.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/test-full-e2e.sh`:
- Around line 312-347: The loop can reference unset variables under set -u;
initialize TIMEOUT_CMD and sandbox_content before the retry loop to prevent
crashes. Specifically, before the for-loop that sets pong_ok, set TIMEOUT_CMD to
its intended default (the same value used elsewhere in the script) and set
sandbox_content="" so later references (lines using $TIMEOUT_CMD,
sandbox_content, and grep on sandbox_content) are safe; ensure the retry block
still overwrites sandbox_response/sandbox_content when a fetch succeeds and keep
the existing openshell sandbox ssh-config usage intact.

---

Nitpick comments:
In `@test/e2e/test-e2e-cloud-experimental.sh`:
- Around line 518-521: The PATH export currently hard-codes ${HOME}/.local/bin;
update the Phase 5 PATH fix so it honors XDG_BIN_HOME by using the XDG_BIN_HOME
fallback pattern (e.g., ${XDG_BIN_HOME:-${HOME}/.local/bin}) when constructing
PATH; modify the export PATH line referenced in
test/e2e/test-e2e-cloud-experimental.sh so it adds /usr/local/bin and the
XDG-aware local bin directory before ${PATH} to ensure child scripts
(checks/*.sh) can find openshell when XDG_BIN_HOME is set.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 2ebc836c-5999-4175-a7fe-bc430f4272d6

📥 Commits

Reviewing files that changed from the base of the PR and between 121148c and 825ed3b.

📒 Files selected for processing (4)

test/e2e/e2e-cloud-experimental/checks/04-landlock-readonly.sh
test/e2e/test-e2e-cloud-experimental.sh
test/e2e/test-full-e2e.sh
test/e2e/test-sandbox-survival.sh

Ref: #1969

…rash CodeRabbit caught that TIMEOUT_CMD and sandbox_content/baseline_content/ post_content could be unbound under set -u (nounset) if the initial SSH config call fails or all responses are empty. Ref: #1969

coderabbitai

🧹 Nitpick comments (1)

test/e2e/test-sandbox-survival.sh (1)

327-345: Consider extracting the duplicated PONG retry block into a helper.

The two retry loops are nearly identical; pulling them into one function will reduce drift risk (for example, logging/detail differences) and simplify future tuning of attempts/backoff.

♻️ Optional refactor sketch

+retry_for_pong() {
+  local phase_label="$1"
+  local response="$2"
+  local content=""
+  local ok=false
+  for pong_attempt in 1 2 3; do
+    content=""
+    if [ -n "$response" ]; then
+      content=$(echo "$response" | parse_chat_content 2>/dev/null) || true
+    fi
+    if grep -qi "PONG" <<<"$content"; then
+      ok=true
+      break
+    fi
+    info "${phase_label} attempt ${pong_attempt}/3: got '${content:0:80}', retrying in 5s..."
+    [ "$pong_attempt" -lt 3 ] || break
+    sleep 5
+    response=$($TIMEOUT_CMD ssh "${SSH_OPTS[@]}" "$SSH_TARGET" "curl -s --max-time 60 https://inference.local/v1/chat/completions ..." 2>&1) || true
+  done
+  REPLY_OK="$ok"
+  REPLY_CONTENT="$content"
+  REPLY_RESPONSE="$response"
+}

Also applies to: 657-675

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-sandbox-survival.sh` around lines 327 - 345, Extract the
duplicated retry loop that checks for a "PONG" response into a single helper
shell function (e.g., wait_for_pong or retry_pong) that accepts/uses the same
variables referenced in the diff (SSH_TARGET, SSH_OPTS, TIMEOUT_CMD, MODEL,
parse_chat_content) and returns/sets the response or a success flag; move the
curl + parse_chat_content + grep + logging + backoff logic into that function
and replace both retry loops with calls to it to avoid duplication and keep
behavior (3 attempts, 5s sleep, logging of first 80 chars) identical.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@test/e2e/test-sandbox-survival.sh`:
- Around line 327-345: Extract the duplicated retry loop that checks for a
"PONG" response into a single helper shell function (e.g., wait_for_pong or
retry_pong) that accepts/uses the same variables referenced in the diff
(SSH_TARGET, SSH_OPTS, TIMEOUT_CMD, MODEL, parse_chat_content) and returns/sets
the response or a success flag; move the curl + parse_chat_content + grep +
logging + backoff logic into that function and replace both retry loops with
calls to it to avoid duplication and keep behavior (3 attempts, 5s sleep,
logging of first 80 chars) identical.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: afb8cdd1-1b9d-441c-9be6-ac7de22b13b0

📥 Commits

Reviewing files that changed from the base of the PR and between 0bb07f7 and 0908fab.

📒 Files selected for processing (2)

test/e2e/test-full-e2e.sh
test/e2e/test-sandbox-survival.sh

🚧 Files skipped from review as they are similar to previous changes (1)

test/e2e/test-full-e2e.sh

coderabbitai bot reviewed Apr 16, 2026

View reviewed changes

Comment thread test/e2e/test-full-e2e.sh

cjagwani added 2 commits April 16, 2026 09:18

fix(e2e): add verbose curl on PONG retry for proxy diagnosis

0bb07f7

Ref: #1969

fix(e2e): initialize variables before retry loops to prevent set -u c…

0908fab

…rash CodeRabbit caught that TIMEOUT_CMD and sandbox_content/baseline_content/ post_content could be unbound under set -u (nounset) if the initial SSH config call fails or all responses are empty. Ref: #1969

coderabbitai bot reviewed Apr 16, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(e2e): stabilize nightly — PONG retry + landlock PATH#1972

fix(e2e): stabilize nightly — PONG retry + landlock PATH#1972
cjagwani wants to merge 3 commits intomainfrom
fix/nightly-e2e-stability

cjagwani commented Apr 16, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Apr 16, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

cjagwani commented Apr 16, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Testing

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

cjagwani commented Apr 16, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Apr 16, 2026 •

edited

Loading