Skip to content

fix(e2e): stabilize nightly — PONG retry + landlock PATH#1972

Open
cjagwani wants to merge 3 commits intomainfrom
fix/nightly-e2e-stability
Open

fix(e2e): stabilize nightly — PONG retry + landlock PATH#1972
cjagwani wants to merge 3 commits intomainfrom
fix/nightly-e2e-stability

Conversation

@cjagwani
Copy link
Copy Markdown
Contributor

@cjagwani cjagwani commented Apr 16, 2026

Summary

The nightly E2E has been failing for 10+ consecutive runs. Two independent fixes:

  1. PONG inference retry — sandbox inference assertions now retry 3x with 5s backoff. Live models through the gateway proxy are not deterministic; a single attempt was too brittle for CI.

  2. Landlock PATH fix04-landlock-readonly.sh exited 127 because openshell was not on PATH in child bash processes. Exports PATH before Phase 5 check scripts and adds || true to sandbox_exec calls for expected non-zero exits.

Changes

  • test/e2e/test-full-e2e.sh — 3-attempt retry for sandbox inference PONG
  • test/e2e/test-sandbox-survival.sh — 3-attempt retry for baseline + post-restart PONG
  • test/e2e/test-e2e-cloud-experimental.sh — export PATH before Phase 5 checks
  • test/e2e/e2e-cloud-experimental/checks/04-landlock-readonly.sh|| true on sandbox_exec calls

Testing

  • All pre-commit hooks pass (shellcheck, shfmt, tests)
  • npm test — 1303 tests pass, no regressions
  • Retry logic simulated locally (fails-then-succeeds, all-fail cases verified)
  • Nightly run on main — the real test (can only verify in CI environment)

Fixes #1969

Signed-off-by: Charan Jagwani cjagwani@nvidia.com

Summary by CodeRabbit

  • Tests
    • Made sandbox checks tolerant of command failures to avoid premature termination while preserving existing pass/fail detection.
    • Updated test runner PATH so check scripts can locate newly installed tooling during test runs.
    • Added up-to-3 retry attempts with short delays, regenerated per-attempt configs, and improved diagnostics (truncated response previews and clearer info logs) for sandbox inference and survival checks.

Two independent fixes for the nightly E2E which has been failing for 10+
consecutive runs:

1. Sandbox inference PONG assertions now retry 3 times with 5s backoff.
   Live models through the gateway proxy are not deterministic — a single
   attempt is too brittle for CI. Affects test-full-e2e.sh and
   test-sandbox-survival.sh (baseline + post-restart checks).

2. Landlock checks (04-landlock-readonly.sh) exited 127 because openshell
   was not on PATH in child bash processes spawned by Phase 5. Fix: export
   PATH including /usr/local/bin before running check scripts, and add
   || true to sandbox_exec calls so set -e does not abort on expected
   non-zero exits from Landlock-blocked operations.

Fixes #1969
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 16, 2026

📝 Walkthrough

Walkthrough

Tolerance for sandbox command failures was added, PATH is exported for child check scripts, and sandbox inference assertions now retry up to 3 times with 5s backoff to reduce flaky E2E failures.

Changes

Cohort / File(s) Summary
Landlock check tolerance
test/e2e/e2e-cloud-experimental/checks/04-landlock-readonly.sh
Appends `
PATH for child checks
test/e2e/test-e2e-cloud-experimental.sh
Prepends /usr/local/bin and ${HOME}/.local/bin to PATH before Phase 5 so child bash processes can find tools (e.g., openshell) installed earlier.
Sandbox inference retries
test/e2e/test-full-e2e.sh, test/e2e/test-sandbox-survival.sh
Replace single-request assertions with up-to-3 attempt retry loops (5s backoff) that re-fetch and parse sandbox responses; tests pass if a PONG is observed on any attempt, otherwise they fail with a truncated diagnostic of the last response.

Sequence Diagram(s)

sequenceDiagram
  participant Test as Test Script
  participant Shell as openshell / sandbox
  participant Gateway as inference.local
  Test->>Shell: request ssh-config / exec sandbox command
  Shell-->>Test: ssh-config / command output (may fail)
  loop up to 3 attempts
    Test->>Gateway: request inference via sandbox proxy (curl)
    Gateway-->>Test: response (may be non-deterministic)
    alt response contains PONG (case-insensitive)
      Test->>Test: mark success, stop retries
    else
      Test->>Test: log short preview, sleep 5s, retry
      Test->>Shell: (recreate ssh config / re-init) before next fetch
    end
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Three hops, three tries under starlit beams,
PATH extended, sandbox calms its schemes,
PONG may wander, then bounce back bright,
Logs hum softly through the quiet night,
Tests nap content in the morning light.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes both main changes: PONG retry logic and landlock PATH fixes, matching the primary modifications across four shell scripts.
Linked Issues check ✅ Passed All coding requirements from issue #1969 are met: 3-attempt PONG retry with 5s backoff in test-full-e2e.sh and test-sandbox-survival.sh; PATH export in test-e2e-cloud-experimental.sh and || true additions in landlock check script.
Out of Scope Changes check ✅ Passed All changes are directly scoped to the two identified test-reliability fixes; no unrelated modifications to production code or other systems detected.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/nightly-e2e-stability

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
test/e2e/test-e2e-cloud-experimental.sh (1)

518-521: Honor XDG_BIN_HOME in the Phase 5 PATH fix.

The installer can place openshell in ${XDG_BIN_HOME:-$HOME/.local/bin}. Hard-coding only $HOME/.local/bin here means child checks/*.sh can still hit exit 127 when XDG_BIN_HOME is customized.

♻️ Suggested tweak
-  export PATH="/usr/local/bin:${HOME}/.local/bin:${PATH}"
+  export PATH="/usr/local/bin:${XDG_BIN_HOME:-${HOME}/.local/bin}:${PATH}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-e2e-cloud-experimental.sh` around lines 518 - 521, The PATH
export currently hard-codes ${HOME}/.local/bin; update the Phase 5 PATH fix so
it honors XDG_BIN_HOME by using the XDG_BIN_HOME fallback pattern (e.g.,
${XDG_BIN_HOME:-${HOME}/.local/bin}) when constructing PATH; modify the export
PATH line referenced in test/e2e/test-e2e-cloud-experimental.sh so it adds
/usr/local/bin and the XDG-aware local bin directory before ${PATH} to ensure
child scripts (checks/*.sh) can find openshell when XDG_BIN_HOME is set.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/test-full-e2e.sh`:
- Around line 312-347: The loop can reference unset variables under set -u;
initialize TIMEOUT_CMD and sandbox_content before the retry loop to prevent
crashes. Specifically, before the for-loop that sets pong_ok, set TIMEOUT_CMD to
its intended default (the same value used elsewhere in the script) and set
sandbox_content="" so later references (lines using $TIMEOUT_CMD,
sandbox_content, and grep on sandbox_content) are safe; ensure the retry block
still overwrites sandbox_response/sandbox_content when a fetch succeeds and keep
the existing openshell sandbox ssh-config usage intact.

---

Nitpick comments:
In `@test/e2e/test-e2e-cloud-experimental.sh`:
- Around line 518-521: The PATH export currently hard-codes ${HOME}/.local/bin;
update the Phase 5 PATH fix so it honors XDG_BIN_HOME by using the XDG_BIN_HOME
fallback pattern (e.g., ${XDG_BIN_HOME:-${HOME}/.local/bin}) when constructing
PATH; modify the export PATH line referenced in
test/e2e/test-e2e-cloud-experimental.sh so it adds /usr/local/bin and the
XDG-aware local bin directory before ${PATH} to ensure child scripts
(checks/*.sh) can find openshell when XDG_BIN_HOME is set.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 2ebc836c-5999-4175-a7fe-bc430f4272d6

📥 Commits

Reviewing files that changed from the base of the PR and between 121148c and 825ed3b.

📒 Files selected for processing (4)
  • test/e2e/e2e-cloud-experimental/checks/04-landlock-readonly.sh
  • test/e2e/test-e2e-cloud-experimental.sh
  • test/e2e/test-full-e2e.sh
  • test/e2e/test-sandbox-survival.sh

Comment thread test/e2e/test-full-e2e.sh
…rash

CodeRabbit caught that TIMEOUT_CMD and sandbox_content/baseline_content/
post_content could be unbound under set -u (nounset) if the initial SSH
config call fails or all responses are empty.

Ref: #1969
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
test/e2e/test-sandbox-survival.sh (1)

327-345: Consider extracting the duplicated PONG retry block into a helper.

The two retry loops are nearly identical; pulling them into one function will reduce drift risk (for example, logging/detail differences) and simplify future tuning of attempts/backoff.

♻️ Optional refactor sketch
+retry_for_pong() {
+  local phase_label="$1"
+  local response="$2"
+  local content=""
+  local ok=false
+  for pong_attempt in 1 2 3; do
+    content=""
+    if [ -n "$response" ]; then
+      content=$(echo "$response" | parse_chat_content 2>/dev/null) || true
+    fi
+    if grep -qi "PONG" <<<"$content"; then
+      ok=true
+      break
+    fi
+    info "${phase_label} attempt ${pong_attempt}/3: got '${content:0:80}', retrying in 5s..."
+    [ "$pong_attempt" -lt 3 ] || break
+    sleep 5
+    response=$($TIMEOUT_CMD ssh "${SSH_OPTS[@]}" "$SSH_TARGET" "curl -s --max-time 60 https://inference.local/v1/chat/completions ..." 2>&1) || true
+  done
+  REPLY_OK="$ok"
+  REPLY_CONTENT="$content"
+  REPLY_RESPONSE="$response"
+}

Also applies to: 657-675

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-sandbox-survival.sh` around lines 327 - 345, Extract the
duplicated retry loop that checks for a "PONG" response into a single helper
shell function (e.g., wait_for_pong or retry_pong) that accepts/uses the same
variables referenced in the diff (SSH_TARGET, SSH_OPTS, TIMEOUT_CMD, MODEL,
parse_chat_content) and returns/sets the response or a success flag; move the
curl + parse_chat_content + grep + logging + backoff logic into that function
and replace both retry loops with calls to it to avoid duplication and keep
behavior (3 attempts, 5s sleep, logging of first 80 chars) identical.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@test/e2e/test-sandbox-survival.sh`:
- Around line 327-345: Extract the duplicated retry loop that checks for a
"PONG" response into a single helper shell function (e.g., wait_for_pong or
retry_pong) that accepts/uses the same variables referenced in the diff
(SSH_TARGET, SSH_OPTS, TIMEOUT_CMD, MODEL, parse_chat_content) and returns/sets
the response or a success flag; move the curl + parse_chat_content + grep +
logging + backoff logic into that function and replace both retry loops with
calls to it to avoid duplication and keep behavior (3 attempts, 5s sleep,
logging of first 80 chars) identical.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: afb8cdd1-1b9d-441c-9be6-ac7de22b13b0

📥 Commits

Reviewing files that changed from the base of the PR and between 0bb07f7 and 0908fab.

📒 Files selected for processing (2)
  • test/e2e/test-full-e2e.sh
  • test/e2e/test-sandbox-survival.sh
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/e2e/test-full-e2e.sh

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix(ci): stabilize nightly E2E — PONG retry + landlock PATH

1 participant