fix(e2e): stabilize nightly — PONG retry + landlock PATH#1972
fix(e2e): stabilize nightly — PONG retry + landlock PATH#1972
Conversation
Two independent fixes for the nightly E2E which has been failing for 10+ consecutive runs: 1. Sandbox inference PONG assertions now retry 3 times with 5s backoff. Live models through the gateway proxy are not deterministic — a single attempt is too brittle for CI. Affects test-full-e2e.sh and test-sandbox-survival.sh (baseline + post-restart checks). 2. Landlock checks (04-landlock-readonly.sh) exited 127 because openshell was not on PATH in child bash processes spawned by Phase 5. Fix: export PATH including /usr/local/bin before running check scripts, and add || true to sandbox_exec calls so set -e does not abort on expected non-zero exits from Landlock-blocked operations. Fixes #1969
📝 WalkthroughWalkthroughTolerance for sandbox command failures was added, PATH is exported for child check scripts, and sandbox inference assertions now retry up to 3 times with 5s backoff to reduce flaky E2E failures. Changes
Sequence Diagram(s)sequenceDiagram
participant Test as Test Script
participant Shell as openshell / sandbox
participant Gateway as inference.local
Test->>Shell: request ssh-config / exec sandbox command
Shell-->>Test: ssh-config / command output (may fail)
loop up to 3 attempts
Test->>Gateway: request inference via sandbox proxy (curl)
Gateway-->>Test: response (may be non-deterministic)
alt response contains PONG (case-insensitive)
Test->>Test: mark success, stop retries
else
Test->>Test: log short preview, sleep 5s, retry
Test->>Shell: (recreate ssh config / re-init) before next fetch
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
test/e2e/test-e2e-cloud-experimental.sh (1)
518-521: HonorXDG_BIN_HOMEin the Phase 5 PATH fix.The installer can place
openshellin${XDG_BIN_HOME:-$HOME/.local/bin}. Hard-coding only$HOME/.local/binhere means childchecks/*.shcan still hit exit 127 whenXDG_BIN_HOMEis customized.♻️ Suggested tweak
- export PATH="/usr/local/bin:${HOME}/.local/bin:${PATH}" + export PATH="/usr/local/bin:${XDG_BIN_HOME:-${HOME}/.local/bin}:${PATH}"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/test-e2e-cloud-experimental.sh` around lines 518 - 521, The PATH export currently hard-codes ${HOME}/.local/bin; update the Phase 5 PATH fix so it honors XDG_BIN_HOME by using the XDG_BIN_HOME fallback pattern (e.g., ${XDG_BIN_HOME:-${HOME}/.local/bin}) when constructing PATH; modify the export PATH line referenced in test/e2e/test-e2e-cloud-experimental.sh so it adds /usr/local/bin and the XDG-aware local bin directory before ${PATH} to ensure child scripts (checks/*.sh) can find openshell when XDG_BIN_HOME is set.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/test-full-e2e.sh`:
- Around line 312-347: The loop can reference unset variables under set -u;
initialize TIMEOUT_CMD and sandbox_content before the retry loop to prevent
crashes. Specifically, before the for-loop that sets pong_ok, set TIMEOUT_CMD to
its intended default (the same value used elsewhere in the script) and set
sandbox_content="" so later references (lines using $TIMEOUT_CMD,
sandbox_content, and grep on sandbox_content) are safe; ensure the retry block
still overwrites sandbox_response/sandbox_content when a fetch succeeds and keep
the existing openshell sandbox ssh-config usage intact.
---
Nitpick comments:
In `@test/e2e/test-e2e-cloud-experimental.sh`:
- Around line 518-521: The PATH export currently hard-codes ${HOME}/.local/bin;
update the Phase 5 PATH fix so it honors XDG_BIN_HOME by using the XDG_BIN_HOME
fallback pattern (e.g., ${XDG_BIN_HOME:-${HOME}/.local/bin}) when constructing
PATH; modify the export PATH line referenced in
test/e2e/test-e2e-cloud-experimental.sh so it adds /usr/local/bin and the
XDG-aware local bin directory before ${PATH} to ensure child scripts
(checks/*.sh) can find openshell when XDG_BIN_HOME is set.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 2ebc836c-5999-4175-a7fe-bc430f4272d6
📒 Files selected for processing (4)
test/e2e/e2e-cloud-experimental/checks/04-landlock-readonly.shtest/e2e/test-e2e-cloud-experimental.shtest/e2e/test-full-e2e.shtest/e2e/test-sandbox-survival.sh
…rash CodeRabbit caught that TIMEOUT_CMD and sandbox_content/baseline_content/ post_content could be unbound under set -u (nounset) if the initial SSH config call fails or all responses are empty. Ref: #1969
There was a problem hiding this comment.
🧹 Nitpick comments (1)
test/e2e/test-sandbox-survival.sh (1)
327-345: Consider extracting the duplicated PONG retry block into a helper.The two retry loops are nearly identical; pulling them into one function will reduce drift risk (for example, logging/detail differences) and simplify future tuning of attempts/backoff.
♻️ Optional refactor sketch
+retry_for_pong() { + local phase_label="$1" + local response="$2" + local content="" + local ok=false + for pong_attempt in 1 2 3; do + content="" + if [ -n "$response" ]; then + content=$(echo "$response" | parse_chat_content 2>/dev/null) || true + fi + if grep -qi "PONG" <<<"$content"; then + ok=true + break + fi + info "${phase_label} attempt ${pong_attempt}/3: got '${content:0:80}', retrying in 5s..." + [ "$pong_attempt" -lt 3 ] || break + sleep 5 + response=$($TIMEOUT_CMD ssh "${SSH_OPTS[@]}" "$SSH_TARGET" "curl -s --max-time 60 https://inference.local/v1/chat/completions ..." 2>&1) || true + done + REPLY_OK="$ok" + REPLY_CONTENT="$content" + REPLY_RESPONSE="$response" +}Also applies to: 657-675
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/test-sandbox-survival.sh` around lines 327 - 345, Extract the duplicated retry loop that checks for a "PONG" response into a single helper shell function (e.g., wait_for_pong or retry_pong) that accepts/uses the same variables referenced in the diff (SSH_TARGET, SSH_OPTS, TIMEOUT_CMD, MODEL, parse_chat_content) and returns/sets the response or a success flag; move the curl + parse_chat_content + grep + logging + backoff logic into that function and replace both retry loops with calls to it to avoid duplication and keep behavior (3 attempts, 5s sleep, logging of first 80 chars) identical.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@test/e2e/test-sandbox-survival.sh`:
- Around line 327-345: Extract the duplicated retry loop that checks for a
"PONG" response into a single helper shell function (e.g., wait_for_pong or
retry_pong) that accepts/uses the same variables referenced in the diff
(SSH_TARGET, SSH_OPTS, TIMEOUT_CMD, MODEL, parse_chat_content) and returns/sets
the response or a success flag; move the curl + parse_chat_content + grep +
logging + backoff logic into that function and replace both retry loops with
calls to it to avoid duplication and keep behavior (3 attempts, 5s sleep,
logging of first 80 chars) identical.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: afb8cdd1-1b9d-441c-9be6-ac7de22b13b0
📒 Files selected for processing (2)
test/e2e/test-full-e2e.shtest/e2e/test-sandbox-survival.sh
🚧 Files skipped from review as they are similar to previous changes (1)
- test/e2e/test-full-e2e.sh
Summary
The nightly E2E has been failing for 10+ consecutive runs. Two independent fixes:
PONG inference retry — sandbox inference assertions now retry 3x with 5s backoff. Live models through the gateway proxy are not deterministic; a single attempt was too brittle for CI.
Landlock PATH fix —
04-landlock-readonly.shexited 127 becauseopenshellwas not on PATH in child bash processes. Exports PATH before Phase 5 check scripts and adds|| truetosandbox_execcalls for expected non-zero exits.Changes
test/e2e/test-full-e2e.sh— 3-attempt retry for sandbox inference PONGtest/e2e/test-sandbox-survival.sh— 3-attempt retry for baseline + post-restart PONGtest/e2e/test-e2e-cloud-experimental.sh— export PATH before Phase 5 checkstest/e2e/e2e-cloud-experimental/checks/04-landlock-readonly.sh—|| trueon sandbox_exec callsTesting
npm test— 1303 tests pass, no regressionsFixes #1969
Signed-off-by: Charan Jagwani cjagwani@nvidia.com
Summary by CodeRabbit