Skip to content

fix(scripts): download binary synchronously in stdio mode on first install#127

Merged
aeneasr merged 3 commits intoory:mainfrom
AutumnsGrove:fix/issue-125-first-install-mcp
Apr 14, 2026
Merged

fix(scripts): download binary synchronously in stdio mode on first install#127
aeneasr merged 3 commits intoory:mainfrom
AutumnsGrove:fix/issue-125-first-install-mcp

Conversation

@AutumnsGrove
Copy link
Copy Markdown
Contributor

Summary

  • Removes the stdio fast-fail block from scripts/run.sh and scripts/run.bat that exited 1 when the binary wasn't present
  • On first install, the MCP server now blocks ~5–10s to download the binary then starts normally — tools are available in the same session
  • Replaces stale stdio guard unit tests with an integration test that stubs curl in PATH and verifies run.sh stdio reaches the download path instead of fast-exiting

Root cause

The fast-fail assumed Claude Code would retry failed MCP servers after the SessionStart hook downloaded the binary. It doesn't — a failed MCP server is dead for the entire session. So on first marketplace install, lumen tools were silently unavailable.

Test plan

  • bash scripts/test_run.sh — all 30 tests pass (including new integration test)
  • Verified the new integration test fails on the unfixed script (TDD red/green)

Closes #125

🤖 Generated with Claude Code

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Apr 12, 2026

CLA assistant check
All committers have signed the CLA.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes first-install MCP server availability by preventing stdio mode from exiting early when the lumen binary is missing, allowing the launcher to wait for (and/or perform) the initial binary download so tools are available in the same session.

Changes:

  • Replace stdio “fast-fail” behavior with a 30s poll window that then falls back to downloading the binary if still missing (POSIX + Windows launchers).
  • Replace the previous stdio guard unit tests with integration-style coverage that stubs curl and exercises run.sh stdio first-run behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File Description
scripts/run.sh Removes stdio early-exit and adds polling/fallback flow before downloading/executing the binary.
scripts/run.bat Windows equivalent: polls for hook-provided binary in stdio mode, then downloads if still missing.
scripts/test_run.sh Replaces stdio guard unit tests with integration tests for stdio first-run behavior (stubbed curl) and hook-delivery simulation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread scripts/run.sh Outdated
Comment on lines +32 to +45
# In stdio mode the MCP server and SessionStart hook start concurrently.
# Poll for the hook to finish downloading rather than doing it ourselves —
# this keeps the process alive so Claude Code doesn't time out waiting for
# the MCP initialise handshake.
if [ "${1:-}" = "stdio" ]; then
echo "Binary not yet downloaded; waiting for SessionStart hook to complete." >&2
exit 1
_waited=0
while [ $_waited -lt 30 ] && [ ! -x "$BINARY" ]; do
sleep 1
_waited=$((_waited + 1))
done
if [ -x "$BINARY" ]; then
exec "$BINARY" "$@"
fi
# Binary still missing after 30 s — fall through and download it now
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change introduces up to a 30s polling delay before attempting a download in stdio mode. That seems to conflict with the PR description/title about downloading synchronously on first install with an expected ~5–10s block, and it can unnecessarily delay MCP startup in environments where the SessionStart hook doesn’t run (or runs later).

If the goal is to make first-run stdio reliably available, consider downloading immediately when the binary is missing, or making the poll window shorter/conditional (e.g., only when you can detect the hook will run).

Copilot uses AI. Check for mistakes.
Comment thread scripts/run.bat Outdated
Comment on lines +27 to +42
:: In stdio mode poll for the SessionStart hook to finish downloading
:: rather than blocking with a curl download ourselves.
if "%~1"=="stdio" (
echo Binary not yet downloaded; waiting for SessionStart hook to complete. >&2
exit /b 1
set "_waited=0"
:poll_loop
if exist "%BINARY%" goto poll_done
if !_waited! GEQ 30 goto poll_done
ping -n 2 127.0.0.1 >nul
set /a "_waited+=1"
goto poll_loop
:poll_done
if exist "%BINARY%" (
"%BINARY%" %*
exit /b %ERRORLEVEL%
)
:: Binary still missing after 30 s — fall through and download it now
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says to poll “rather than blocking with a curl download ourselves,” but the code falls through to a curl download after the poll window. Rewording this to reflect the actual behavior (poll first, then download as fallback) would make the intent clearer and avoid confusion during future maintenance.

Copilot uses AI. Check for mistakes.
Comment thread scripts/run.bat Outdated
Comment on lines +29 to +42
if "%~1"=="stdio" (
echo Binary not yet downloaded; waiting for SessionStart hook to complete. >&2
exit /b 1
set "_waited=0"
:poll_loop
if exist "%BINARY%" goto poll_done
if !_waited! GEQ 30 goto poll_done
ping -n 2 127.0.0.1 >nul
set /a "_waited+=1"
goto poll_loop
:poll_done
if exist "%BINARY%" (
"%BINARY%" %*
exit /b %ERRORLEVEL%
)
:: Binary still missing after 30 s — fall through and download it now
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as the POSIX script: this introduces up to a 30s wait in stdio mode before downloading the binary. If the SessionStart hook isn’t running concurrently (or isn’t present on some hosts), this can add a large, avoidable startup delay and doesn’t match the PR’s stated first-install behavior.

Consider downloading immediately when missing, or making the poll window shorter/conditional.

Copilot uses AI. Check for mistakes.
Comment thread scripts/run.sh Outdated
Comment on lines +32 to +45
# In stdio mode the MCP server and SessionStart hook start concurrently.
# Poll for the hook to finish downloading rather than doing it ourselves —
# this keeps the process alive so Claude Code doesn't time out waiting for
# the MCP initialise handshake.
if [ "${1:-}" = "stdio" ]; then
echo "Binary not yet downloaded; waiting for SessionStart hook to complete." >&2
exit 1
_waited=0
while [ $_waited -lt 30 ] && [ ! -x "$BINARY" ]; do
sleep 1
_waited=$((_waited + 1))
done
if [ -x "$BINARY" ]; then
exec "$BINARY" "$@"
fi
# Binary still missing after 30 s — fall through and download it now
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says “Poll for the hook to finish downloading rather than doing it ourselves,” but after the 30s poll window the script explicitly falls through and downloads the binary itself. Consider adjusting the wording to reflect the poll-then-download fallback behavior.

Copilot uses AI. Check for mistakes.
Comment thread scripts/test_run.sh Outdated
Comment on lines +187 to +197
EXIT_CODE=0
CLAUDE_PLUGIN_ROOT="${_TMPROOT}" PATH="${_FAKE_CURL_DIR}:${PATH}" \
bash "${_SCRIPT_DIR}/run.sh" stdio >/dev/null 2>&1 || EXIT_CODE=$?

if [ "$EXIT_CODE" -eq 1 ]; then
echo " FAIL: run.sh stdio with missing binary should attempt download, not exit 1"
echo " expected: exit code != 1"
echo " got: exit code 1 (fast-fail still present)"
exit 1
fi
echo " PASS: run.sh stdio with missing binary attempts download instead of fast-failing"
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new integration test can report PASS even if run.sh stdio fails with a non-1 exit code (e.g., 2), because it only asserts EXIT_CODE != 1. That makes the test prone to false positives and could hide regressions unrelated to the old fast-fail.

Consider asserting a successful exit (or at least failing for any non-zero exit), and/or asserting an observable side effect like the stubbed curl being invoked / the expected binary being created.

Copilot uses AI. Check for mistakes.
@aeneasr
Copy link
Copy Markdown
Member

aeneasr commented Apr 13, 2026

Could you please take a loot at the GitHub CoPilot comments? These scripts are atm very unreliable, what type of issue where you seeing?

@AutumnsGrove
Copy link
Copy Markdown
Contributor Author

Could you please take a loot at the GitHub CoPilot comments? These scripts are atm very unreliable, what type of issue where you seeing?

Absolutely.

The issue I have is attempting to install it via the suggested method for Claude code, the plugin marketplace. When I tried to do so, I received a failing error stating I was unauthorized to download it. So, I manually cloned it, built it, and installed it as an mcp server. This patch was in hopes of making it more reliable, but I can't fully test the plugin adding as it goes through the marketplace tied to your GitHub releases.

@aeneasr
Copy link
Copy Markdown
Member

aeneasr commented Apr 14, 2026

I see, thank you - are you running on Windows?

@aeneasr
Copy link
Copy Markdown
Member

aeneasr commented Apr 14, 2026

Also are you using powershell or cmd?

@aeneasr
Copy link
Copy Markdown
Member

aeneasr commented Apr 14, 2026

The issue I have is attempting to install it via the suggested method for Claude code, the plugin marketplace. When I tried to do so, I received a failing error stating I was unauthorized to download it.

Another question - do you have a GH_TOKEN / GITHUB_TOKEN defined in your env? I'm trying to understand where the error is coming from.

Alternatively could you maybe look in your ~/.claude/logs/*.log for more details?

aeneasr added a commit that referenced this pull request Apr 14, 2026
Demonstrates the bug in #125: invoking `run.sh stdio` without a binary
(the Claude Code first-install flow) fast-exits instead of downloading.
Because Claude Code does not retry failed MCP servers, this leaves the
`lumen` tools unavailable for the entire session.

The existing `stdio early-exit guard tests` block never executes run.sh
— it reimplements the guard condition inline and asserts "the guard
fires for stdio" as correct behaviour, codifying the bug. This new
integration test stubs `curl` in PATH, creates a temporary plugin
root with a manifest but no binary, runs `bash run.sh stdio`, and
asserts exit 0 + that the expected binary file is produced.

Red now; turns green once the stdio fast-fail is removed (see #127).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
aeneasr added a commit that referenced this pull request Apr 14, 2026
The existing stdio first-install integration test (added earlier in this
PR) stubbed curl to write `#!/bin/sh; exit 0` as the "downloaded binary"
and only checked the launcher exited 0 with a file on disk. That passes
on any launcher that reaches the download step — it can't distinguish a
launcher that actually execs a working MCP server from one that swallows
stdin/stdout, closes pipes early, or silently drops to the wrong binary
path. It also tested only run.sh; run.bat (with the same fast-fail bug)
was untested end-to-end.

Replaces the stub with a real mock MCP server (scripts/testdata/
mock_mcp_server/main.go — pure Go, no CGO, cross-compiles on all three
OSes). The test now:

  - Builds the mock for the current OS/arch
  - Creates a clean plugin root with a manifest but no bin/
  - Stubs curl (POSIX) / curl.bat (Windows) to copy the mock into the
    path the launcher will exec
  - Pipes a real JSON-RPC `initialize` request on stdin
  - Asserts the stdout response contains `jsonrpc: "2.0"` AND
    `serverInfo.name == "mock-lumen"`

A passing test transitively proves the launcher did not fast-exit in
stdio mode, reached the download code path, wrote the artefact where it
would exec it, set it executable, and exec'd it with stdin/stdout
inherited correctly — the actual MCP-server-alive contract.

Adds scripts/test_run_windows.ps1 (same shape, against run.bat) and
wires it into the CI `scripts` matrix as a new pwsh step. Also expands
the matrix to ubuntu-latest + macos-latest + windows-latest and adds
actions/setup-go so the mock can be built.

Both the POSIX test and the Windows test are RED against the current
launchers because run.sh AND run.bat both still contain the stdio
fast-fail guard — merging #127 plus an equivalent run.bat change will
flip them green.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
AutumnsGrove and others added 3 commits April 14, 2026 13:36
…stall

Remove the fast-fail block that exited 1 when run.sh/run.bat was called
with 'stdio' and no binary was present. The original design assumed Claude
Code would retry failed MCP servers after the SessionStart hook downloaded
the binary — it does not. This caused MCP tools to be unavailable for the
entire first session after a marketplace install.

With the fast-fail removed, stdio mode now falls through to the existing
download logic. The MCP server blocks for ~5-10s on first use while the
binary downloads, then starts normally. Subsequent sessions are unaffected
since the binary is already present.

Replace the stdio guard unit tests with an integration test that stubs curl
in PATH and verifies run.sh stdio exits with a non-1 code (i.e. reaches the
download path) when no binary is present.

Closes ory#125

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rectly

The previous fix (remove fast-fail, download synchronously) was insufficient:
Claude Code times out waiting for the MCP initialise handshake while curl
blocks the process for 5-10 s.

Switch to Option C: in stdio mode, poll for the binary to appear (1 s
intervals, 30 s ceiling) so the SessionStart hook — which starts concurrently
and already owns the download — can deliver it. The process stays alive
throughout, avoiding the MCP startup timeout. If the binary is still missing
after 30 s (e.g. the hook failed) we fall through and download it ourselves
as a last resort.

Add a second integration test that simulates this concurrency: a background
subshell writes the binary after 2 s while run.sh polls, then verifies run.sh
execs it and exits cleanly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Drop the 30-second poll window introduced in the previous commit.
Polling is fragile: MCP server and SessionStart hook race at startup
with no guaranteed ordering, and on hosts without the hook the poll
just adds avoidable delay before falling through to download anyway.

The correct fix is synchronous download: when the binary is missing in
stdio mode, download it inline (~5-10 s on first install) then exec.
The process stays alive throughout so Claude Code's MCP handshake
eventually completes; subsequent sessions are instant since the binary
is already present.

Also tighten the integration test per review feedback: assert exit 0
(not just "not 1") and verify the expected binary path was created by
the stubbed curl.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@aeneasr aeneasr force-pushed the fix/issue-125-first-install-mcp branch from e4047e4 to 69ade3c Compare April 14, 2026 11:36
@aeneasr aeneasr enabled auto-merge (squash) April 14, 2026 11:37
@aeneasr aeneasr requested a review from Copilot April 14, 2026 11:41
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread scripts/run.bat
Comment on lines 24 to 26

:: Download on first run if binary is missing
if not exist "%BINARY%" (
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the stdio fast-fail removed, the Windows launcher will now run the download path during MCP stdio startup as well. The current curl settings (--max-time 300 --retry 3 --retry-delay 2) can make the MCP server startup block for a long time on first install (minutes in worst cases), which may exceed host startup timeouts. Consider a smaller timeout/retry policy for stdio mode, and/or make the timeout configurable so users can tune it in constrained environments.

Suggested change
:: Download on first run if binary is missing
if not exist "%BINARY%" (
:: Detect stdio startup so we can avoid blocking host startup on first-run download
set "IS_STDIO="
if /I "%~1"=="stdio" set "IS_STDIO=1"
:: Download on first run if binary is missing
if not exist "%BINARY%" (
if defined IS_STDIO (
echo Error: %BINARY% is missing and cannot be downloaded during stdio startup. Please install it before launching MCP stdio. >&2
exit /b 1
)

Copilot uses AI. Check for mistakes.
Comment thread scripts/run.sh
Comment on lines 29 to 31
if [ -z "$BINARY" ]; then
BINARY="${PLUGIN_ROOT}/bin/lumen-${OS}-${ARCH}"

Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that the stdio fast-fail is removed, run.sh stdio can block for a long time on first install when the network is slow/unavailable: the download uses curl --max-time 300 --retry 3 --retry-delay 2, which can stall the MCP server startup far beyond the ~5–10s described in the PR (potentially many minutes). Consider reducing the timeout/retry policy specifically for stdio mode, adding a separate total timeout cap, or making these values configurable via env vars so hosts with startup timeouts don’t mark the server unhealthy.

Copilot uses AI. Check for mistakes.
@aeneasr aeneasr merged commit 9cbff58 into ory:main Apr 14, 2026
12 of 13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Plugin MCP server fails on first install — stdio exits before SessionStart hook downloads binary

4 participants