fix(nodes): retry daemon status during boot race to kill 10s flicker#100
Open
Nic-dorman wants to merge 1 commit into
Open
fix(nodes): retry daemon status during boot race to kill 10s flicker#100Nic-dorman wants to merge 1 commit into
Nic-dorman wants to merge 1 commit into
Conversation
Cold-launch flow before this fix:
t=0 app boot, nodesStore.init() runs
t=~0.5s ensure_daemon_running spawns daemon, polls for port file
t=~1s port file appears, ensure_daemon_running returns URL
(daemon's HTTP server may not have bound the port yet)
t=~1s first fetchDaemonStatus call → connection refused → throws
t=~1s outer catch: daemonConnected=false, initializing=false,
scheduleReconnect (10s timer)
t=1-11s UI shows "Cannot connect to node daemon" for ~10s
t=~11s reconnect timer fires, retry succeeds, UI flips to nodes
The port file is written before the HTTP server binds, so a single
status fetch immediately after `ensure_daemon_running` reliably races
the bind on cold boot.
New behavior: `fetchDaemonStatusWithRetry` retries up to ~5s (20 ×
250ms) inside init() with `initializing=true` throughout. The user
sees "Starting node daemon..." until the daemon actually serves or
the budget exhausts. The disconnected path and scheduleReconnect
loop are unchanged for genuine outages.
Test added: mocks a first-attempt failure + second-attempt success
under fake timers. Asserts no transient `daemonConnected=false`
escapes init() and the store ends up connected with two status calls.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The "Cannot connect to node daemon" message flashes for ~10 seconds on every cold launch. Root cause: a boot race between the daemon's port-file write and its HTTP-server bind.
Cold-launch timing today
Fix
New `fetchDaemonStatusWithRetry({ attempts, delayMs })` helper retries up to ~5 s (20 × 250 ms) inside `init()`. `initializing` stays true through the retries, so the user sees "Starting node daemon..." until the daemon actually serves or the budget exhausts.
The disconnected path and the existing `scheduleReconnect` 10 s loop are untouched — those still kick in for genuine outages.
Test plan
🤖 Generated with Claude Code