fix(sensing-server): route WS source through effective_source() (closes #618)#619
Open
ArnonEnbar wants to merge 1 commit into
Open
fix(sensing-server): route WS source through effective_source() (closes #618)#619ArnonEnbar wants to merge 1 commit into
ArnonEnbar wants to merge 1 commit into
Conversation
ruvnet#618) `AppStateInner::effective_source()` already implements 5-second stale detection for the ESP32 UDP frame stream, returning `"esp32:offline"` once `ESP32_OFFLINE_TIMEOUT` elapses. All REST endpoints (`/health`, `/api/*`) already call it, so they correctly report the degraded state when the ESP32 disconnects. The two WS broadcast sites in the main tick loop were instead emitting a hardcoded string literal `source: "esp32".to_string()`. The result: WebSocket consumers (notably the bundled UI) kept seeing `source: "esp32"` indefinitely after the node lost power, with the last received frame frozen but re-broadcast every tick. The UI's "LIVE — ESP32 HARDWARE Connected" banner never flipped to an offline indicator. This is a silent-failure pattern with safety implications for the fall-detection / vital-monitoring use cases the project advertises. A deployed system whose ESP32 lost power could continue reporting "present, breathing normal" indefinitely. This commit replaces both literals with `s.effective_source()` so the WS path matches the REST path. The `s` AppStateInner guard is already held at both call sites. Two-page draw.io diagram of the broken/fixed flow is included under `docs/diagrams/issue-618/` for the PR review. No regression test is added — `effective_source()` is straightforward and the fix is a literal-to-method-call swap. A unit test would either require extracting a pure helper from `effective_source()` (refactor) or constructing a full `AppStateInner` with all fields (no `Default` impl exists). Both expand diff scope beyond what's needed for the fix. Manual reproduction is captured in ruvnet#618.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #618.
Summary
The two WebSocket broadcast sites in the sensing-server tick loop hardcoded
source: "esp32".to_string(), bypassing the existingAppStateInner::effective_source()stale-detection logic. As a result, the UI continued to show "LIVE — ESP32 HARDWARE Connected" with frozen cached values for arbitrarily long after the ESP32 lost power or WiFi.This PR replaces both literals with
s.effective_source()so the WS path emits"esp32:offline"onceESP32_OFFLINE_TIMEOUT(5 s) elapses, matching the behaviour the REST/healthendpoint already had.Diff
Applied at the two existing call sites in
v2/crates/wifi-densepose-sensing-server/src/main.rs. ThesAppStateInner mutable guard is already held at both points (acquired upstream aslet mut s = state.write().await;), so no new locks or restructure is needed.Diagrams
Before — bug
WS broadcast loop emits hardcoded
source: "esp32"every 100 ms tick, regardless of UDP frame freshness. UI cannot distinguish live vs cached state.After — fix
WS broadcast now routes through
effective_source(), which returns"esp32:offline"whenlast_esp32_frame.elapsed() > ESP32_OFFLINE_TIMEOUT. UI can flip to a degraded-state indicator.Source
.drawiofile is included for future editing:docs/diagrams/issue-618/source-stale-detection.drawio.Why this matters
The project markets fall-detection, vital-sign monitoring, and overnight apnea screening use-cases. Silent stale-as-live is a safety-critical failure mode — a deployed system whose ESP32 lost power could continue reporting "present, breathing normal" indefinitely, masking a real emergency. The 5 s offline timeout was clearly designed to prevent exactly this, but only covered REST consumers.
Test plan
Build with
cargo build -p wifi-densepose-sensing-server --no-default-features(clean, 1 pre-existing unused-imports warning)Manual repro per WebSocket sensing broadcast hardcodes source="esp32", bypasses 5s stale-detection (effective_source not called) #618: ESP32 unplugged → after 5 s,
/healthreportsesp32:offlineAND now WS/ws/sensingpayloadsourcealso flips toesp32:offline.No unit/integration test added.
effective_source()is a singleifchain, the diff is literal-to-method swap. Adding a test would require either:effective_source()(refactor outside this PR's scope), orAppStateInnerwith all ~80 fields (noDefaultimpl).Happy to add either approach if a reviewer prefers — let me know which direction. Otherwise, the manual reproduction in WebSocket sensing broadcast hardcodes source="esp32", bypasses 5s stale-detection (effective_source not called) #618 is the verification.
Build notes
Compiled on Windows with
stable-x86_64-pc-windows-gnu(project defaultrust-toolchain.tomlpins MSVC; switched locally viarustup overridebecause MSVC build tools weren't installed). Fix is pure-Rust and target-agnostic, so this shouldn't matter for CI.Wider scope (not addressed here)
This fix lets the UI know the data is stale; it does not prevent the WS broadcast loop from continuing to emit the cached frame each tick. A more thorough fix could:
effective_source().ends_with(":offline"), orstale: boolfield onSensingUpdate(precedent: per-nodeNodeFeatureSnapshot.staleat main.rs:478).Kept this PR minimal so it can land independently; happy to follow up.