Skip to content

fix(sensing-server): route WS source through effective_source() (closes #618)#619

Open
ArnonEnbar wants to merge 1 commit into
ruvnet:mainfrom
ArnonEnbar:fix/ws-source-stale-detection
Open

fix(sensing-server): route WS source through effective_source() (closes #618)#619
ArnonEnbar wants to merge 1 commit into
ruvnet:mainfrom
ArnonEnbar:fix/ws-source-stale-detection

Conversation

@ArnonEnbar
Copy link
Copy Markdown

Closes #618.

Summary

The two WebSocket broadcast sites in the sensing-server tick loop hardcoded source: "esp32".to_string(), bypassing the existing AppStateInner::effective_source() stale-detection logic. As a result, the UI continued to show "LIVE — ESP32 HARDWARE Connected" with frozen cached values for arbitrarily long after the ESP32 lost power or WiFi.

This PR replaces both literals with s.effective_source() so the WS path emits "esp32:offline" once ESP32_OFFLINE_TIMEOUT (5 s) elapses, matching the behaviour the REST /health endpoint already had.

Diff

-                        source: "esp32".to_string(),
+                        source: s.effective_source(),

Applied at the two existing call sites in v2/crates/wifi-densepose-sensing-server/src/main.rs. The s AppStateInner mutable guard is already held at both points (acquired upstream as let mut s = state.write().await;), so no new locks or restructure is needed.

Diagrams

Before — bug

Before

WS broadcast loop emits hardcoded source: "esp32" every 100 ms tick, regardless of UDP frame freshness. UI cannot distinguish live vs cached state.

After — fix

After

WS broadcast now routes through effective_source(), which returns "esp32:offline" when last_esp32_frame.elapsed() > ESP32_OFFLINE_TIMEOUT. UI can flip to a degraded-state indicator.

Source .drawio file is included for future editing: docs/diagrams/issue-618/source-stale-detection.drawio.

Why this matters

The project markets fall-detection, vital-sign monitoring, and overnight apnea screening use-cases. Silent stale-as-live is a safety-critical failure mode — a deployed system whose ESP32 lost power could continue reporting "present, breathing normal" indefinitely, masking a real emergency. The 5 s offline timeout was clearly designed to prevent exactly this, but only covered REST consumers.

Test plan

Build notes

Compiled on Windows with stable-x86_64-pc-windows-gnu (project default rust-toolchain.toml pins MSVC; switched locally via rustup override because MSVC build tools weren't installed). Fix is pure-Rust and target-agnostic, so this shouldn't matter for CI.

Wider scope (not addressed here)

This fix lets the UI know the data is stale; it does not prevent the WS broadcast loop from continuing to emit the cached frame each tick. A more thorough fix could:

  • Suppress WS broadcasts when effective_source().ends_with(":offline"), or
  • Add a stale: bool field on SensingUpdate (precedent: per-node NodeFeatureSnapshot.stale at main.rs:478).

Kept this PR minimal so it can land independently; happy to follow up.

ruvnet#618)

`AppStateInner::effective_source()` already implements 5-second stale
detection for the ESP32 UDP frame stream, returning `"esp32:offline"`
once `ESP32_OFFLINE_TIMEOUT` elapses. All REST endpoints (`/health`,
`/api/*`) already call it, so they correctly report the degraded state
when the ESP32 disconnects.

The two WS broadcast sites in the main tick loop were instead emitting
a hardcoded string literal `source: "esp32".to_string()`. The result:
WebSocket consumers (notably the bundled UI) kept seeing `source:
"esp32"` indefinitely after the node lost power, with the last received
frame frozen but re-broadcast every tick. The UI's "LIVE — ESP32
HARDWARE Connected" banner never flipped to an offline indicator.

This is a silent-failure pattern with safety implications for the
fall-detection / vital-monitoring use cases the project advertises.
A deployed system whose ESP32 lost power could continue reporting
"present, breathing normal" indefinitely.

This commit replaces both literals with `s.effective_source()` so the
WS path matches the REST path. The `s` AppStateInner guard is already
held at both call sites.

Two-page draw.io diagram of the broken/fixed flow is included under
`docs/diagrams/issue-618/` for the PR review.

No regression test is added — `effective_source()` is straightforward
and the fix is a literal-to-method-call swap. A unit test would either
require extracting a pure helper from `effective_source()` (refactor)
or constructing a full `AppStateInner` with all fields (no `Default`
impl exists). Both expand diff scope beyond what's needed for the fix.
Manual reproduction is captured in ruvnet#618.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

WebSocket sensing broadcast hardcodes source="esp32", bypasses 5s stale-detection (effective_source not called)

1 participant