Skip to content

perf(desktop): cut entity discovery overhead via gating, dedup, blur pause, concurrency cap#577

Open
horacioh wants to merge 8 commits into
mainfrom
claude/document-research-data-flow-ELLad
Open

perf(desktop): cut entity discovery overhead via gating, dedup, blur pause, concurrency cap#577
horacioh wants to merge 8 commits into
mainfrom
claude/document-research-data-flow-ELLad

Conversation

@horacioh
Copy link
Copy Markdown
Collaborator

@horacioh horacioh commented May 8, 2026

Summary

Investigation of a recent perf regression traced sustained daemon load to entity discovery: every useResource with subscribed: true opens a long-lived discovery loop on the daemon, and many call sites were either over-subscribing or fragmenting the renderer-side dedup key. With many joined sites + followed accounts + open documents, this multiplied into hundreds of independent discoveryLoop() invocations.

This PR addresses the issue along five axes, one per commit, so each can be bisected or reverted independently:

# Commit What it does
1 perf(frontend): drop subscribed:true from low-value views Remove active discovery from breadcrumbs, sidebar sites/profiles listings, notification rows, and a couple of nav chrome callers. These views need cached metadata, not active polling.
2 perf(ui): gate embed subscriptions on viewport visibility New useInViewport hook (IntersectionObserver + 30 s grace) gates the three embed renderers so off-screen embeds in long documents don't each open a discovery loop.
3 perf(frontend): normalize mismatched subscription options doc-navigation was forking a separate dedup key from the resource page's high-priority sub for the same id; query-block toggled recursive on display mode, forking the same target between two keys. Both fixed.
4 perf(desktop): dedup entity subscriptions by id and fix stream cleanup The renderer ledger now keys per-entity (not per-{id, recursive, priority, scope}), merges callers' options up to the strongest, and owns one daemon sub + one discovery-state sub per entity. Fixes the latent bug where a second caller's discovery-state sub would be torn down by the first caller's cleanup.
5 perf(desktop): pause entity subs while the window is blurred New window-focus.ts listens to renderer focus/blur. After a 30 s grace, all of this window's daemon subs and discovery-state subs are torn down; on focus they're re-issued from the preserved caller list.
6 perf(desktop): cap concurrent discovery RPCs in the sync loop A small FIFO slot pool (MAX_CONCURRENT_DISCOVERY = 4) wraps runDiscovery so bursts of subs reaching their tick at once queue rather than pile in flight.

Plus a follow-up fix commit for downlevel TS iteration and a test commit covering the dedup, merge, cleanup, and blur-pause invariants.

Why each change is safe

  • Commit 1 / 3 — strict subset of previous behavior; the data is still queried via React Query, the entities the user is actively viewing are still subscribed by the resource page or by joined-site / followed-account derived subscriptions.
  • Commit 2 — embeds re-subscribe automatically when scrolled into view; grace period prevents thrash.
  • Commit 4 — daemon-side dedup already kept this from multiplying actual daemon work, so this is mostly fixing the renderer-side bookkeeping (and the cleanup bug). The behavior change: when callers disagree on recursive, the merged sub picks the strongest requested value (recursive: true if any caller wants it). If only non-recursive callers remain, the sub downgrades.
  • Commit 5 — preserves the caller list across pause, so re-focus restores subs without losing track of who needs them. The 30 s grace absorbs alt-tabs.
  • Commit 6 — backpressure only; never blocks indefinitely (slots free as RPCs finish). Cancellation re-checked before consuming a slot so unsubscribed loops don't still hit the wire.

Test plan

  • pnpm typecheck (all 9 workspaces clean)
  • pnpm test — 422 tests pass, including 8 new tests under frontend/apps/desktop/src/models/__tests__/entities.subscriptions.test.ts:
    • Two callers on the same entity → one daemon sub + one discovery-state sub
    • Adding a recursive caller upgrades the merged sub
    • Removing the recursive caller downgrades it back
    • Discovery-state sub stays alive while any caller remains (the cleanup-bug regression test)
    • Rapid unmount/remount does not churn the daemon
    • Separate entity ids tracked independently
    • Blur grace period tears down then refocus re-issues
    • Refocus during grace cancels the pending pause without churn
  • pnpm --filter @shm/ui test (32 pass)
  • pnpm --filter @shm/editor test (110 pass)
  • pnpm format:write (no changes)
  • Manual smoke before merging:
    • Open a heavy document with many embeds, scroll up/down, watch the sync footer panel — embeds should subscribe as they come into view and unsubscribe ~30 s after leaving.
    • Background and foreground a window — within ~30 s of blur, footer panel should empty; within ~1 s of focus, subs should reappear.
    • With several joined sites + followed accounts, observe daemon CPU / log output before vs. after this branch.

Notes for review

  • The BlockEmbed* exports now wrap an inner component in a single <div ref={ref}>. This adds one extra DOM element per embed but is necessary to host the IntersectionObserver target through all rendering branches. If layout shifts surface in design QA, we can switch to a cloneElement approach.
  • priority and scope are kept on EntitySubscription but are no longer part of the dedup key, since the daemon subscribe wire format never accepted them.
  • All commits are bisectable on their own — pnpm typecheck && pnpm test passes at every commit.

https://claude.ai/code/session_01RBiUf7sTa617hF4tybbswQ


Generated by Claude Code

claude added 8 commits May 8, 2026 08:36
Several view-only components requested an active discovery subscription
just to display cached metadata (breadcrumbs, sidebar listings,
notification rows, header chrome). Each of those calls opened a tRPC
stream and a discovery state subscription per row; in long lists or
deep breadcrumbs that multiplied daemon work for data the user does not
actually need to keep fresh.

Remove the subscribed flag from these sites; data still flows through
React Query, and the entities the user is actively viewing are still
covered by the main resource page sub or by joined-site /
followed-account derived subscriptions.
Embeds in long documents previously each opened an active discovery
subscription on mount, regardless of whether the embed was on screen.
Documents with many embeds (or recursive comment embeds) multiplied
that into many concurrent daemon discovery loops.

Add a useInViewport hook that uses IntersectionObserver with a 30 s
grace period to avoid churn on rapid scroll, and gate the subscribed
flag on each of the three embed renderers (BlockEmbedCard,
BlockEmbedContent, BlockEmbedComments) on it. The render path stays
identical; only the subscription kicks in when the embed comes into
(or near) the viewport.
The renderer-side dedup key in entities.ts is composed of
id + recursive + priority + scope, so callers that disagree on
those flags for the same entity create separate ref-counted entries
and separate tRPC streams. Two cases were forking unnecessarily:

- doc-navigation subscribed to the active document with
  {recursive: true} (key: id/*) while the resource page already
  subscribes the same id with {recursive: true, priority: 'high'}
  (key: id/*!high). Drop the redundant subscription; the navigation
  pane only needs cached outline metadata.

- query-block toggled `recursive` based on the display mode
  ('AllDescendants' vs 'Children'), forking the same query target
  between id and id/* keys. Always subscribe recursively so the
  display mode does not influence the dedup key.
Two related issues with the previous renderer-side subscription
ledger:

1) The dedup key combined id + recursive + priority + scope, but the
   daemon subscribe API only accepts {id, recursive} and ignores
   priority/scope. Callers that disagreed on priority or scope (e.g.
   the resource page asking for high priority and an embed asking
   for normal) created independent ref-counted entries even though
   they pointed at the same daemon work.

2) Each entry installed its own discovery-state tRPC sub on a shared
   per-entity stream and overwrote the prior unsubscribe handle.
   When the first caller cleaned up it would unsubscribe the wrong
   handle and delete the shared stream entry, leaking the live sub
   and breaking discovery state for the still-active caller.

Replace the multi-key ledger with a per-entity state that tracks the
set of active callers (keyed by sub object identity), merges their
options to the strongest requested values, and owns a single daemon
sub plus a single discovery-state sub. Re-issue the daemon sub only
when the merged recursive flag actually changes, so callers can
disagree on priority/scope without churn. Removal stays deferred via
queueMicrotask so React 18 batched unmount/remount does not flap the
daemon sub.
When a renderer window is not focused, its callers are by definition
not consuming fresh state — but the daemon kept polling discovery for
every subscribed entity at the same cadence (just with the global 10x
slowdown). With many windows or many subscriptions per window, that
turns into substantial idle work.

Track the focus state of the local renderer window via the standard
focus/blur events, and on a sustained blur (30 s grace) tear down all
of this window's outbound entity subs and discovery-state streams.
The state and caller list are preserved, so when the window regains
focus we re-issue the subs without losing any callers. The grace
period absorbs quick alt-tabs without churning the daemon.
Previously every subscription's discoveryLoop fired its discoverEntity
RPC independently, so when many subs reach a polling tick close
together (typical at app startup or after focus regain) the renderer
issues a burst of RPCs to the daemon. The daemon has its own worker
pool, but stacked tRPC + grpc-client overhead on the renderer side
adds up.

Add a small FIFO slot pool sized to MAX_CONCURRENT_DISCOVERY=4 around
runDiscovery so additional RPCs queue rather than pile in flight.
Re-check cancellation before consuming a slot so that a sub which
was unsubscribed while waiting does not still hit the wire.
Add targeted Vitest unit tests for the entity subscription ledger:

- two callers on the same entity with different priorities open one
  daemon sub and one discovery-state sub
- a recursive caller arriving later upgrades the merged daemon sub
- removing the recursive caller downgrades it back
- the discovery-state sub stays alive while any caller remains
- rapid unmount/remount of a single caller does not churn the daemon
- separate entity ids are tracked independently

Plus two tests for the blur-pause behaviour: subs are torn down once
the grace timer expires after a blur and re-issued on focus, and the
grace timer is cancelled when focus returns before it fires.
The desktop tsconfig has no explicit target so emits ES3 + lib ES2020,
which means MapIterator and Iterable<T> values cannot be iterated with
for-of without --downlevelIteration. Switch the new entity-ledger
iteration sites (cleanup, syncAllEntityStates, emitSubscriptionKeys)
to Map.forEach and pass the Map itself into mergeCallerOptions so
typecheck stays happy without enabling downlevelIteration globally.
@ericvicenti
Copy link
Copy Markdown
Collaborator

Please, please. Lets only merge this if we are sure that it solves performance issues.

Because I know this will introduce many bugs and situations where content will not load.

The breadcrumbs for example- I recently introduced syncing to those, because the parent docs were not loading. And now I see that work being un-done and we will have that bug again.

Capping the discovery count- this will also cause some content to not load for the user. We don't know if this will help performance, but we do know it will cause regressions.

So, if this solves performance issues for users with 30 subscriptions, great, lets merge it and we can solve those regressions. Otherwise, please no

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants