y-partyserver: Hibernation support & awareness fix #341

Open
threepointone wants to merge 5 commits into main from y-partyserver-fixes

@threepointone
Collaborator

Summary

Adds full Durable Object hibernation support to y-partyserver. Previously, the server tracked connections in an in-memory Map (WSSharedDoc.conns) which was lost on hibernation, breaking Yjs sync and awareness propagation after wake-up. This PR replaces that with hibernation-compatible primitives and suppresses awareness heartbeats that were preventing DOs from hibernating in the first place.

Highlight: Hibernation Support

With hibernate: true, Durable Objects can now actually hibernate during idle sessions. Previously, the awareness protocol's built-in 15-second heartbeat kept generating WebSocket traffic, preventing hibernation indefinitely. This PR suppresses those heartbeats on both client and server, and ensures the server recovers its document state from connected clients after waking up.

Server changes (packages/y-partyserver/src/server/index.ts)

Connection tracking survives hibernation

  • Removed WSSharedDoc.conns (Map<Connection, Set<number>>) — lost on hibernation
  • Added connection.setState() with AWARENESS_IDS_KEY to persist awareness client IDs per connection via WebSocket attachments (survives hibernation)
  • Added getAwarenessIds() / setAwarenessIds() helpers with defensive try/catch
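
The helpers above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `ConnectionLike` is a hypothetical stand-in for the partyserver connection type, and the string value of `AWARENESS_IDS_KEY` is an assumption (the description names only the constant).

```typescript
// Hypothetical stand-in for a connection whose state is persisted in the
// WebSocket attachment. The real partyserver type may differ.
interface ConnectionLike {
  state: Record<string, unknown> | null;
  setState(next: Record<string, unknown> | null): void;
}

// Assumed key value; only the constant's name appears in the PR description.
const AWARENESS_IDS_KEY = "__awarenessIds";

// Read the awareness client IDs controlled by this connection.
function getAwarenessIds(conn: ConnectionLike): number[] {
  try {
    const ids = conn.state?.[AWARENESS_IDS_KEY];
    return Array.isArray(ids)
      ? ids.filter((id): id is number => typeof id === "number")
      : [];
  } catch {
    // Reading state from a closed or broken socket may throw; treat as empty.
    return [];
  }
}

// Persist the IDs, keeping any other keys already stored on the connection.
function setAwarenessIds(conn: ConnectionLike, ids: Iterable<number>): void {
  try {
    const unique = Array.from(new Set(ids));
    conn.setState({ ...(conn.state ?? {}), [AWARENESS_IDS_KEY]: unique });
  } catch {
    // Best effort: a failed write only means cleanup on close finds no IDs.
  }
}
```

Merging into the existing state rather than overwriting it keeps any user-defined connection state intact, which seems the safer default when the key shares the attachment with application data.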

Event handlers moved to onStart()

  • Doc update broadcast and awareness tracking handlers moved from WSSharedDoc constructor into onStart(), where they use this.getConnections() instead of the removed conns Map

Hibernation wake-up re-sync

  • After onStart() registers handlers, it sends sync step 1 to all existing connections
  • Surviving clients respond with sync step 2, repopulating the server's empty document
  • On first start (no connections), this is a no-op
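
A rough sketch of the wake-up re-sync loop described above, under stated assumptions: `SocketLike` and `resyncAfterWake` are hypothetical names, and the encoder is passed in as a callback so the example stays free of dependencies. In the real server the message would presumably be built with the y-protocols sync helpers (for example `writeSyncStep1`).

```typescript
// Hypothetical minimal socket shape; real connections carry more than send().
interface SocketLike {
  send(data: Uint8Array): void;
}

// Send sync step 1 to every connection that survived hibernation.
// Returns how many connections were asked to re-sync.
function resyncAfterWake(
  connections: Iterable<SocketLike>,
  encodeSyncStep1: () => Uint8Array
): number {
  let message: Uint8Array | undefined;
  let sent = 0;
  for (const conn of connections) {
    // Encode lazily so a first start with no connections does no work at all.
    if (message === undefined) {
      message = encodeSyncStep1();
    }
    conn.send(message);
    sent++;
  }
  return sent;
}
```

Each client that receives the message answers with sync step 2 containing everything the server's (empty) state vector lacks, which is how the document is repopulated.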

Timer cleanup

  • awareness._checkInterval cleared in WSSharedDoc constructor to prevent the protocol's internal timer from defeating hibernation

Cleanup

  • Removed closeConn() helper — awareness cleanup now happens in onClose via persisted connection state
  • Simplified send() — silently returns on broken connections instead of forcibly closing them
  • broadcastCustomMessage uses for...of over getConnections() instead of conns.forEach
  • Widened onLoad() return type to Promise<YDoc | void>
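
The simplified `send()` might look something like the sketch below. The `WebSocketLike` shape and the boolean return value are assumptions made for illustration; the PR states only that broken connections are skipped silently rather than closed.

```typescript
// Hypothetical minimal socket shape for illustration.
interface WebSocketLike {
  readyState: number;
  send(data: Uint8Array): void;
}

const WS_OPEN = 1; // Value of WebSocket.OPEN in the standard WebSocket API.

// Try to send; report failure instead of closing the connection.
function send(conn: WebSocketLike, message: Uint8Array): boolean {
  if (conn.readyState !== WS_OPEN) {
    return false; // Not open: skip silently.
  }
  try {
    conn.send(message);
    return true;
  } catch {
    // Leave teardown to the normal close path, where awareness cleanup runs.
    return false;
  }
}
```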

Provider changes (packages/y-partyserver/src/provider/index.ts)

Awareness heartbeat suppression

  • Switched from awareness.on("update") to awareness.on("change") — clock-only renewals (the 15s heartbeat) no longer produce network traffic
  • Disabled awareness._checkInterval on the client — stops clock renewals and peer timeout removal
  • Removed the provider's own _checkInterval liveness timer (was coupled to the heartbeat)
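
To illustrate why listening to "change" suppresses heartbeat traffic, here is a minimal sketch. `AwarenessLike`, `wireAwareness`, and `sendClients` are hypothetical names; the event payload shape (`added`, `updated`, `removed`) matches what the y-protocols awareness emitter passes to both events. A clock-only renewal fires "update" but not "change", so a "change" listener never sees it.

```typescript
type AwarenessChanges = { added: number[]; updated: number[]; removed: number[] };
type AwarenessHandler = (changes: AwarenessChanges, origin: unknown) => void;

// Hypothetical structural subset of the y-protocols Awareness instance.
interface AwarenessLike {
  _checkInterval?: ReturnType<typeof setInterval>;
  on(event: "change" | "update", handler: AwarenessHandler): void;
}

function wireAwareness(
  awareness: AwarenessLike,
  sendClients: (clientIds: number[]) => void
): void {
  // Stop the built-in timer that renews the local clock and expires peers.
  if (awareness._checkInterval !== undefined) {
    clearInterval(awareness._checkInterval);
  }
  // "change" fires only when a state was added, removed, or actually modified.
  awareness.on("change", ({ added, updated, removed }) => {
    sendClients(added.concat(updated, removed));
  });
}
```

One trade-off worth keeping in mind: with the check interval disabled, peers are no longer removed after a timeout, so removal relies on explicit cleanup when a connection closes.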

Reconnection fixes

  • Clear stale awareness meta for remote clients on WebSocket close — ensures reconnecting clients' awareness updates are accepted (clock check starts fresh)
  • Bump awareness clock via setLocalState(getLocalState()) on reconnect — belt-and-suspenders with the meta cleanup
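
The two reconnection fixes can be sketched as below. `ReconnectAwareness`, `clearRemoteMeta`, and `bumpLocalClock` are hypothetical names; the `meta` map of per-client clocks and the `getLocalState()` / `setLocalState()` pair mirror the y-protocols awareness object, whose `setLocalState` increments the local clock even when the state is unchanged.

```typescript
// Hypothetical structural subset of the y-protocols Awareness instance.
interface ReconnectAwareness {
  clientID: number;
  meta: Map<number, { clock: number; lastUpdated: number }>;
  getLocalState(): Record<string, unknown> | null;
  setLocalState(state: Record<string, unknown> | null): void;
}

// On WebSocket close: forget remote clocks so a reconnecting peer's next
// update is not rejected as older than a clock remembered from last session.
function clearRemoteMeta(awareness: ReconnectAwareness): void {
  for (const id of Array.from(awareness.meta.keys())) {
    if (id !== awareness.clientID) {
      awareness.meta.delete(id);
    }
  }
}

// On reconnect: re-set the unchanged local state to advance the local clock,
// so peers that still hold the old clock accept the re-sent state.
function bumpLocalClock(awareness: ReconnectAwareness): void {
  const local = awareness.getLocalState();
  if (local !== null) {
    awareness.setLocalState(local);
  }
}
```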

Bug fix

  • host.slice(0, -1) result was not assigned back to host, so trailing slashes were never actually stripped
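
The trailing-slash bug comes from string immutability: `slice()` returns a new string and leaves the original untouched. A minimal sketch of the corrected pattern follows; `normalizeHost` is a hypothetical helper name, not necessarily how the provider structures this.

```typescript
// Strip a single trailing slash from a host string.
function normalizeHost(host: string): string {
  if (host.endsWith("/")) {
    // The fix: assign the result back. A bare `host.slice(0, -1);`
    // statement computes the shorter string and then discards it.
    host = host.slice(0, -1);
  }
  return host;
}
```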

Test coverage

23 unit tests — including:

  • Clock-only awareness renewals are not broadcast (proving suppression is client-side)
  • Server does not auto-remove awareness after timeout (proving _checkInterval is disabled)

16 integration tests — all existing tests pass unchanged

14 hibernation tests (3 new) — using server.restart() pattern:

  • onStartCount increments on each restart (storage-backed tracker)
  • Multiple clients re-sync and stay converged after restart
  • New client gets re-synced state after restart

New test infrastructure

  • YHibernateTracker DO — tracks onStart call count in storage, exposes it via HTTP
  • Added to integration-wrangler.jsonc bindings and migrations

Add a comprehensive test suite and related configs for y-partyserver (unit + integration), including global test setup that spawns wrangler dev, Vitest configs, a wrangler integration config, TS test tsconfig, and many test files (index.test.ts, integration.test.ts, worker.ts, global-setup.ts, etc.). Update package.json metadata and scripts (add test, test:integration, check:test), adjust files/peerDependencies/devDependencies ordering. Fix a bug in YProvider where a trailing slash on the host was not being stripped (assign sliced value back to host). Widen YServer.onLoad return type to Promise<YDoc | void> to allow returning a Y.Doc from onLoad. Overall this change wires up end-to-end and unit testing infrastructure and includes small API/bug fixes to support tests.

Refactor Y server to support Cloudflare Workers hibernation: remove in-memory conn map and persist per-connection awareness client IDs via connection.state so tracking survives hibernation. Switch broadcasting to use getConnections(), add durable handling for sync and awareness messages, clear stale awareness meta on disconnect, and bump awareness clock on reconnect so remote clients accept re-propagated states. Add debounced persistence and robust connection handling, plus extensive hibernation/integration tests, a Wrangler test harness, and vitest/tsconfig additions. Also update package manifests/workspaces and bump partysocket/partyserver dependency pins.

Prevent periodic awareness heartbeats from generating network traffic and enable client-driven re-sync after DO hibernation. Changes include:

- Provider: stop sending clock-only awareness renewals by listening to awareness 'change' (not 'update') and clear the awareness _checkInterval instead of using a built-in check loop; removed related reconnect timeout and local interval handling.
- Server: clear WSSharedDoc awareness _checkInterval and, on YServer start, send a sync step 1 to all surviving connections so clients re-sync their full state after hibernation wake-up.
- Fixtures/worker: add YHibernateTracker Durable Object (and namespace) to track onStart counts via storage and expose onRequest for tests; expose MonacoServer subclass enabling hibernate.
- Tests: add hibernation and awareness tests to verify onStart re-sync behavior, multi-client convergence after restart, new-client re-sync, and suppression of clock-only heartbeats / absence of auto-removal.
- Integration config: register YHibernateTracker for SQLite migration.

These changes are intended to allow Durable Object hibernation (by avoiding periodic awareness traffic) while ensuring state is re-synchronized from surviving connections after a wake-up.

Add a changeset and update server startup to address Yjs Durable Object hibernation and awareness issues.

Key changes:
- Add .changeset/hibernation-awareness-fix.md describing fixes for server and provider behavior around hibernation and awareness propagation.
- Server: persist connection state instead of in-memory Map, move event handler registration to onStart, disable awareness built-in _checkInterval, resend sync step 1 after wake, simplify send/error handling, and allow onLoad to return a seeded YDoc.
- Provider: switch awareness listener from "update" to "change", disable client _checkInterval and provider liveness timer, clear stale awareness meta on WS close, bump awareness clock on reconnect, and fix trailing-slash stripping bug.
- Fixture: await super.onStart() in MonacoServer.onStart to ensure proper async startup.

These changes enable connections and awareness to survive Durable Object hibernation and reduce unnecessary heartbeat traffic.
@changeset-bot

changeset-bot bot commented Feb 24, 2026

🦋 Changeset detected

Latest commit: f9fead4

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
y-partyserver Minor

@pkg-pr-new

pkg-pr-new bot commented Feb 24, 2026

hono-party

npm i https://pkg.pr.new/cloudflare/partykit/hono-party@341

partyfn

npm i https://pkg.pr.new/cloudflare/partykit/partyfn@341

partyserver

npm i https://pkg.pr.new/cloudflare/partykit/partyserver@341

partysocket

npm i https://pkg.pr.new/cloudflare/partykit/partysocket@341

partysub

npm i https://pkg.pr.new/cloudflare/partykit/partysub@341

partysync

npm i https://pkg.pr.new/cloudflare/partykit/partysync@341

partytracks

npm i https://pkg.pr.new/cloudflare/partykit/partytracks@341

partywhen

npm i https://pkg.pr.new/cloudflare/partykit/partywhen@341

y-partyserver

npm i https://pkg.pr.new/cloudflare/partykit/y-partyserver@341

commit: f9fead4

@MrgSub left a comment

LGTM

@threepointone
Collaborator Author

lol thanks @MrgSub.
