LIFT-2452: Restore async FirehoseClient to fix player disconnections #216

Merged
mason-chester-rp merged 1 commit into release-0.3.6.x from LIFT-2452/async-firehose-client on Apr 27, 2026


Summary

  • Restores fire-and-forget async Firehose behaviour lost during the LIFT-1806 AWS SDK v2 migration
  • Swaps FirehoseClient (sync) for FirehoseAsyncClient (async) in FirehoseDb
  • putRecordBatch() on the async client returns a CompletableFuture that is intentionally not awaited, so Tomcat threads are never blocked on the Firehose HTTP round-trip
  • Bumps version to 0.3.6.18
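
The fire-and-forget behaviour described in this summary can be sketched roughly as follows. This is not the code from the PR: `AsyncBatchSender` is a hypothetical stand-in for the async Firehose client (the AWS SDK is deliberately not used, so the sketch runs with the JDK alone), and the failure counter is an assumption added for illustration, since a future that is never awaited otherwise drops its errors silently.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an async delivery client; NOT the AWS SDK API.
interface AsyncBatchSender {
    CompletableFuture<Integer> putRecordBatch(List<String> records);
}

class FireAndForgetSketch {
    // Assumption for illustration: count failed deliveries so they stay visible.
    final AtomicInteger failures = new AtomicInteger();
    private final AsyncBatchSender sender;

    FireAndForgetSketch(AsyncBatchSender sender) {
        this.sender = sender;
    }

    // Starts the delivery and returns immediately. The returned stage exists
    // only so callers such as tests can observe completion; a request handler
    // would ignore it rather than wait on it.
    CompletableFuture<Integer> submit(List<String> records) {
        return sender.putRecordBatch(records).whenComplete((count, error) -> {
            if (error != null) {
                failures.incrementAndGet();
            }
        });
    }

    public static void main(String[] args) {
        // A future that is still pending models a round-trip in flight.
        CompletableFuture<Integer> inFlight = new CompletableFuture<>();
        FireAndForgetSketch sketch = new FireAndForgetSketch(records -> inFlight);

        CompletableFuture<Integer> stage = sketch.submit(List.of("event-1", "event-2"));
        System.out.println("returned before delivery finished: " + !stage.isDone());

        inFlight.complete(2); // the simulated round-trip finishes later
        System.out.println("delivered: " + stage.join() + ", failures: " + sketch.failures.get());
    }
}
```

Whether the real change attaches a completion callback is not stated here; without one, delivery errors from the unawaited future would go unnoticed, which is the main trade-off of this pattern.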

Root Cause

The original SDK v1 code explicitly used AmazonKinesisFirehoseAsync and putRecordBatchAsync(). The LIFT-1806 migration switched to the synchronous FirehoseClient, so every event POST silently blocked on the Firehose round-trip. Players have a ~10s read timeout, so requests that blocked for longer than that timed out on the player side, causing widespread disconnections.

Prod impact confirmed (last 24h, ecs/prod/player/snooze/event-api):

  • ~1,886 "request very slow: FirehosePostHandler_POST_events" warnings
  • ~3,871 ClientAbortException: SocketTimeoutException
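
The root cause can be illustrated with a small model, assuming nothing beyond the JDK. The two handler methods and the "200 OK" string are hypothetical and exist only for this sketch; the pending `CompletableFuture` stands in for a Firehose round-trip that has not finished yet. The blocking handler cannot respond until the delivery completes, while the async handler responds straight away.

```java
import java.util.concurrent.CompletableFuture;

class BlockingVsAsyncSketch {
    // Models the synchronous path: the request thread waits for the delivery,
    // so the response is delayed by the full round-trip.
    static String handleBlocking(CompletableFuture<Void> delivery) {
        delivery.join();
        return "200 OK";
    }

    // Models the async path: the delivery is left in flight and the
    // response is returned without waiting for it.
    static String handleAsync(CompletableFuture<Void> delivery) {
        return "200 OK";
    }

    public static void main(String[] args) {
        // Not completed yet: the simulated round-trip is still in progress.
        CompletableFuture<Void> delivery = new CompletableFuture<>();

        System.out.println("async handler responded: " + handleAsync(delivery));

        // Run the blocking handler on another thread so main can observe it.
        CompletableFuture<String> blocked =
                CompletableFuture.supplyAsync(() -> handleBlocking(delivery));
        System.out.println("blocking handler done before delivery: " + blocked.isDone());

        delivery.complete(null); // the round-trip finally finishes
        System.out.println("blocking handler responded: " + blocked.join());
    }
}
```

In this model, a client with a read timeout shorter than the delivery time would give up on the blocking handler before it responds, which matches the SocketTimeoutException pattern reported above.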

Test plan

  • Build passes (./gradlew test)
  • Deploy to dev/e2e and confirm "request very slow" warnings and SocketTimeoutException errors drop to zero on the event API
  • Confirm Firehose stream delivery is unaffected (records still arriving in S3/Redshift)

@mason-chester-rp mason-chester-rp merged commit 3d57a83 into release-0.3.6.x Apr 27, 2026
2 checks passed