Skip to content

LIFT-2453: Restore async FirehoseClient to fix player disconnections (0.3.x)#217

Merged
mason-chester-rp merged 1 commit into
release-0.3.xfrom
LIFT-2453/async-firehose-client
Apr 27, 2026
Merged

LIFT-2453: Restore async FirehoseClient to fix player disconnections (0.3.x)#217
mason-chester-rp merged 1 commit into
release-0.3.xfrom
LIFT-2453/async-firehose-client

Conversation

@mason-chester-rp

Copy link
Copy Markdown

Summary

  • Restores fire-and-forget async Firehose behaviour on release-0.3.x, lost during the LIFT-1806 AWS SDK v2 migration
  • Swaps FirehoseClient (sync) for FirehoseAsyncClient (async) in FirehoseDbputRecordBatch() on the async client returns a CompletableFuture that is intentionally not awaited at request time
  • listDeliveryStreams() in bootstrapApi() calls .join() since it is a one-time startup operation where blocking is acceptable
  • Bumps version to 0.3.27

Related

Test plan

  • Build passes (./gradlew test)
  • liftck_heartbeat updated to 0.3.27 and deployed to dev/e2e
  • Confirm request very slow warnings and SocketTimeoutException errors drop to zero
  • Confirm Firehose stream delivery is unaffected


if ((i + 1) % batchMax == 0)
{
firehose.putRecordBatch(PutRecordBatchRequest.builder().deliveryStreamName(streamName).records(batch).build());

@joel-katz-rp joel-katz-rp Apr 27, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

putRecordBatch() on the async client returns a CompletableFuture that is currently discarded without an error handler. The existing catch block won't catch async failures, so if Firehose rejects a batch the error will be silently swallowed — no log, no exception, and the CloudWatch failure metrics won't fire either.

Would you consider attaching an .exceptionally() handler to the returned future to at least log failures? Something like:

  firehose.putRecordBatch(request)
      .exceptionally(ex -> {
          log.error("Error putting records to firehose stream '{}': {}", streamName, ex.getMessage(), ex);
          return null;
      }); 

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking into this some more, before upgrading to the latest sdk, we didn't log errors or fire exceptions in this way either. I'm honestly not sure how this code block was ever not getting a 200/201: https://github.com/RocketPartners/liftck_snooze/blob/987bdac2d66a6aa9244982cd374865c7aa535b70/src/main/java/com/liftck/snooze/api/PlayerEventHandler.java#L103-L111

@connor-brown-rp connor-brown-rp left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ to Joel's comment above
(although, the old sdk v1 code also seemed to discard any async errors, so if we don't add the error logging, then at least its not a regression of functionality)

@mason-chester-rp mason-chester-rp merged commit f610dd9 into release-0.3.x Apr 27, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

4 participants