LIFT-2453: Restore async FirehoseClient to fix player disconnections (0.3.x)#217
Conversation
… and player disconnections
|
|
||
| if ((i + 1) % batchMax == 0) | ||
| { | ||
| firehose.putRecordBatch(PutRecordBatchRequest.builder().deliveryStreamName(streamName).records(batch).build()); |
There was a problem hiding this comment.
putRecordBatch() on the async client returns a CompletableFuture that is currently discarded without an error handler. The existing catch block won't catch async failures, so if Firehose rejects a batch the error will be silently swallowed — no log, no exception, and the CloudWatch failure metrics won't fire either.
Would you consider attaching an .exceptionally() handler to the returned future to at least log failures? Something like:
firehose.putRecordBatch(request)
.exceptionally(ex -> {
log.error("Error putting records to firehose stream '{}': {}", streamName, ex.getMessage(), ex);
return null;
});
There was a problem hiding this comment.
Looking into this some more, before upgrading to the latest sdk, we didn't log errors or fire exceptions in this way either. I'm honestly not sure how this code block was ever not getting a 200/201: https://github.com/RocketPartners/liftck_snooze/blob/987bdac2d66a6aa9244982cd374865c7aa535b70/src/main/java/com/liftck/snooze/api/PlayerEventHandler.java#L103-L111
connor-brown-rp
left a comment
There was a problem hiding this comment.
++ to Joel's comment above
(although, the old sdk v1 code also seemed to discard any async errors, so if we don't add the error logging, then at least its not a regression of functionality)
Summary
release-0.3.x, lost during the LIFT-1806 AWS SDK v2 migrationFirehoseClient(sync) forFirehoseAsyncClient(async) inFirehoseDb—putRecordBatch()on the async client returns aCompletableFuturethat is intentionally not awaited at request timelistDeliveryStreams()inbootstrapApi()calls.join()since it is a one-time startup operation where blocking is acceptable0.3.27Related
release-0.3.6.xliftck_heartbeatdepends onrocket-inversion:0.3.26— once this is merged and published, heartbeat should be updated to0.3.27Test plan
./gradlew test)liftck_heartbeatupdated to0.3.27and deployed to dev/e2erequest very slowwarnings andSocketTimeoutExceptionerrors drop to zero