Skip to content

fix: sink partial-failure handling, determinism, and hot-path allocations#147

Merged
joshua-temple merged 4 commits into
mainfrom
fix/sink-correctness
Jun 5, 2026
Merged

fix: sink partial-failure handling, determinism, and hot-path allocations#147
joshua-temple merged 4 commits into
mainfrom
fix/sink-correctness

Conversation

@joshua-temple

Copy link
Copy Markdown
Collaborator

What this change does

Remediates the sink review findings (correctness + perf + coverage; no feature work):

  • Partial-failure correctness (the headline theme): kinesis.PutRecords now inspects FailedRecordCount and returns a sentinel ErrPartialFailure; dynamo adds BatchWriteChecked (sentinel ErrUnprocessedItems); eventbridge counts from the authoritative FailedEntryCount; sns.PublishBatch checks out.Failed; the core reservoir non-batch path errors.Joins all per-payload errors instead of keeping only the last.
  • Determinism / context: prometheus sorts label keys before serializing; statsd threads the caller context into the transform instead of context.Background().
  • Perf: Manifold outlets are copy-on-write so emit no longer copies the slice; http uses bytes.NewReader; cloudwatch gains a clock-injectable variant.
  • Inert surface: removed the unused WithPollerClock option (wiring needed new machinery); PhaseTransform kept but documented as reserved (removing it would be breaking) and the post-Shutdown reservoir path is guarded.
  • Tests: added coverage for every fix plus the cited gaps (sqs plural, eventbridge nil-code skip, http error paths, sinktest branches, manifold concurrency under -race).

Checklist

  • Signed off (DCO), conventional commits, no feature work
  • Every sink module + examples/sinkflow + e2e build/test/vet clean; core+statsd -race clean

Kinesis PutRecords, SNS PublishBatch, and DynamoDB BatchWrite all return
HTTP 200 while individual records are rejected or left unprocessed, so a
success was reported even when work was dropped. Inspect the response:
PutRecords returns ErrPartialFailure on a non-zero FailedRecordCount, a new
BatchWriteChecked returns ErrUnprocessedItems on UnprocessedItems, and
PublishBatch reports failed entries. EventBridge now reports the authoritative
FailedEntryCount so the count and its plural stay correct.

Signed-off-by: Joshua Temple <joshua.temple@stablekernel.com>
…batch errors

Prometheus serialized labels in map-iteration order, breaking the documented
stable output and golden assertions; sort the keys. The statsd Aggregator
dropped the caller context before the registry transform, losing deadline and
trace propagation; thread it through metricFor. The reservoir non-batch
dispatch kept only the last error; join them so no per-payload failure is lost.

Signed-off-by: Joshua Temple <joshua.temple@stablekernel.com>
Manifold.Attach is now copy-on-write so Sink reads the outlet slice without
copying it on every emit. The http Post builds its request body with
bytes.NewReader instead of copying through a string. CloudWatch gains a
clock-injectable PutLogEventAt so the timestamp is testable without the wall
clock. The unread WithPollerClock option and its stored-but-unused clock are
removed, and the PhaseTransform godoc now states it is reserved rather than
emitted by the built-in Emitter.

Signed-off-by: Joshua Temple <joshua.temple@stablekernel.com>
Add the http build-request and PostJSON marshal-error paths, and exercise the
sinktest conformance harness's Flusher and Shutdowner error and panic branches
plus its nil-Outlet guard.

Signed-off-by: Joshua Temple <joshua.temple@stablekernel.com>
@joshua-temple joshua-temple merged commit a064e1b into main Jun 5, 2026
121 checks passed
@joshua-temple joshua-temple deleted the fix/sink-correctness branch June 5, 2026 14:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant