[kernel-1116] browser events: add S2 storage support #235
Conversation
Firetiger deploy monitoring skipped. This PR didn't match the auto-monitor filter configured on your GitHub connection.
Reason: PR title and branch name suggest feature work on browser events/S2 storage, but no file changes are visible to confirm modifications to API endpoints or Temporal workflows. To monitor this PR anyway, reply with
Review the following changes in direct dependencies. Learn more about Socket for GitHub.
rgarcia left a comment
Need to resolve #227 first and get a global event stream in place (moving ring/seq off CaptureSession); then I think this PR is more reviewable.
Also, in a similar vein: instead of one S2 writer per capture session, I think there should be one global S2 writer that reads from the global event stream.
```diff
@@ -1,5 +1,5 @@
 [program:kernel-images-api]
-command=/bin/bash -lc 'mkdir -p "${KERNEL_IMAGES_API_OUTPUT_DIR:-/recordings}" && PORT="${KERNEL_IMAGES_API_PORT:-10001}" FRAME_RATE="${KERNEL_IMAGES_API_FRAME_RATE:-10}" DISPLAY_NUM="${KERNEL_IMAGES_API_DISPLAY_NUM:-${DISPLAY_NUM:-1}}" MAX_SIZE_MB="${KERNEL_IMAGES_API_MAX_SIZE_MB:-500}" OUTPUT_DIR="${KERNEL_IMAGES_API_OUTPUT_DIR:-/recordings}" LOG_CDP_MESSAGES="${LOG_CDP_MESSAGES:-false}" exec /usr/local/bin/kernel-images-api'
+command=/bin/bash -lc 'mkdir -p "${KERNEL_IMAGES_API_OUTPUT_DIR:-/recordings}" && PORT="${KERNEL_IMAGES_API_PORT:-10001}" FRAME_RATE="${KERNEL_IMAGES_API_FRAME_RATE:-10}" DISPLAY_NUM="${KERNEL_IMAGES_API_DISPLAY_NUM:-${DISPLAY_NUM:-1}}" MAX_SIZE_MB="${KERNEL_IMAGES_API_MAX_SIZE_MB:-500}" OUTPUT_DIR="${KERNEL_IMAGES_API_OUTPUT_DIR:-/recordings}" LOG_CDP_MESSAGES="${LOG_CDP_MESSAGES:-false}" S2_BASIN="${S2_BASIN:-}" S2_ACCESS_TOKEN="${S2_ACCESS_TOKEN:-}" exec /usr/local/bin/kernel-images-api'
```
should there also be an s2 stream config?
stream names are derived dynamically from the CaptureSessionID at runtime, so one S2 stream is created per capture session within the configured basin
this will make it hard for, e.g., the API to know which stream to read when it wants to stream telemetry for a browser. It is also at odds with the "one global event stream" that the CDP stuff publishes into. I would make this a constant that is injected on startup
```go
var _ oapi.StrictServerInterface = (*ApiService)(nil)

// New constructs an ApiService.
```
```go
resp.Status = oapi.CaptureSessionStatusStopped
// Session cleanup (Remove on the S2 producer) happens automatically in
// EventsStorageWriter.Run when it processes the SessionEnded event, ensuring
// all pending writes are flushed before the producer is torn down.
```
this comment feels out of place. remove or move closer to the things it's talking about?
```go
// EventsStorage is the durable storage interface for the storage writer.
type EventsStorage interface {
	Append(ctx context.Context, streamName string, data []byte) error
}
```
I'd couple the storage to the envelope: `Append(ctx context.Context, env Envelope) error`.
```go
TypeEventsDropped  = "events_dropped"
SessionEnded       = "session_ended"
EventsDropped      = "events_dropped"
EventsStorageError = "storage_error"
```
I don't think we should publish this as an event: it creates a feedback loop and exposes details that make more sense as logs.
Force-pushed from bf059ca to 8b34d4f; commit titles (truncated):
- …ies and fix delimiter in CategoryFor
- … and category validation
- …alloc in filewriter, fix session doc
- …ed signal on Stop
- …rflow as system event
- …e gracefully on s2 init failure
- … and RemoveSession
Cursor Bugbot has reviewed your changes and found 4 potential issues.
Reviewed by Cursor Bugbot for commit 8b34d4f.
```go
s.captureSessionID = captureSessionID
s.seq = 0
s.createdAt = time.Now()
s.ring.reset()
```
Seq resets on session Start, breaking reconnection
High Severity
CaptureSession.Start() resets s.seq = 0 and calls s.ring.reset(), making sequence numbers session-scoped instead of process-monotonic. The rule requires seq to never reset on Start/Stop because a stale Last-Event-ID from a previous session could silently overlap with the new session's seqs, causing clients to skip events or receive wrong data. The old EventStream design kept seq process-monotonic across sessions. The SSE stream code even acknowledges this risk in a comment but proceeds anyway.
Additional Locations (1)
Triggered by learned rule: EventStream seq is process-monotonic; CaptureSession must not own the ring buffer or reset seq
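The invariant the rule describes can be sketched as a sequence counter owned by the global stream rather than the session: starting a session never resets it, so a stale `Last-Event-ID` can never collide with a later session's numbers. Names here are illustrative, not the PR's actual types.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// EventStream owns a process-monotonic sequence counter; sessions
// starting or stopping never reset it, so a client's Last-Event-ID
// from an earlier session cannot overlap a later session's seqs.
type EventStream struct{ seq atomic.Uint64 }

// NextSeq returns the next sequence number, monotonic for the process.
func (s *EventStream) NextSeq() uint64 { return s.seq.Add(1) }

// StartSession intentionally does NOT touch seq or the ring buffer;
// it would manage per-session state only.
func (s *EventStream) StartSession(id string) {}

func main() {
	var es EventStream
	a := es.NextSeq()
	es.StartSession("s1")
	b := es.NextSeq()
	fmt.Println(a < b) // seq keeps climbing across session boundaries
}
```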
```go
func (s *ApiService) PublishEvent(_ context.Context, req oapi.PublishEventRequestObject) (oapi.PublishEventResponseObject, error) {
	if !s.captureSession.Active() {
		return oapi.PublishEvent400JSONResponse{BadRequestErrorJSONResponse: oapi.BadRequestErrorJSONResponse{Message: "no active capture session"}}, nil
	}
```
PublishEvent rejects requests without active session
Medium Severity
PublishEvent now returns 400 when no capture session is active. The rule requires publish endpoints to operate on the global stream regardless of session state. Previously, the endpoint published through s.eventStream.Publish() which always worked. Now it gates on s.captureSession.Active(), breaking callers that publish events before or between sessions.
Triggered by learned rule: EventStream seq is process-monotonic; CaptureSession must not own the ring buffer or reset seq
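A sketch of the behavior the rule asks for: publish always targets the global stream, and a capture-session id is attached only when one happens to be active, instead of rejecting with 400. The types below are simplified stand-ins, not the PR's generated oapi code.

```go
package main

import "fmt"

type event struct {
	Type      string
	SessionID string // empty when no capture session is active
}

type service struct {
	activeSession string
	published     []event
}

// publish always writes to the global stream; session state only
// decorates the event, it never gates acceptance.
func (s *service) publish(typ string) event {
	e := event{Type: typ, SessionID: s.activeSession}
	s.published = append(s.published, e)
	return e
}

func main() {
	s := &service{}
	e := s.publish("custom") // no active session: still accepted
	fmt.Println(e.SessionID == "")
}
```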
```diff
 mustFFmpeg()

-stz := scaletozero.NewDebouncedControllerWithCooldown(scaletozero.NewUnikraftCloudController(), config.ScaleToZeroCooldown)
+stz := scaletozero.NewDebouncedController(scaletozero.NewUnikraftCloudController())
```
Scale-to-zero cooldown silently dropped
Medium Severity
NewDebouncedControllerWithCooldown was replaced with NewDebouncedController, which uses zero cooldown. The config.ScaleToZeroCooldown (default 1s) is now ignored despite still being defined in the config struct. This removes the debounce window that prevented rapid scale-to-zero toggling between sequential requests.
```go
	return
}
data, _ := json.Marshal(p)
m.publishEvent(EventLayoutShift, events.CategoryPage, events.Source{Kind: events.KindCDP}, "PerformanceTimeline.timelineEventAdded", data, sessionID)
```
Layout shift data leaks raw CDP camelCase structure
Medium Severity
handleTimelineEvent now marshals the entire cdpPerformanceTimelineEventAddedParams struct directly (json.Marshal(p)), producing data like {"event":{"frameId":"...","layoutShiftDetails":...}}. This leaks raw CDP camelCase field names (frameId, layoutShiftDetails) and wraps everything in a spurious "event" key. The old code carefully projected fields into snake_case (source_frame_id, layout_shift_details).
Triggered by learned rule: CDP monitor events must use Kernel-owned snake_case schema, not raw CDP camelCase
Link to example S2 logs -> https://s2.dev/dashboard/basins/dev-capture-session-logs/studio?stream=gpy8681tmxg5j8zk69v5grzl
Note
Medium Risk
Adds a new durable event sink and rewires the event pipeline (capture sessions, SSE streaming, shutdown ordering), which could affect event delivery/ordering and shutdown behavior if misconfigured or if S2 backpressure/errors occur.
Overview

Adds optional S2-backed durable event storage: new `S2_*` config/env vars, an `S2Storage` implementation (via `s2-sdk-go`), and an `EventsStorageWriter` that drains the capture-session ring buffer into per-session S2 streams and evicts per-session producers on `session_ended`.

Refactors the API event pipeline to use the new `events.CaptureSession` (file + ring publishing, per-session seq reset, `capture_session_id` added to envelopes), updates `/events/stream` and `/events/publish` to operate only when a capture session is active (with reserved event-type protection), and ensures SSE clients receive `session_ended` and the stream closes.

Updates runtime wiring: propagates `S2_BASIN`/`S2_ACCESS_TOKEN` into Docker/supervisord, starts/stops the storage writer in `main` with shutdown ordering to flush `session_ended` before closing, and adjusts CDP monitor computed-event semantics/names (notably `navigation_settled` now requires `dom_content_loaded` + `network_idle` + `layout_settled`) and simplifies emitted payloads.
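The `navigation_settled` change described above amounts to a three-signal latch: the computed event fires once, when the last of the three prerequisites arrives. The event names follow the overview; the Go types are illustrative, not the PR's CDP monitor code.

```go
package main

import "fmt"

// navState tracks the three signals that must all be observed before
// navigation_settled is emitted.
type navState struct {
	domContentLoaded bool
	networkIdle      bool
	layoutSettled    bool
}

// observe records a signal and reports whether navigation_settled
// should fire now (i.e. this signal completed the set).
func (n *navState) observe(event string) bool {
	before := n.settled()
	switch event {
	case "dom_content_loaded":
		n.domContentLoaded = true
	case "network_idle":
		n.networkIdle = true
	case "layout_settled":
		n.layoutSettled = true
	}
	return !before && n.settled()
}

func (n *navState) settled() bool {
	return n.domContentLoaded && n.networkIdle && n.layoutSettled
}

func main() {
	var n navState
	fmt.Println(n.observe("dom_content_loaded")) // false: two signals missing
	fmt.Println(n.observe("network_idle"))       // false: one signal missing
	fmt.Println(n.observe("layout_settled"))     // true: set complete, fire once
}
```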
S2_BASIN/S2_ACCESS_TOKENinto Docker/supervisord, starts/stops the storage writer inmainwith shutdown ordering to flushsession_endedbefore closing, and adjusts CDP monitor computed-event semantics/names (notablynavigation_settlednow requiresdom_content_loaded+network_idle+layout_settled) and simplifies emitted payloads.Reviewed by Cursor Bugbot for commit 8b34d4f. Bugbot is set up for automated code reviews on this repo. Configure here.