
[kernel-1116] browser events add s2 storage support #235

Open

archandatta wants to merge 71 commits into main from archand/kernel-1116/browser-events/s2

Conversation

@archandatta (Contributor) commented May 6, 2026

Link to example S2 logs -> https://s2.dev/dashboard/basins/dev-capture-session-logs/studio?stream=gpy8681tmxg5j8zk69v5grzl


Note

Medium Risk
Adds a new durable event sink and rewires the event pipeline (capture sessions, SSE streaming, shutdown ordering), which could affect event delivery/ordering and shutdown behavior if misconfigured or if S2 backpressure/errors occur.

Overview
Adds optional S2-backed durable event storage: new S2_* config/env vars, S2Storage implementation (via s2-sdk-go), and an EventsStorageWriter that drains the capture-session ring buffer into per-session S2 streams and evicts per-session producers on session_ended.

Refactors the API event pipeline to use the new events.CaptureSession (file+ring publishing, per-session seq reset, capture_session_id added to envelopes), updates /events/stream and /events/publish to operate only when a capture session is active (with reserved event-type protection), and ensures SSE clients receive session_ended and the stream closes.

Updates runtime wiring: propagates S2_BASIN/S2_ACCESS_TOKEN into Docker/supervisord, starts/stops the storage writer in main with shutdown ordering to flush session_ended before closing, and adjusts CDP monitor computed-event semantics/names (notably navigation_settled now requires dom_content_loaded + network_idle + layout_settled) and simplifies emitted payloads.

Reviewed by Cursor Bugbot for commit 8b34d4f. Bugbot is set up for automated code reviews on this repo. Configure here.

@firetiger-agent

Firetiger deploy monitoring skipped

This PR didn't match the auto-monitor filter configured on your GitHub connection:

Any PR that changes the kernel API. Monitor changes to API endpoints (packages/api/cmd/api/) and Temporal workflows (packages/api/lib/temporal) in the kernel repo

Reason: PR title and branch name suggest feature work on browser events/s2 storage, but no file changes are visible to confirm modifications to API endpoints or Temporal workflows.

To monitor this PR anyway, reply with @firetiger monitor this.

@socket-security (Bot) commented May 6, 2026

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

| Diff  | Package | Supply Chain Security | Vulnerability | Quality | Maintenance | License |
| ----- | ------- | --------------------- | ------------- | ------- | ----------- | ------- |
| Added | golang/github.com/s2-streamstore/s2-sdk-go@v0.14.0 | 91 | 100 | 100 | 100 | 100 |

View full report

Comment thread server/cmd/api/api/events.go Outdated
Comment thread server/cmd/api/api/api.go Outdated
Comment thread server/cmd/api/main.go
Comment thread server/lib/events/eventsstorage.go
@archandatta archandatta requested review from Sayan- and rgarcia May 6, 2026 19:12
@rgarcia (Contributor) left a comment

need to resolve #227 first and get a global event stream in place (moving ring/seq off CaptureSession), and then I think this PR is more reviewable

also, in a similar vein: instead of one S2 writer per capture session, I think there should be one global S2 writer that reads from the global event stream

```diff
@@ -1,5 +1,5 @@
 [program:kernel-images-api]
-command=/bin/bash -lc 'mkdir -p "${KERNEL_IMAGES_API_OUTPUT_DIR:-/recordings}" && PORT="${KERNEL_IMAGES_API_PORT:-10001}" FRAME_RATE="${KERNEL_IMAGES_API_FRAME_RATE:-10}" DISPLAY_NUM="${KERNEL_IMAGES_API_DISPLAY_NUM:-${DISPLAY_NUM:-1}}" MAX_SIZE_MB="${KERNEL_IMAGES_API_MAX_SIZE_MB:-500}" OUTPUT_DIR="${KERNEL_IMAGES_API_OUTPUT_DIR:-/recordings}" LOG_CDP_MESSAGES="${LOG_CDP_MESSAGES:-false}" exec /usr/local/bin/kernel-images-api'
+command=/bin/bash -lc 'mkdir -p "${KERNEL_IMAGES_API_OUTPUT_DIR:-/recordings}" && PORT="${KERNEL_IMAGES_API_PORT:-10001}" FRAME_RATE="${KERNEL_IMAGES_API_FRAME_RATE:-10}" DISPLAY_NUM="${KERNEL_IMAGES_API_DISPLAY_NUM:-${DISPLAY_NUM:-1}}" MAX_SIZE_MB="${KERNEL_IMAGES_API_MAX_SIZE_MB:-500}" OUTPUT_DIR="${KERNEL_IMAGES_API_OUTPUT_DIR:-/recordings}" LOG_CDP_MESSAGES="${LOG_CDP_MESSAGES:-false}" S2_BASIN="${S2_BASIN:-}" S2_ACCESS_TOKEN="${S2_ACCESS_TOKEN:-}" exec /usr/local/bin/kernel-images-api'
```
Contributor:

should there also be an s2 stream config?

Contributor Author:

stream names are derived dynamically from the CaptureSessionID at runtime, so one S2 stream is created per capture session within the configured basin
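
A minimal sketch of that derivation. The `capture-session/` prefix is hypothetical; the PR does not show the actual naming scheme, only that one stream is created per CaptureSessionID within the basin.

```go
package main

// deriveStreamName sketches the dynamic naming described above: one S2
// stream per capture session within the configured basin. The prefix is
// an illustrative assumption, not the PR's actual constant.
func deriveStreamName(captureSessionID string) string {
	return "capture-session/" + captureSessionID
}
```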

Contributor:

this will make it hard for, e.g., the API to know which stream to read when it wants to stream telemetry for a browser. It's also at odds with the "one global event stream" that the CDP stuff publishes into. I would make this a constant that is injected on startup

Comment thread server/cmd/api/api/api.go

```go
var _ oapi.StrictServerInterface = (*ApiService)(nil)

// New constructs an ApiService.
```
Contributor:

remove

```go
resp.Status = oapi.CaptureSessionStatusStopped
// Session cleanup (Remove on the S2 producer) happens automatically in
// EventsStorageWriter.Run when it processes the SessionEnded event, ensuring
// all pending writes are flushed before the producer is torn down.
```
Contributor:

this comment feels out of place. remove or move closer to the things it's talking about?


```go
// EventsStorage is the durable storage interface for the storage writer.
type EventsStorage interface {
	Append(ctx context.Context, streamName string, data []byte) error
```
Contributor:

i'd couple the storage to the envelope: Append(ctx, env Envelope) error.

```go
TypeEventsDropped  = "events_dropped"
SessionEnded       = "session_ended"
EventsDropped      = "events_dropped"
EventsStorageError = "storage_error"
```
Contributor:

i don't think we should publish this as an event: it creates a feedback loop and exposes details that make more sense as logs

Base automatically changed from archand/kernel-1116/external-events to main May 11, 2026 17:02
@archandatta archandatta force-pushed the archand/kernel-1116/browser-events/s2 branch from bf059ca to 8b34d4f Compare May 11, 2026 18:51
@cursor (Bot) left a comment

Cursor Bugbot has reviewed your changes and found 4 potential issues.



```go
s.captureSessionID = captureSessionID
s.seq = 0
s.createdAt = time.Now()
s.ring.reset()
```

Seq resets on session Start, breaking reconnection

High Severity

CaptureSession.Start() resets s.seq = 0 and calls s.ring.reset(), making sequence numbers session-scoped instead of process-monotonic. The rule requires seq to never reset on Start/Stop because a stale Last-Event-ID from a previous session could silently overlap with the new session's seqs, causing clients to skip events or receive wrong data. The old EventStream design kept seq process-monotonic across sessions. The SSE stream code even acknowledges this risk in a comment but proceeds anyway.


Triggered by learned rule: EventStream seq is process-monotonic; CaptureSession must not own the ring buffer or reset seq
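
A sketch of the invariant the rule describes, assuming a process-level EventStream owns the counter (names are illustrative): the counter is monotonic for the process lifetime, and starting a session touches neither it nor the ring buffer.

```go
package main

import "sync/atomic"

// EventStream owns the process-monotonic sequence counter. It is never
// reset when a capture session starts or stops, so a stale SSE
// Last-Event-ID from an earlier session cannot overlap with seqs issued
// in a later session.
type EventStream struct {
	seq atomic.Uint64
}

// Next issues the next process-monotonic sequence number.
func (s *EventStream) Next() uint64 { return s.seq.Add(1) }

// CaptureSession references the stream but does not own seq or the ring.
type CaptureSession struct {
	stream *EventStream
	id     string
}

// Start records the new session id and deliberately leaves the sequence
// counter (and the ring buffer, omitted here) untouched.
func (c *CaptureSession) Start(captureSessionID string) {
	c.id = captureSessionID
}
```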


```go
func (s *ApiService) PublishEvent(_ context.Context, req oapi.PublishEventRequestObject) (oapi.PublishEventResponseObject, error) {
	if !s.captureSession.Active() {
		return oapi.PublishEvent400JSONResponse{BadRequestErrorJSONResponse: oapi.BadRequestErrorJSONResponse{Message: "no active capture session"}}, nil
	}
```

PublishEvent rejects requests without active session

Medium Severity

PublishEvent now returns 400 when no capture session is active. The rule requires publish endpoints to operate on the global stream regardless of session state. Previously, the endpoint published through s.eventStream.Publish() which always worked. Now it gates on s.captureSession.Active(), breaking callers that publish events before or between sessions.


Triggered by learned rule: EventStream seq is process-monotonic; CaptureSession must not own the ring buffer or reset seq
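
A sketch of the behavior the rule expects, with illustrative types standing in for the generated oapi ones: publish always lands on the global stream, and an active capture session only annotates the envelope.

```go
package main

// Envelope carries the capture session id only as an annotation; it is
// empty when no session is active.
type Envelope struct {
	Type             string
	CaptureSessionID string
}

// EventStream is a stand-in for the process-global stream.
type EventStream struct{ events []Envelope }

func (s *EventStream) Publish(env Envelope) { s.events = append(s.events, env) }

// ApiService is a stand-in for the real service; captureSessionID is ""
// when no capture session is active.
type ApiService struct {
	stream           *EventStream
	captureSessionID string
}

// PublishEvent accepts events regardless of session state, rather than
// returning 400 when no capture session is active.
func (s *ApiService) PublishEvent(eventType string) {
	s.stream.Publish(Envelope{Type: eventType, CaptureSessionID: s.captureSessionID})
}
```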


Comment thread server/cmd/api/main.go
```diff
 mustFFmpeg()

-stz := scaletozero.NewDebouncedControllerWithCooldown(scaletozero.NewUnikraftCloudController(), config.ScaleToZeroCooldown)
+stz := scaletozero.NewDebouncedController(scaletozero.NewUnikraftCloudController())
```

Scale-to-zero cooldown silently dropped

Medium Severity

NewDebouncedControllerWithCooldown was replaced with NewDebouncedController, which uses zero cooldown. The config.ScaleToZeroCooldown (default 1s) is now ignored despite still being defined in the config struct. This removes the debounce window that prevented rapid scale-to-zero toggling between sequential requests.



```go
	return
}
data, _ := json.Marshal(p)
m.publishEvent(EventLayoutShift, events.CategoryPage, events.Source{Kind: events.KindCDP}, "PerformanceTimeline.timelineEventAdded", data, sessionID)
```

Layout shift data leaks raw CDP camelCase structure

Medium Severity

handleTimelineEvent now marshals the entire cdpPerformanceTimelineEventAddedParams struct directly (json.Marshal(p)), producing data like {"event":{"frameId":"...","layoutShiftDetails":...}}. This leaks raw CDP camelCase field names (frameId, layoutShiftDetails) and wraps everything in a spurious "event" key. The old code carefully projected fields into snake_case (source_frame_id, layout_shift_details).


Triggered by learned rule: CDP monitor events must use Kernel-owned snake_case schema, not raw CDP camelCase

