[codex] Add realtime agents support by samwillis · Pull Request #4543 · electric-sql/electric

samwillis · 2026-06-09T14:40:35Z

Summary

Adds Horton realtime voice mode on top of Electric Agents using durable streams as the client/server IO path and OpenAI Realtime as the initial provider. The implementation lets Horton drop into an active realtime session from either an existing conversation or the new-session screen, receive durable microphone audio, stream assistant audio back, keep typed messages usable during voice mode, run the existing tool loop, and persist transcripts/audio stream metadata so sessions can be replayed later.

This PR is still a draft, but it now includes the end-to-end runtime, server, UI, OpenAI provider, timeline, and test coverage needed to exercise realtime mode in the desktop app.

Why

The goal is not just a WebSocket passthrough to OpenAI. Electric Agents needs a reusable realtime API for app builders where audio/control IO is durable, inspectable, and eventually replayable. The server/runtime owns provider connections and tool execution; clients write/read durable streams for audio and control frames. OpenAI is the first provider, with the runtime API leaving room for other providers later.

Architecture

Clients create realtime sessions through the agents server and receive stream refs for:
- audio_in: durable PCM audio from browser/client to runtime.
- audio_out: durable PCM assistant audio from runtime/provider to client.
- control_in: durable JSON commands from client to runtime.
- control_out: durable JSON provider/runtime events from runtime to client.
The runtime bridges these durable streams to a provider session.
Horton continues to use its normal context/tool stack. In realtime mode, Horton configures ctx.useRealtime(...) instead of the normal text-only ctx.useAgent(...) path.
Tool calls still run through the composed Electric/Pi tool system, with Horton’s realtime policy allowing direct use of safe tools and worker delegation where appropriate.
Realtime sessions are recorded in the manifest with durable stream refs, so the UI can discover active/replayable sessions.
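
To make the four-stream layout above concrete, here is a minimal sketch of what a session's stream refs could look like to a client. The stream names come from this description; the type names, the `url` field, and the URL layout are hypothetical and chosen only for illustration.

```typescript
// Hypothetical shapes: the PR names the four streams but not these types.
type StreamDirection = "client_to_runtime" | "runtime_to_client";
type StreamPayload = "pcm_audio" | "json";

interface RealtimeStreamRef {
  url: string; // assumed: some addressable location for the durable stream
  direction: StreamDirection;
  payload: StreamPayload;
}

interface RealtimeSessionStreams {
  audio_in: RealtimeStreamRef;
  audio_out: RealtimeStreamRef;
  control_in: RealtimeStreamRef;
  control_out: RealtimeStreamRef;
}

// Builds illustrative refs for one session. The URL scheme is invented.
function describeStreams(baseUrl: string, sessionId: string): RealtimeSessionStreams {
  const ref = (
    name: string,
    direction: StreamDirection,
    payload: StreamPayload
  ): RealtimeStreamRef => ({
    url: `${baseUrl}/${sessionId}/${name}`,
    direction,
    payload,
  });
  return {
    audio_in: ref("audio_in", "client_to_runtime", "pcm_audio"),
    audio_out: ref("audio_out", "runtime_to_client", "pcm_audio"),
    control_in: ref("control_in", "client_to_runtime", "json"),
    control_out: ref("control_out", "runtime_to_client", "json"),
  };
}
```

The point of the split is that audio and control travel in separate streams per direction, so a replay tool could read any one of them independently.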

Runtime and SDK API

Adds first-class realtime runtime types and provider hooks for:
- realtime audio formats
- input transcription config
- OpenAI/server VAD config
- realtime provider events
- tool call streaming/results
- transcript callbacks
Adds RealtimeTurnDetectionConfig, supporting:
- server_vad
- semantic_vad
- explicit manual mode via false or { type: "none" }
Adds inputTranscription.delay so callers can request low-latency streaming transcription when the provider supports it.
Exports the new realtime types from @electric-ax/agents-runtime for app builders.
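
As a rough illustration of the turn-detection options listed above, the sketch below models `RealtimeTurnDetectionConfig` as a union and collapses both manual spellings (`false` and `{ type: "none" }`) into one mode. Only the type name and the three modes come from this description; the optional fields and the `turnDetectionMode` helper are assumptions, not the exported API.

```typescript
// Assumed shape: the real exported type may carry different fields.
type RealtimeTurnDetectionConfig =
  | { type: "server_vad"; threshold?: number; prefixPaddingMs?: number; silenceDurationMs?: number }
  | { type: "semantic_vad" }
  | { type: "none" }
  | false;

type TurnDetectionMode = "server_vad" | "semantic_vad" | "manual";

// Hypothetical helper: treats `false` and `{ type: "none" }` as the same manual mode.
function turnDetectionMode(config: RealtimeTurnDetectionConfig): TurnDetectionMode {
  if (config === false || config.type === "none") return "manual";
  return config.type;
}
```

Normalizing early like this would let the rest of a bridge branch on a single `"manual"` value instead of checking both spellings.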

OpenAI Realtime Provider

Adds OpenAI Realtime WebSocket provider support.
Sends GA-style session.update payloads with nested session.audio.input / session.audio.output config.
Maps OpenAI audio, transcript, response, error, and tool events into runtime RealtimeProviderEvents.
Supports provider tools/function calls and sends tool results back to the active realtime response.
Handles cancellation races by avoiding stale tool-result responses after response cancellation/epoch changes.
Deduplicates provider events by event_id where available.
Supports output audio truncation through conversation.item.truncate for WebSocket playback interruption.
Keeps manual input commit support for future push-to-talk or non-VAD providers.
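
The event_id deduplication mentioned above could look roughly like the following. This is a sketch under assumptions: the `EventDeduper` class, its bounded memory, and the `ProviderEvent` shape are hypothetical, and events without an `event_id` are simply passed through.

```typescript
interface ProviderEvent {
  type: string;
  event_id?: string; // not every event is assumed to carry one
}

class EventDeduper {
  private seen = new Set<string>();
  private readonly maxRemembered: number;

  constructor(maxRemembered: number = 1024) {
    this.maxRemembered = maxRemembered;
  }

  // Returns true if the event should be processed, false if it is a repeat.
  accept(event: ProviderEvent): boolean {
    if (event.event_id === undefined) return true;
    if (this.seen.has(event.event_id)) return false;
    this.seen.add(event.event_id);
    if (this.seen.size > this.maxRemembered) {
      // Sets iterate in insertion order, so the first value is the oldest.
      const oldest = this.seen.values().next().value;
      if (oldest !== undefined) this.seen.delete(oldest);
    }
    return true;
  }
}
```

Bounding the remembered IDs keeps memory flat over a long voice session, at the cost of not catching a duplicate that arrives after its ID has been evicted.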

Horton Behavior

Horton realtime mode currently supports OpenAI only.
Horton uses OpenAI server_vad for turn detection:
- threshold 0.55
- prefix padding 300ms
- silence duration 500ms
- automatic response creation enabled
- provider-side interruption enabled
Horton uses gpt-realtime-whisper with delay: "minimal" for input transcription so user transcript deltas stream while the user is still speaking, rather than only after the turn ends.
Typed text during an active realtime session routes into the realtime provider session instead of starting a separate text run.
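
For reference, the server_vad settings listed above might translate into a `session.update` payload along these lines. The values are the ones stated in this description; the wire field names (`prefix_padding_ms`, `create_response`, `interrupt_response`, the `audio/pcm` format object) follow my understanding of OpenAI's GA Realtime API and should be checked against the provider docs. `buildSessionUpdate` and `HORTON_SERVER_VAD` are hypothetical names, and input transcription is left out because the description does not give the wire name for the `delay` option.

```typescript
interface ServerVadSettings {
  threshold: number;
  prefixPaddingMs: number;
  silenceDurationMs: number;
  createResponse: boolean;
  interruptResponse: boolean;
}

// Values taken from the Horton Behavior list.
const HORTON_SERVER_VAD: ServerVadSettings = {
  threshold: 0.55,
  prefixPaddingMs: 300,
  silenceDurationMs: 500,
  createResponse: true,
  interruptResponse: true,
};

// Assumed wire shape: nested session.audio.input / session.audio.output config.
function buildSessionUpdate(vad: ServerVadSettings) {
  return {
    type: "session.update" as const,
    session: {
      type: "realtime" as const,
      audio: {
        input: {
          format: { type: "audio/pcm" as const, rate: 24000 },
          turn_detection: {
            type: "server_vad" as const,
            threshold: vad.threshold,
            prefix_padding_ms: vad.prefixPaddingMs,
            silence_duration_ms: vad.silenceDurationMs,
            create_response: vad.createResponse,
            interrupt_response: vad.interruptResponse,
          },
        },
        output: { format: { type: "audio/pcm" as const, rate: 24000 } },
      },
    },
  };
}
```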

Durable Stream IO

Audio and control IO use durable streams with write batching enabled.
The browser no longer sends manual input_audio.commit commands in provider-VAD mode.
The runtime bridge has two input modes:
- provider VAD mode streams durable audio_in chunks directly to OpenAI.
- manual commit mode buffers exact byte ranges and commits only requested audio spans.
Short manual audio commits below OpenAI’s minimum input size are skipped and cleared to avoid provider errors.
control_out carries lightweight JSON event summaries for UI state, while raw assistant PCM goes to audio_out.
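
The manual commit path described above (buffer, commit a requested byte range, skip and clear anything too short) can be sketched as follows. The 100 ms minimum is an assumption on my part; this description only says that OpenAI has a minimum input size. `ManualCommitBuffer` and `minCommitBytes` are hypothetical names.

```typescript
const SAMPLE_RATE_HZ = 24_000; // mono 24 kHz, as captured by the browser
const BYTES_PER_SAMPLE = 2; // PCM16

// Assumption: treat 100 ms as the provider's minimum commit length.
function minCommitBytes(minMs: number = 100): number {
  return Math.ceil((SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * minMs) / 1000);
}

class ManualCommitBuffer {
  private chunks: Uint8Array[] = [];
  private bufferedBytes = 0;
  private readonly minBytes: number;

  constructor(minBytes: number = minCommitBytes()) {
    this.minBytes = minBytes;
  }

  get size(): number {
    return this.bufferedBytes;
  }

  append(chunk: Uint8Array): void {
    this.chunks.push(chunk);
    this.bufferedBytes += chunk.length;
  }

  // Returns the requested span, or null when it is too short to send.
  // The buffer is cleared either way, so a skipped commit leaves no residue.
  commit(startByte: number, endByte: number): Uint8Array | null {
    const all = new Uint8Array(this.bufferedBytes);
    let offset = 0;
    for (const chunk of this.chunks) {
      all.set(chunk, offset);
      offset += chunk.length;
    }
    this.chunks = [];
    this.bufferedBytes = 0;
    const start = Math.max(0, Math.min(startByte, all.length));
    const end = Math.max(start, Math.min(endByte, all.length));
    const span = all.slice(start, end);
    return span.length < this.minBytes ? null : span;
  }
}
```

At 24 kHz PCM16, 100 ms works out to 4800 bytes, so a 4000-byte commit would be dropped rather than forwarded.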

Browser Audio Capture and Playback

Uses AudioWorklet for lower-overhead microphone capture where available, with a ScriptProcessorNode fallback.
Captures mono 24 kHz PCM16 for OpenAI realtime.
Adds a local transport gate to reduce durable stream write volume while still relying on OpenAI VAD for actual turn detection:
- keeps a short pre-roll buffer so speech starts are not clipped
- sends active speech chunks
- sends a trailing silence tail so provider VAD can detect the end of the turn
- sends nothing while idle
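
A minimal sketch of a transport gate with the four behaviors listed above. This is not the PR's implementation: the RMS-based loudness check, the chunk-counted pre-roll and tail, and every name here (`TransportGate`, `GateOptions`, `rms`) are assumptions for illustration.

```typescript
interface GateOptions {
  openThreshold: number; // RMS level (0..1) that counts as speech; assumed metric
  preRollChunks: number; // idle chunks kept so speech starts are not clipped
  tailChunks: number; // silent chunks still sent after speech stops
}

// Root-mean-square level of a PCM16 chunk, normalized to 0..1.
function rms(chunk: Int16Array): number {
  if (chunk.length === 0) return 0;
  const sum = chunk.reduce((acc, s) => acc + (s / 32768) * (s / 32768), 0);
  return Math.sqrt(sum / chunk.length);
}

class TransportGate {
  private preRoll: Int16Array[] = [];
  private tailRemaining = 0;
  private readonly opts: GateOptions;

  constructor(opts: GateOptions) {
    this.opts = opts;
  }

  // Returns the chunks that should be written to audio_in for this input chunk.
  push(chunk: Int16Array): Int16Array[] {
    if (rms(chunk) >= this.opts.openThreshold) {
      // Speech: flush the pre-roll first, then the chunk, and re-arm the tail.
      const out = this.preRoll.concat([chunk]);
      this.preRoll = [];
      this.tailRemaining = this.opts.tailChunks;
      return out;
    }
    if (this.tailRemaining > 0) {
      // Trailing silence so provider VAD can see the end of the turn.
      this.tailRemaining -= 1;
      return [chunk];
    }
    // Idle: remember a short window, send nothing.
    this.preRoll.push(chunk);
    if (this.preRoll.length > this.opts.preRollChunks) this.preRoll.shift();
    return [];
  }
}
```

The gate only decides what to transmit. It makes no turn decisions of its own, which matches the note that actual turn detection stays with OpenAI VAD.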
Does not locally cancel assistant responses on local gate-open. Interruption is driven by OpenAI input_audio_buffer.speech_started events.
On provider speech-start events, the UI stops queued playback and sends truncation metadata without also sending a redundant response.cancel.
Fixes PCM16 output chunk alignment so odd or split byte chunks do not create static/noise.
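
The alignment problem is that a PCM16 sample is two bytes, so a chunk with an odd byte count splits a sample across chunk boundaries. One way to handle it, sketched here with a hypothetical `Pcm16Aligner`, is to carry the dangling byte into the next chunk. This illustrates the technique and is not necessarily how the fix in this PR is written.

```typescript
class Pcm16Aligner {
  private carry: number | null = null;

  // Returns only whole 16-bit samples; a trailing odd byte waits for the next chunk.
  push(chunk: Uint8Array): Uint8Array {
    const carryLen = this.carry === null ? 0 : 1;
    const total = carryLen + chunk.length;
    const joined = new Uint8Array(total);
    if (this.carry !== null) joined[0] = this.carry;
    joined.set(chunk, carryLen);
    const usable = total - (total % 2);
    this.carry = total % 2 === 1 ? joined[total - 1] ?? null : null;
    return joined.slice(0, usable);
  }
}
```

Without the carry, every sample after an odd-length chunk would be read one byte off, which is heard as static.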
Adds a realtime voice control using a non-dictation icon and a live input-level visual indicator.

UI Flow

Adds realtime start/stop controls to the normal message input.
Keeps the prompt send button available during realtime mode so users can type while voice mode is active.
Adds a realtime button to the new Horton session screen. Clicking it creates a new session and navigates into it.
Realtime session status and stream refs are represented in the timeline/manifest rather than hidden client-only state.

Transcript and Timeline Handling

Persists realtime input/output transcripts as first-class realtime transcript rows.
Streams transcript text as textDeltas chunks instead of repeatedly rewriting the whole transcript text over the wire.
Interleaves realtime transcripts, normal assistant output, and tool calls in visible timeline order.
Splits assistant output transcript segments around later user speech so long assistant streams do not visually float above later user turns.
Uses OpenAI item/response identifiers to group input/output transcript deltas.
Reconciles final transcript text against streamed deltas without duplicating content.
Avoids seeding active-session realtime transcripts back into provider history when reconnecting the active session.
Generates session titles/descriptions from the first finalized user realtime transcript.
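
Reconciling a final transcript against already-streamed deltas could follow a rule like the one below. The `textDeltas` field name comes from this description; the `TranscriptRow` shape and the specific rule (append the missing suffix when the final text extends what was streamed, otherwise replace) are assumptions, not the PR's confirmed logic.

```typescript
interface TranscriptRow {
  id: string; // stable realtime transcript ID
  textDeltas: string[];
}

function appendDelta(row: TranscriptRow, delta: string): TranscriptRow {
  return { ...row, textDeltas: [...row.textDeltas, delta] };
}

// Assumed rule: never duplicate text that was already streamed.
function reconcileFinal(row: TranscriptRow, finalText: string): TranscriptRow {
  const streamed = row.textDeltas.join("");
  if (streamed === finalText) return row; // nothing to add
  if (finalText.startsWith(streamed)) {
    // Final text extends the stream: append only the missing suffix.
    return appendDelta(row, finalText.slice(streamed.length));
  }
  // Streamed text diverged from the final text: replace it wholesale.
  return { ...row, textDeltas: [finalText] };
}
```

Because the row ID stays the same in all three branches, a consumer keyed on that ID sees one row evolve rather than a second row appearing.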

Server Routing

Adds realtime session start plumbing in agents-server.
Adds stream routing support for realtime audio/control streams.
Allows the producer-seq header in CORS/preflight handling so durable stream writes from the UI succeed.

Error Handling and Reliability Fixes

Ignores OpenAI inactive cancellation errors such as response_cancel_not_active when they are caused by a stale or already-cancelled response.
Ignores stale output truncation errors when the local client’s playback/truncation state is behind provider state.
Avoids committing empty or too-short provider input buffers in manual mode.
Prevents duplicate transcript rows by streaming deltas under stable realtime transcript IDs.
Keeps response/tool result sending guarded by response epoch so cancelled/replaced responses do not get stale tool follow-ups.
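
The response-epoch guard can be pictured as a counter that is bumped whenever a response is cancelled or replaced, with each tool call remembering the epoch it started under. The sketch below is an assumption about the mechanism, with hypothetical names (`ResponseEpoch`, `sendIfCurrent`), and is kept synchronous for brevity; in a real runtime the tool would be awaited between `snapshot()` and the send.

```typescript
class ResponseEpoch {
  private current = 0;

  // Capture the epoch a tool call belongs to.
  snapshot(): number {
    return this.current;
  }

  // Call on response cancellation or replacement.
  bump(): void {
    this.current += 1;
  }

  isCurrent(epoch: number): boolean {
    return epoch === this.current;
  }
}

// Sends the tool result only if the response it belongs to is still active.
function sendIfCurrent<T>(
  epochs: ResponseEpoch,
  startedAt: number,
  result: T,
  send: (result: T) => void
): boolean {
  if (!epochs.isCurrent(startedAt)) return false; // stale: drop silently
  send(result);
  return true;
}
```

Dropping stale results at the send site means a slow tool cannot inject a follow-up into a response the user has already interrupted.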

Validation

Ran the focused checks under the repo-pinned Node 24.11.1 toolchain:

pnpm --dir packages/agents-runtime exec tsc --noEmit --pretty false
pnpm --dir packages/agents-server-ui exec tsc --noEmit --pretty false
pnpm --dir packages/agents exec tsc --noEmit --pretty false
pnpm --dir packages/agents-runtime exec vitest run test/openai-realtime.test.ts test/realtime-context.test.ts
pnpm --dir packages/agents-runtime exec vitest run test/realtime-context.test.ts test/openai-realtime.test.ts test/timeline-context.test.ts test/entity-timeline.test.ts
pnpm --dir packages/agents exec vitest run test/horton-tool-composition.test.ts
pnpm --dir packages/agents-runtime run build
git diff --check

Also tested manually in the desktop app through multiple realtime voice sessions, including:

- initial realtime session startup
- microphone input streaming
- assistant audio playback
- live input-level indicator
- typed messages during realtime mode
- tool calls from realtime mode
- realtime transcript rendering/order
- long user turns with live transcript deltas
- interruption behavior while assistant audio is playing

Notes and Follow-ups

This PR intentionally starts with OpenAI Realtime only, but the runtime/provider boundary is not OpenAI-specific.
Replay/scrub UI is not implemented here, but durable session stream refs are persisted in the manifest so a replay UI can be built on top.
Mobile/native scoped stream token refresh remains out of scope for this draft.
Risk area to keep testing: long-running voice sessions with many interruptions and tool calls, because those stress provider response cancellation, transcript reconciliation, and timeline ordering together.

netlify · 2026-06-09T15:03:54Z

✅ Deploy Preview for electric-next ready!

Name	Link
🔨 Latest commit	`242aca6`
🔍 Latest deploy log	https://app.netlify.com/projects/electric-next/deploys/6a2825e8c87469000860818d
😎 Deploy Preview	https://deploy-preview-4543--electric-next.netlify.app

github-actions · 2026-06-09T18:40:24Z

Electric Agents Desktop Builds

Build artifacts for commit d54bb62.

Platform	Status	Artifact
macOS Apple Silicon	Passed	DMG
macOS Intel	Passed	DMG
Windows x64	Passed	Installer
Linux x64	Passed	AppImage / deb

codecov · 2026-06-09T18:41:30Z

❌ 2 Tests Failed:

Tests completed	Failed	Passed	Skipped
1168	2	1166	41

View the top 2 failed test(s) by shortest run time

test/process-wake.test.ts > processWake > applies SIGINT that arrives before the handler run controller is created

Stack Traces | 0.114s run time

Error: [agent-runtime] entity timeline requires collection "realtimeTranscripts" but it was not registered
 ❯ getOrderableCollection src/entity-timeline.ts:623:11
 ❯ Module.buildEntityTimelineData src/entity-timeline.ts:1048:5
 ❯ timelineMessages src/timeline-context.ts:519:37
 ❯ Module.timelineToMessages src/timeline-context.ts:540:10
 ❯ Object.run src/context-factory.ts:1398:39
 ❯ Object.handler test/process-wake.test.ts:1067:9
 ❯ Module.processWake src/process-wake.ts:2147:9
 ❯ test/process-wake.test.ts:1089:5

test/process-wake.test.ts > processWake > aborts an active run for server-handled SIGKILL without rewriting the signal

Stack Traces | 5s run time

Error: Test timed out in 5000ms.
If this is a long-running test, pass a timeout value as the last argument or configure it globally with "testTimeout".
 ❯ test/process-wake.test.ts:1183:3


github-actions · 2026-06-09T18:44:17Z

Electric Agents Mobile Build

Local mobile checks ran for commit d54bb62.

The EAS Android preview build was skipped because the mobile-eas-build label is not present.
Add the mobile-eas-build label to this PR to produce an installable preview build.

netlify · 2026-06-09T19:08:02Z

✅ Deploy Preview for electric-next ready!

Name	Link
🔨 Latest commit	`d54bb62`
🔍 Latest deploy log	https://app.netlify.com/projects/electric-next/deploys/6a285def22d91d0008c570f8
😎 Deploy Preview	https://deploy-preview-4543--electric-next.netlify.app

samwillis added 25 commits June 9, 2026 19:31

feat(agents): add realtime stream foundations

d7a3798

feat(agents-runtime): add realtime handler API

ce2ba48

feat(agents-server): add realtime session route

213d197

feat(agents-runtime): add realtime session client

da23086

feat(agents-runtime): add openai realtime provider

19596db

feat(agents-runtime): bridge realtime durable streams

6134e04

feat(agents): route horton realtime sessions

cd7747a

feat(agents-ui): add realtime voice toggle

6b41d71

feat(agents-ui): route realtime text input

ff1b1eb

fix(agents): harden realtime session lifecycle

1ac8444

fix(agents): make realtime voice input activate reliably

f980ea1

fix(agents): avoid inactive realtime response cancel

b5fe6c3

fix(agents): use supported OpenAI realtime model

45ea73b

fix(agents): wire realtime audio path

9ef1763

fix(agents): clamp realtime audio truncation

1c64549

feat(agents): persist realtime transcripts

2f2449c

feat(agents-ui): start realtime from spawn screen

28ecd7f

fix(agents): anchor realtime transcripts at speech start

0ecaf16

fix(agents-ui): keep send button in realtime mode

5f6b3fe

fix(agents): interleave realtime transcripts

68e55d7

fix(agents): order realtime tool runs by visible items

eb808a2

fix(agents): batch realtime durable stream appends

7c90803

fix(agents): capture realtime audio in worklet

1bebe6d

fix(agents): title realtime sessions from user transcript

f42df95

Improve Horton realtime audio streaming

d54bb62

samwillis force-pushed the codex/realtime-agents-plan branch from 3ffd31d to d54bb62 Compare June 9, 2026 18:39

samwillis wants to merge 25 commits into main from codex/realtime-agents-plan
