[codex] Add realtime agents support #4543
Draft
samwillis wants to merge 25 commits into
Summary
Adds Horton realtime voice mode on top of Electric Agents using durable streams as the client/server IO path and OpenAI Realtime as the initial provider. The implementation lets Horton drop into an active realtime session from either an existing conversation or the new-session screen, receive durable microphone audio, stream assistant audio back, keep typed messages usable during voice mode, run the existing tool loop, and persist transcripts/audio stream metadata so sessions can be replayed later.
This PR is still a draft, but it now includes the end-to-end runtime, server, UI, OpenAI provider, timeline, and test coverage needed to exercise realtime mode in the desktop app.
Why
The goal is not just a WebSocket passthrough to OpenAI. Electric Agents needs a reusable realtime API for app builders where audio/control IO is durable, inspectable, and eventually replayable. The server/runtime owns provider connections and tool execution; clients write/read durable streams for audio and control frames. OpenAI is the first provider, with the runtime API leaving room for other providers later.
Architecture
audio_in: durable PCM audio from browser/client to runtime.
audio_out: durable PCM assistant audio from runtime/provider to client.
control_in: durable JSON commands from client to runtime.
control_out: durable JSON provider/runtime events from runtime to client.
ctx.useRealtime(...) instead of the normal text-only ctx.useAgent(...) path.
Runtime and SDK API
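As a rough illustration of the four durable streams listed under Architecture, the direction and payload of each one can be written down as a small typed table. Everything in this sketch other than the four stream names (the `StreamSpec` interface, the `REALTIME_STREAMS` constant, the `streamsWrittenBy` helper) is hypothetical and not taken from the PR.

```typescript
// Hypothetical names throughout; only the four stream names come from the PR text.
type StreamName = "audio_in" | "audio_out" | "control_in" | "control_out";
type Role = "client" | "runtime";

interface StreamSpec {
  payload: "pcm" | "json"; // raw PCM audio or JSON frames
  writer: Role; // which side appends to the stream
  reader: Role; // which side consumes the stream
}

const REALTIME_STREAMS: Record<StreamName, StreamSpec> = {
  audio_in: { payload: "pcm", writer: "client", reader: "runtime" },
  audio_out: { payload: "pcm", writer: "runtime", reader: "client" },
  control_in: { payload: "json", writer: "client", reader: "runtime" },
  control_out: { payload: "json", writer: "runtime", reader: "client" },
};

// Lists the streams a given side is expected to write to.
function streamsWrittenBy(role: Role): StreamName[] {
  return (Object.keys(REALTIME_STREAMS) as StreamName[]).filter(
    (name) => REALTIME_STREAMS[name].writer === role,
  );
}
```

In this sketch the client only ever writes `audio_in` and `control_in`; the provider connection stays on the runtime side, which matches the ownership split described under Why.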
RealtimeTurnDetectionConfig, supporting: server_vad, semantic_vad, false or { type: "none" }
inputTranscription.delay so callers can request low-latency streaming transcription when the provider supports it.
@electric-ax/agents-runtime for app builders.
OpenAI Realtime Provider
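One way to picture how the turn-detection options from the Runtime and SDK API section could reach the provider is a small normalization step that feeds a `session.update` payload. This is a sketch under assumptions: the optional camelCase fields, the `toProviderTurnDetection` and `buildSessionUpdate` helpers, and the exact payload shape are illustrative, and the snake_case keys reflect my understanding of OpenAI's turn-detection settings rather than anything confirmed by the PR.

```typescript
// Assumed shape: the PR names the supported variants but not their fields.
type RealtimeTurnDetectionConfig =
  | { type: "server_vad"; threshold?: number; prefixPaddingMs?: number; silenceDurationMs?: number }
  | { type: "semantic_vad" }
  | { type: "none" }
  | false;

type ProviderTurnDetection = Record<string, string | number> | null;

// Both `false` and `{ type: "none" }` disable provider-side turn detection.
function toProviderTurnDetection(config: RealtimeTurnDetectionConfig): ProviderTurnDetection {
  if (config === false || config.type === "none") return null;
  if (config.type === "semantic_vad") return { type: "semantic_vad" };
  const out: Record<string, string | number> = { type: "server_vad" };
  if (config.threshold !== undefined) out["threshold"] = config.threshold;
  if (config.prefixPaddingMs !== undefined) out["prefix_padding_ms"] = config.prefixPaddingMs;
  if (config.silenceDurationMs !== undefined) out["silence_duration_ms"] = config.silenceDurationMs;
  return out;
}

// Nests the setting under session.audio.input; other session fields are omitted here.
function buildSessionUpdate(config: RealtimeTurnDetectionConfig) {
  return {
    type: "session.update",
    session: { audio: { input: { turn_detection: toProviderTurnDetection(config) } } },
  };
}
```

Treating `false` and `{ type: "none" }` as the same case keeps callers free to use whichever reads better, while the provider only ever sees a single "disabled" value.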
session.update payloads with nested session.audio.input / session.audio.output config.
RealtimeProviderEvents.
event_id where available.
conversation.item.truncate for WebSocket playback interruption.
Horton Behavior
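For the playback-interruption item in the OpenAI Realtime Provider section, the truncation point has to be expressed in milliseconds of audio actually played. A minimal sketch, assuming the client tracks how many PCM samples it has played for the current assistant item; the `buildTruncateEvent` helper and its parameters are hypothetical, and the event fields follow OpenAI's documented `conversation.item.truncate` client event as I understand it.

```typescript
// Hypothetical helper; not the PR's implementation.
interface TruncateEvent {
  type: "conversation.item.truncate";
  item_id: string;
  content_index: number;
  audio_end_ms: number;
}

function buildTruncateEvent(
  itemId: string,
  playedSamples: number,
  sampleRateHz: number,
): TruncateEvent {
  if (sampleRateHz <= 0) throw new RangeError("sampleRateHz must be positive");
  // Convert the played sample count into whole milliseconds, never negative.
  const audioEndMs = Math.floor((Math.max(0, playedSamples) * 1000) / sampleRateHz);
  return {
    type: "conversation.item.truncate",
    item_id: itemId,
    content_index: 0, // assumes the audio is the item's first content part
    audio_end_ms: audioEndMs,
  };
}
```

For example, `buildTruncateEvent("item_1", 36000, 24000)` yields `audio_end_ms: 1500`, assuming a 24 kHz output sample rate.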
server_vad for turn detection: 0.55, 300ms, 500ms
gpt-realtime-whisper with delay: "minimal" for input transcription so user transcript deltas stream while the user is still speaking, rather than only after the turn ends.
Durable Stream IO
input_audio.commit commands in provider-VAD mode.
audio_in chunks directly to OpenAI.
control_out carries lightweight JSON event summaries for UI state, while raw assistant PCM goes to audio_out.
Browser Audio Capture and Playback
AudioWorklet for lower-overhead microphone capture where available, with a ScriptProcessorNode fallback.
input_audio_buffer.speech_started events.
response.cancel.
UI Flow
Transcript and Timeline Handling
textDeltas chunks instead of repeatedly rewriting the whole transcript text over the wire.
Server Routing
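The `textDeltas` item under Transcript and Timeline Handling amounts to append-only updates: each write carries only the new chunk, and the full text is derived on read. A minimal sketch of that idea, where the `TranscriptEntry` shape and both helpers are hypothetical and only the `textDeltas` name comes from the PR text:

```typescript
// Hypothetical entry shape for one transcript item.
interface TranscriptEntry {
  id: string;
  textDeltas: string[]; // chunks in arrival order
}

// Appends one chunk without mutating the previous entry.
function appendDelta(entry: TranscriptEntry, delta: string): TranscriptEntry {
  return { id: entry.id, textDeltas: entry.textDeltas.concat(delta) };
}

// Derives the display text by joining the chunks.
function transcriptText(entry: TranscriptEntry): string {
  return entry.textDeltas.join("");
}
```

With this layout a long transcript costs one small write per chunk instead of a rewrite of the whole string on every update.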
producer-seq header in CORS/preflight handling so durable stream writes from the UI succeed.
Error Handling and Reliability Fixes
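The `producer-seq` item under Server Routing matters because a browser will refuse a cross-origin write that carries a custom request header unless the preflight response lists it. A minimal sketch of such a preflight response, where the `preflightHeaders` helper, the allowed methods, and the other allowed header are assumptions rather than the server's actual configuration:

```typescript
// Only "producer-seq" comes from the PR text; "content-type" is an assumed companion.
const ALLOWED_REQUEST_HEADERS = ["content-type", "producer-seq"];

// Builds the response headers for an OPTIONS preflight request.
function preflightHeaders(origin: string): Record<string, string> {
  return {
    "Access-Control-Allow-Origin": origin,
    "Access-Control-Allow-Methods": "GET, POST, OPTIONS", // assumed method set
    "Access-Control-Allow-Headers": ALLOWED_REQUEST_HEADERS.join(", "),
  };
}
```

If `producer-seq` were missing from `Access-Control-Allow-Headers`, the browser would block the write before it ever reached the stream endpoint.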
response_cancel_not_active when they are caused by a stale or already-cancelled response.
Validation
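For the `response_cancel_not_active` item under Error Handling and Reliability Fixes, one plausible reading is that such errors are treated as harmless when the cancel raced with a response that had already finished. The sketch below illustrates that reading only; the `ProviderError` shape, the `isBenignCancelError` helper, and the idea of comparing against an active response id are all assumptions, not the PR's code.

```typescript
// Hypothetical error shape; the provider's real payload may differ.
interface ProviderError {
  code: string;
  responseId?: string; // response the failed cancel was aimed at, if known
}

// Returns true when a cancel error can be ignored as a stale race.
function isBenignCancelError(error: ProviderError, activeResponseId: string | null): boolean {
  if (error.code !== "response_cancel_not_active") return false;
  // Nothing is active any more, so the cancel simply arrived too late.
  if (activeResponseId === null) return true;
  // The cancel targeted an older response than the one currently active.
  return error.responseId !== undefined && error.responseId !== activeResponseId;
}
```

Any other error code, or a cancel failure against the response that is still active, would still be surfaced in this sketch.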
Ran the focused checks under the repo-pinned Node 24.11.1 toolchain:
pnpm --dir packages/agents-runtime exec tsc --noEmit --pretty false
pnpm --dir packages/agents-server-ui exec tsc --noEmit --pretty false
pnpm --dir packages/agents exec tsc --noEmit --pretty false
pnpm --dir packages/agents-runtime exec vitest run test/openai-realtime.test.ts test/realtime-context.test.ts
pnpm --dir packages/agents-runtime exec vitest run test/realtime-context.test.ts test/openai-realtime.test.ts test/timeline-context.test.ts test/entity-timeline.test.ts
pnpm --dir packages/agents exec vitest run test/horton-tool-composition.test.ts
pnpm --dir packages/agents-runtime run build
git diff --check
Also tested manually in the desktop app through multiple realtime voice sessions, including:
Notes and Follow-ups