fix(LITE): base64-encode agent.speak audio chunks #103

Open
mvtandas wants to merge 1 commit into heygen-com:master from mvtandas:fix/lite-mode-base64-audio

mvtandas commented May 7, 2026

Proposed changes

repeatAudio() in LITE mode is currently broken end-to-end: the SDK sends raw "binary string" PCM, but the LiveAvatar server expects base64-encoded PCM 16-bit 24 kHz mono. The server rejects every chunk with

{ "type": "error",
  "error": { "type": "invalid_request_error", "message": "invalid base64 audio data" } }

These error events are silently dropped because handleWebSocketMessage only forwards agent.speak_started / agent.speak_ended, so callers see no lip-sync, no audio, and no AVATAR_SPEAK_STARTED / AVATAR_SPEAK_ENDED events. The server's base64 requirement matches the LITE-mode events docs, which specify base64 as the wire format.
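To illustrate why the failure is invisible to callers, here is a minimal sketch of a handler that forwards only the two speak events. It is not the SDK's actual `handleWebSocketMessage`; the `ServerEvent` type and `forwardedEvents` function are hypothetical names used only for this example.

```typescript
// Hypothetical event shape, modeled on the error payload quoted above.
type ServerEvent = {
  type: string;
  error?: { type: string; message: string };
};

// Hypothetical stand-in for a handler that only forwards speak events.
function forwardedEvents(messages: ServerEvent[]): string[] {
  const forwarded: string[] = [];
  for (let i = 0; i < messages.length; i++) {
    const msg = messages[i];
    switch (msg.type) {
      case "agent.speak_started":
      case "agent.speak_ended":
        forwarded.push(msg.type);
        break;
      default:
        // Every other type, including "error", is ignored here,
        // so the caller never learns that a chunk was rejected.
        break;
    }
  }
  return forwarded;
}

const rejected: ServerEvent = {
  type: "error",
  error: { type: "invalid_request_error", message: "invalid base64 audio data" },
};

// Only rejections arrive, so nothing is forwarded to the caller.
console.log(forwardedEvents([rejected, rejected]).length); // 0
```

With a handler of this shape, a session in which every chunk is rejected looks identical to one in which nothing was sent.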

This PR btoa()s each audio chunk inside sendCommandEventToWebSocket (one line). repeatAudio(audio: string)'s public contract — "raw 16-bit signed PCM" as a binary string — is unchanged; the encoding now happens inside the SDK on the way out, where it always belonged.
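As a rough sketch of the encoding step (the PR applies `btoa()` inline; the `encodePcmChunk` and `buildSpeakEvent` helpers below are hypothetical names, and the event carries only the `type` and `audio` fields mentioned in this description):

```typescript
// Hypothetical helper: base64-encode one chunk of a "binary string",
// i.e. a string in which each character holds one byte (code 0..255).
function encodePcmChunk(binaryChunk: string): string {
  // btoa() throws if any character code exceeds 0xFF; a binary string
  // of raw PCM bytes never does.
  return btoa(binaryChunk);
}

// Hypothetical event builder; any additional fields are out of scope here.
function buildSpeakEvent(binaryChunk: string): { type: string; audio: string } {
  return { type: "agent.speak", audio: encodePcmChunk(binaryChunk) };
}

// Two bytes, 0x00 and 0x01, as a binary string.
const twoBytes = String.fromCharCode(0x00, 0x01);
const event = buildSpeakEvent(twoBytes);

console.log(event.audio); // "AAE="
console.log(atob(event.audio) === twoBytes); // true
```

Because the helper takes the same binary string that callers already pass, nothing changes on the caller's side.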

Why this is the right place to fix it

splitPcm24kStringToChunks operates on raw PCM bytes (it slices on byte boundaries — 19200 / 48000 bytes per chunk). Doing btoa after chunking keeps that logic correct; doing it before would force callers to pass base64 and would break the chunk-size math.
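The ordering argument can be checked with a small sketch. `splitBinaryString` below is a hypothetical stand-in for `splitPcm24kStringToChunks`, not its real implementation, and the input is dummy data sized at 48000 bytes (one second of 16-bit mono audio at 24 kHz).

```typescript
const CHUNK_BYTES = 19200; // one of the two chunk sizes mentioned above

// Hypothetical stand-in for splitPcm24kStringToChunks: slice on byte boundaries.
function splitBinaryString(pcm: string, chunkBytes: number): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < pcm.length; i += chunkBytes) {
    chunks.push(pcm.slice(i, i + chunkBytes));
  }
  return chunks;
}

// Dummy binary string of n bytes (the content does not matter for the size math).
function makePcm(n: number): string {
  let s = "";
  for (let i = 0; i < n; i++) s += String.fromCharCode(i % 256);
  return s;
}

const pcm = makePcm(48000);

// Chunk first, then encode: each chunk decodes to the intended byte count.
const chunkThenEncode = splitBinaryString(pcm, CHUNK_BYTES).map(function (c) {
  return atob(btoa(c)).length;
});
console.log(chunkThenEncode); // [19200, 19200, 9600]

// Encode first, then chunk: base64 grows by 4/3, so a 19200-character
// slice carries only 14400 bytes of PCM.
const encodeThenChunk = splitBinaryString(btoa(pcm), CHUNK_BYTES).map(function (c) {
  return atob(c).length;
});
console.log(encodeThenChunk[0]); // 14400
```

Slicing base64 text only happens to decode cleanly here because 19200 is a multiple of 4; the byte count per chunk is still not what the chunker intended.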

Testing

Extended the existing "sends speak audio command event via web socket" test to assert the audio field equals btoa(input). Full suite: 102 passed (102). Lint clean.

I verified the fix end-to-end against a real LITE session: AVATAR_SPEAK_STARTED / AVATAR_SPEAK_ENDED fire as expected and the avatar lip-syncs correctly.

Fixes #92.

The LiveAvatar server expects the `audio` field of `agent.speak` events to
be base64-encoded PCM 16-bit 24 kHz mono (per docs.liveavatar.com/docs/
lite-mode/events). The SDK was sending the raw "binary string" produced
by `splitPcm24kStringToChunks()`, which the server rejects with
`{type: "error", error: {type: "invalid_request_error",
message: "invalid base64 audio data"}}` for every chunk. The errors are
silently dropped because `handleWebSocketMessage` only emits
`agent.speak_started` / `agent.speak_ended`, so callers see no lip-sync,
no audio, and no events.

`repeatAudio()`'s public contract (raw 16-bit signed PCM as a binary
string) is unchanged — the encoding now happens inside the SDK on the
way out, where it always belonged.

Fixes heygen-com#92
Development

Successfully merging this pull request may close these issues.

repeatAudio() does not produce lip-sync or audio for custom VIDEO avatars in LITE mode

1 participant