fix(LITE): base64-encode agent.speak audio chunks by mvtandas · Pull Request #103 · heygen-com/liveavatar-web-sdk

mvtandas · 2026-05-07T08:29:20Z

Proposed changes

repeatAudio() in LITE mode is currently broken end-to-end: the SDK sends raw "binary string" PCM, but the LiveAvatar server expects base64-encoded PCM 16-bit 24 kHz mono. The server rejects every chunk with

{ "type": "error",
  "error": { "type": "invalid_request_error", "message": "invalid base64 audio data" } }

These error events are silently dropped because handleWebSocketMessage only forwards agent.speak_started / agent.speak_ended, so callers see no lip-sync, no audio, and no AVATAR_SPEAK_STARTED / AVATAR_SPEAK_ENDED events. The server's behavior is consistent with the LITE-mode events docs, which specify base64 as the wire format for the audio field.

This PR btoa()s each audio chunk inside sendCommandEventToWebSocket (one line). repeatAudio(audio: string)'s public contract — "raw 16-bit signed PCM" as a binary string — is unchanged; the encoding now happens inside the SDK on the way out, where it always belonged.
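To illustrate the change described above, here is a minimal sketch of encoding one binary-string PCM chunk on the way out. The helper names `encodePcmChunkToBase64` and `buildAgentSpeakEvent` and the `AgentSpeakEvent` shape are hypothetical and are not the SDK's actual code; only the `agent.speak` event type and its `audio` field come from the description above, and the real event may carry additional fields.

```typescript
// Hypothetical event shape for illustration; the real SDK event may have more fields.
interface AgentSpeakEvent {
  type: "agent.speak";
  audio: string; // base64-encoded PCM, as the server expects
}

// Hypothetical helper: base64-encode a "binary string" whose char codes are bytes (0-255).
// btoa() throws if any character has a code point above 0xFF, so the input
// must really be a byte-per-character string.
function encodePcmChunkToBase64(chunk: string): string {
  return btoa(chunk);
}

// Hypothetical helper: wrap one raw PCM chunk in an agent.speak event.
function buildAgentSpeakEvent(chunk: string): AgentSpeakEvent {
  return { type: "agent.speak", audio: encodePcmChunkToBase64(chunk) };
}
```

Because the encoding happens inside the send path, a caller would keep passing the same raw binary string to repeatAudio() as before.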

Why this is the right place to fix it

splitPcm24kStringToChunks operates on raw PCM bytes (it slices on byte boundaries — 19200 / 48000 bytes per chunk). Doing btoa after chunking keeps that logic correct; doing it before would force callers to pass base64 and would break the chunk-size math.
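The ordering argument above can be sketched as follows. This is an illustration under stated assumptions, not the SDK's implementation: `splitBinaryString` is a hypothetical stand-in for splitPcm24kStringToChunks, and the chunk size is passed in as a parameter rather than taken from the SDK.

```typescript
// Hypothetical stand-in for the SDK's chunker: slices a binary string on byte boundaries.
function splitBinaryString(pcm: string, chunkBytes: number): string[] {
  // 16-bit samples are 2 bytes each, so an even chunk size never splits a sample.
  if (!Number.isInteger(chunkBytes) || chunkBytes <= 0 || chunkBytes % 2 !== 0) {
    throw new RangeError("chunkBytes must be a positive even integer");
  }
  const chunks: string[] = [];
  for (let i = 0; i < pcm.length; i += chunkBytes) {
    chunks.push(pcm.slice(i, i + chunkBytes));
  }
  return chunks;
}

// Chunk first (sizes measured in raw PCM bytes), then base64-encode each chunk.
function chunkThenEncode(pcm: string, chunkBytes: number): string[] {
  return splitBinaryString(pcm, chunkBytes).map((chunk) => btoa(chunk));
}
```

With this order, the chunk size still describes raw PCM bytes. Each encoded chunk is roughly 4/3 as long in characters, which is why measuring sizes on an already-encoded string would no longer correspond to a fixed amount of audio.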

Testing

Extended the existing "sends speak audio command event via web socket" test to assert the audio field equals btoa(input). Full suite: 102 passed (102). Lint clean.
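The kind of assertion described above might look like the sketch below. It does not reproduce the repository's test or its test framework: `SocketLike`, `sendSpeakAudio`, and the fake socket are hypothetical names used only to show the check that the outgoing audio field equals btoa(input).

```typescript
// Hypothetical minimal socket interface; only send() is needed for the check.
interface SocketLike {
  send(data: string): void;
}

// Hypothetical send path: serialize an agent.speak event with base64 audio.
function sendSpeakAudio(socket: SocketLike, chunk: string): void {
  socket.send(JSON.stringify({ type: "agent.speak", audio: btoa(chunk) }));
}

// Fake socket that records every payload instead of touching the network.
const sent: string[] = [];
const fakeSocket: SocketLike = {
  send: (data) => {
    sent.push(data);
  },
};

// Two 16-bit samples as raw bytes, including values above 0x7F.
const input = String.fromCharCode(0x34, 0x12, 0xcc, 0xed);
sendSpeakAudio(fakeSocket, input);

const payload = JSON.parse(sent[0]) as { type: string; audio: string };
if (payload.audio !== btoa(input)) {
  throw new Error("audio field is not base64 of the input");
}
```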

I verified the fix end-to-end against a real LITE session: AVATAR_SPEAK_STARTED / AVATAR_SPEAK_ENDED fire as expected and the avatar lip-syncs correctly.

Fixes #92.

The LiveAvatar server expects the `audio` field of `agent.speak` events to be base64-encoded PCM 16-bit 24 kHz mono (per docs.liveavatar.com/docs/lite-mode/events). The SDK was sending the raw "binary string" produced by `splitPcm24kStringToChunks()`, which the server rejects with `{type: "error", error: {type: "invalid_request_error", message: "invalid base64 audio data"}}` for every chunk. The errors are silently dropped because `handleWebSocketMessage` only emits `agent.speak_started` / `agent.speak_ended`, so callers see no lip-sync, no audio, and no events. `repeatAudio()`'s public contract (raw 16-bit signed PCM as a binary string) is unchanged; the encoding now happens inside the SDK on the way out, where it always belonged. Fixes heygen-com#92

mvtandas mentioned this pull request May 7, 2026

repeatAudio() does not produce lip-sync or audio for custom VIDEO avatars in LITE mode #92

Open

mvtandas wants to merge 1 commit into heygen-com:master from mvtandas:fix/lite-mode-base64-audio
