fix(LITE): base64-encode agent.speak audio chunks #103
Open
mvtandas wants to merge 1 commit into heygen-com:master from
Conversation
The LiveAvatar server expects the `audio` field of `agent.speak` events to
be base64-encoded PCM 16-bit 24 kHz mono (per
docs.liveavatar.com/docs/lite-mode/events). The SDK was sending the raw
"binary string" produced by `splitPcm24kStringToChunks()`, which the server
rejects with `{type: "error", error: {type: "invalid_request_error",
message: "invalid base64 audio data"}}` for every chunk. The errors are
silently dropped because `handleWebSocketMessage` only emits
`agent.speak_started` / `agent.speak_ended`, so callers see no lip-sync,
no audio, and no events.
`repeatAudio()`'s public contract (raw 16-bit signed PCM as a binary
string) is unchanged — the encoding now happens inside the SDK on the
way out, where it always belonged.
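A minimal sketch of what that amounts to. The function name `sendCommandEventToWebSocket` comes from this PR, but the helper below and the exact event shape are illustrative assumptions, not the SDK's real types:

```typescript
// Sketch only: the real change is a single btoa() call inside
// sendCommandEventToWebSocket. The event field names follow the
// documented agent.speak shape, but the wrapper itself is assumed.
function buildSpeakEvent(rawPcmChunk: string): string {
  // rawPcmChunk is a "binary string": each char code is one PCM byte.
  // btoa() turns that byte string into the base64 the server expects.
  return JSON.stringify({ type: "agent.speak", audio: btoa(rawPcmChunk) });
}
```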
Fixes heygen-com#92
Proposed changes
`repeatAudio()` in LITE mode is currently broken end-to-end: the SDK sends raw "binary string" PCM, but the LiveAvatar server expects base64-encoded PCM 16-bit 24 kHz mono. The server rejects every chunk with `{ "type": "error", "error": { "type": "invalid_request_error", "message": "invalid base64 audio data" } }`. These error events are silently dropped because `handleWebSocketMessage` only forwards `agent.speak_started` / `agent.speak_ended` — so callers see no lip-sync, no audio, and no `AVATAR_SPEAK_STARTED` / `AVATAR_SPEAK_ENDED` events. This matches the LITE-mode events docs, which say the wire format is base64.

**This PR**
`btoa()`s each audio chunk inside `sendCommandEventToWebSocket` (one line). `repeatAudio(audio: string)`'s public contract — "raw 16-bit signed PCM" as a binary string — is unchanged; the encoding now happens inside the SDK on the way out, where it always belonged.

**Why this is the right place to fix it**
`splitPcm24kStringToChunks` operates on raw PCM bytes (it slices on byte boundaries — 19200 / 48000 bytes per chunk). Doing `btoa` after chunking keeps that logic correct; doing it before would force callers to pass base64 and would break the chunk-size math.
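The byte arithmetic above can be sketched as follows — a toy re-implementation of the split-then-encode order, not the SDK's actual `splitPcm24kStringToChunks`:

```typescript
// 24 kHz * 2 bytes per 16-bit sample * 1 channel = 48000 bytes per second,
// so a 19200-byte chunk is 400 ms of audio. Slicing happens on the raw
// binary string; btoa() is applied per chunk only AFTER splitting.
const CHUNK_BYTES = 19200; // 400 ms at 48000 bytes/s

function splitAndEncode(rawPcm: string): string[] {
  const encoded: string[] = [];
  for (let i = 0; i < rawPcm.length; i += CHUNK_BYTES) {
    encoded.push(btoa(rawPcm.slice(i, i + CHUNK_BYTES)));
  }
  return encoded;
}
```

Encoding before splitting would inflate the data by 4/3 and shift every chunk boundary off its byte-count math, which is why the fix sits after the splitter.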
**Testing**

Extended the existing "sends speak audio command event via web socket" test to assert the `audio` field equals `btoa(input)`. Full suite: 102 passed (102). Lint clean.

I verified the fix end-to-end against a real LITE session: `AVATAR_SPEAK_STARTED` / `AVATAR_SPEAK_ENDED` fire as expected and the avatar lip-syncs correctly.

Fixes #92.
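For reference, the wire-format assertion can be reproduced in isolation. This stand-in uses a hand-rolled fake socket rather than the suite's real mock, and `sendSpeakEvent` is a hypothetical stand-in for the patched `sendCommandEventToWebSocket`:

```typescript
// Self-contained stand-in for the patched send path; the real SDK's
// sendCommandEventToWebSocket and its WebSocket mock differ in detail.
const sentMessages: string[] = [];
const fakeSocket = { send: (msg: string) => sentMessages.push(msg) };

function sendSpeakEvent(socket: { send(m: string): void }, audio: string) {
  // After the fix, the audio field on the wire is base64, not raw bytes.
  socket.send(JSON.stringify({ type: "agent.speak", audio: btoa(audio) }));
}

const input = "\x00\x01\x02\x03"; // raw binary-string PCM
sendSpeakEvent(fakeSocket, input);
const sent = JSON.parse(sentMessages[0]);
// The extended test's key check: sent.audio must equal btoa(input).
```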