docs(go/plugins/googlegenai): document gemini-3.1 tts model behaviour and add sample #5497
IzaakGough wants to merge 6 commits into
Code Review
This pull request registers and documents new text-to-speech (TTS) models (gemini-2.5-flash-preview-tts, gemini-2.5-pro-preview-tts, and gemini-3.1-flash-tts-preview) in the Google Gen AI plugin, and provides helper functions for converting raw PCM audio to WAV format. The review flags one critical issue in the text-to-speech sample: initializing multiple Genkit instances (g1 and g2) in the same process can cause a panic due to duplicate action registration. The sample should instead use a single Genkit instance and specify the model name explicitly in each flow.
Summary
Document the dedicated Gemini TTS models in the Go googlegenai plugin and add a sample showing how to use them, including the Gemini 3.1 PCM-to-WAV handling needed to produce playable output.
Problem/Root Cause
The Go plugin exposed dedicated Gemini TTS model IDs, but the behavior and usage of those models were not documented. In particular, gemini-3.1-flash-tts-preview returns PCM audio rather than a directly playable WAV file, so users needed sample code showing how to decode the inline media response and wrap it in a WAV container.
Solution/Changes
Testing
go/plugins/googlegenai/tts_test.go covering TTS model registration and capabilities.
go test ./plugins/googlegenai/...
go build ./samples/text-to-speech