docs(go/plugins/googlegenai): document gemini-3.1 tts model behaviour and add sample #5497

Open: IzaakGough wants to merge 6 commits into main from docs-gemini-3.1-flash-tts-behaviour

IzaakGough commented Jun 9, 2026

Summary

Document the dedicated Gemini TTS models in the Go googlegenai plugin and add a sample showing how to use them, including the Gemini 3.1 PCM-to-WAV handling needed to produce playable output.

Problem/Root Cause

The Go plugin exposed dedicated Gemini TTS model IDs, but the behavior and usage of those models were not documented. In particular, gemini-3.1-flash-tts-preview returns PCM audio rather than a directly playable WAV file, so users needed sample code showing how to decode the inline media response and wrap it in a WAV container.
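The decode step mentioned above might look roughly like the following. This is a minimal sketch, not the code from this PR: it assumes the inline media arrives as a base64 data URL, and `decodeDataURL` and the example content type are hypothetical names and values chosen for illustration.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// decodeDataURL (hypothetical helper) splits a data URL of the form
// "data:<content-type>;base64,<payload>" into its content type and raw bytes.
func decodeDataURL(url string) (contentType string, data []byte, err error) {
	rest, ok := strings.CutPrefix(url, "data:")
	if !ok {
		return "", nil, fmt.Errorf("not a data URL")
	}
	meta, payload, ok := strings.Cut(rest, ",")
	if !ok {
		return "", nil, fmt.Errorf("data URL has no comma separator")
	}
	contentType, ok = strings.CutSuffix(meta, ";base64")
	if !ok {
		return "", nil, fmt.Errorf("data URL is not base64-encoded")
	}
	data, err = base64.StdEncoding.DecodeString(payload)
	if err != nil {
		return "", nil, fmt.Errorf("decode base64 payload: %w", err)
	}
	return contentType, data, nil
}

func main() {
	// Four bytes standing in for real audio; the content type is an assumed example.
	payload := base64.StdEncoding.EncodeToString([]byte{0x00, 0x01, 0x02, 0x03})
	url := "data:audio/L16;codec=pcm;rate=24000;base64," + payload

	contentType, pcm, err := decodeDataURL(url)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s: %d bytes of PCM\n", contentType, len(pcm))
}
```

The decoded bytes are headerless samples, which is why a further container step is needed before most players will open the result.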

Solution/Changes

  • Register the dedicated Gemini TTS models with TTS-specific capabilities in the Google AI model registry.
  • Add tests covering model classification, declared capabilities, and Google AI-only registration.
  • Expand the plugin README with TTS model documentation, usage examples, and a note describing the Gemini 3.1 PCM output behavior.
  • Update the Go text-to-speech sample to:
    • return inline media for the Gemini 2.5 TTS flow
    • add a Gemini 3.1 flow that decodes inline PCM audio, converts it to WAV, and returns the generated file path

Testing

  • Added go/plugins/googlegenai/tts_test.go covering TTS model registration and capabilities.
  • Ran go test ./plugins/googlegenai/...
  • Ran go build ./samples/text-to-speech

github-actions (bot) added the docs and go labels Jun 9, 2026

gemini-code-assist (bot) left a comment

Contributor

Code Review

This pull request registers and documents new text-to-speech (TTS) models (gemini-2.5-flash-preview-tts, gemini-2.5-pro-preview-tts, and gemini-3.1-flash-tts-preview) in the Google Gen AI plugin, and provides helper functions for converting raw PCM audio to WAV format. The review flags a critical issue in the text-to-speech sample: initializing multiple Genkit instances (g1 and g2) in the same process can cause a panic due to duplicate action registration. The sample should instead use a single Genkit instance and specify the model name explicitly in each flow.

Comment thread on go/samples/text-to-speech/main.go (outdated)
IzaakGough changed the title from "Document gemini 3.1 TTS model behaviour and add sample code" to "docs: document gemini-3.1 tts model behaviour and add sample" Jun 9, 2026
IzaakGough changed the title from "docs: document gemini-3.1 tts model behaviour and add sample" to "docs(go/plugins/googlegenai): document gemini-3.1 tts model behaviour and add sample" Jun 9, 2026
IzaakGough marked this pull request as ready for review June 9, 2026 14:31