This is a collection of open-source-compatible voice samples, mostly from public domain and Creative Commons works, that are suitable for use with zero-shot text-to-speech engines.
The primary goal of this repository is to provide high quality voice samples that are ready to use, as-is, with zero-shot TTS engines like Chatterbox and Pocket TTS.
The secondary goal is to provide a very clear trail from the final voice samples all the way back to not only the voice actor, but also the original source files, for the sake of giving credit where credit is due. That's not always possible, because some voice datasets seem to be anonymized or have not been tracked very well, but whenever it can be done, that data will be provided.
A variety of tools were used to clean up the source data as much as possible, including the following:
- Audacity
- The built-in noise remover can be handy
- Very rarely, this is also used to make slight speed changes
- Via tempo changes
- Kanade Tokenizer
- Amazing tool that operates in two modes:
- Voice Conversion
- Voice Resynthesis
- This analyzes the voice, producing an ideal, noise-free version
- That is then used to voice-convert the original sample, removing noise, which even works on reverb!
- It's extremely fast, especially compared to the Chatterbox voice converter
- SoX
- Involved in scripting the other tools, mostly for on-the-fly format conversion
- Resemble Enhance
- AI noise remover and audio up-scaler
- This is used as the final step for each voice sample
- It does a good job removing the noise Chatterbox tends to introduce
- RNNoise
- AI noise remover
- Works quite well with VAD% set to 99
- The default of 50% is far too low
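As a rough illustration of the on-the-fly format conversion SoX is used for, here is a minimal sketch. The function names, the 24 kHz rate, and the mono channel count are assumptions chosen for the example, not the repository's actual script; only standard SoX options (`-t`, `-r`, `-c`, and `-` for stdout) are used.

```python
import subprocess
from pathlib import Path


def sox_to_wav_cmd(src: Path, rate: int = 24000, channels: int = 1) -> list[str]:
    """Build a SoX command that decodes `src` and writes WAV data to stdout."""
    return [
        "sox", str(src),
        "-t", "wav",          # force WAV output, since stdout has no file extension
        "-r", str(rate),      # output sample rate in Hz (assumed value)
        "-c", str(channels),  # output channel count (assumed value)
        "-",                  # "-" tells SoX to write to stdout
    ]


def convert_on_the_fly(src: Path) -> bytes:
    """Run SoX and return the converted WAV bytes without a temporary file."""
    result = subprocess.run(sox_to_wav_cmd(src), check=True, capture_output=True)
    return result.stdout
```

Calling `convert_on_the_fly(Path("sample.flac"))` requires SoX to be installed and on the `PATH`; the returned bytes can then be handed to whichever tool comes next.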
The voices directory holds audio samples from LibriVox.org and Archive.org, which have been noise reduced and trimmed to between seven and roughly twelve seconds, a length that works well for zero-shot TTS engines. This directory will always hold CC0-licensed samples.
The voices-emotion directory holds synthetic audio samples produced by using samples from the voices directory with Chatterbox, to produce emotional variations.
If there are ever samples under other licenses, they will be placed in other directories, to keep the licensing issues clear and easy to work with.
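If you want to check whether a candidate WAV file falls inside the seven to twelve second range mentioned above, something like the following sketch may help. It uses only Python's standard `wave` module; the helper names and the exact bounds are assumptions made for illustration.

```python
import wave

# Assumed bounds, taken from the "seven to roughly twelve seconds" guideline.
MIN_SECONDS = 7.0
MAX_SECONDS = 12.0


def wav_duration(path: str) -> float:
    """Return the length of a WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_good_length(path: str, lo: float = MIN_SECONDS, hi: float = MAX_SECONDS) -> bool:
    """Check whether the sample length is within the target range."""
    return lo <= wav_duration(path) <= hi
```

This only reads uncompressed WAV files; other formats would need to be converted first.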
For the curious, here's the current process used for cleaning up new voice samples, before they're fed into Chatterbox:
- Import into Audacity
- Trim sample to between 1 and 3 sentences at 8-11 seconds in length
- Remove the worst noise with Audacity's noise removal tools
- Use RNNoise with VAD% set to 99
- Adjust speed via tempo changes to fix overly slow or fast speakers
- Fast and slow voices become highly problematic for producing emotional variations with Chatterbox
- Normalize to -8 dB
- Save the sample
- Use Kanade to resynthesize the sample, to remove reverb and other things RNNoise can't handle
- Normalize again, via SoX
- Kanade is run via a script, so it was easy enough to add an automated normalization step
- Upscale the sample to 44.1 kHz via Resemble Enhance
- Kanade produces 24 kHz samples
Most of the steps are optional and applied more or less as needed. For example, manual noise removal via Audacity is rarely needed, because the combination of RNNoise and Kanade generally wipes out all noise, even reverb.
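The automated SoX normalization step could look roughly like the sketch below. This is not the maintainer's actual script: the function names and directory arguments are hypothetical, and the -8 dB level is assumed to match the earlier Audacity normalization. It relies on SoX's `norm` effect, which takes an optional dB level.

```python
import subprocess
from pathlib import Path


def sox_normalize_cmd(src: Path, dst: Path, level_db: float = -8.0) -> list[str]:
    """Build a SoX command that normalizes `src` to `level_db` and writes `dst`."""
    # ":g" formatting turns -8.0 into "-8", which is what SoX expects.
    return ["sox", str(src), str(dst), "norm", f"{level_db:g}"]


def normalize_dir(in_dir: Path, out_dir: Path, level_db: float = -8.0) -> None:
    """Normalize every WAV file in `in_dir`, writing results into `out_dir`."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(in_dir.glob("*.wav")):
        cmd = sox_normalize_cmd(src, out_dir / src.name, level_db)
        subprocess.run(cmd, check=True)  # raises if SoX reports an error
```

Keeping the command builder separate from the part that runs it makes it easy to print or inspect the commands before touching any files.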
This repository is a one-man show, but there are two areas in which contributions are welcome:
- Accent Classification
- In particular, help would be appreciated with classifying the various American accents by region
- There may be some Canadians mixed in, which should be clarified, if possible
- Some of the various UK accents may be incorrectly listed as English, when they're from other areas
- Sorry, but one does the best one can, with limited knowledge!
- Voice Suggestions
- LibriVox voice suggestions will always be welcome
- Voice samples from other sources will also be considered
- DO NOT suggest voice samples that aren't in the public domain
- Accented voices are both desirable and useful
- Keep the following in mind, however:
- The maintainer only speaks English and won't accept suggestions for voices in other languages
- The noise removal steps require understanding what's been spoken
If you wish to clarify the accent of a voice or suggest a new voice, please file an issue.
When suggesting a new voice, please try to determine the accent, based on country of origin and region, because the maintainer has little direct experience with such things.
- The Book of Newts: Starwitch - An audiobook the maintainer put together for one of his novels, with emotional reading for characters. Voice-Zero provided all voices. Some voices were voice converted, to give one voice the accent of another.
Unless otherwise noted, all files in this repository are under the CC0 license. Currently, everything is based on samples from LibriVox.org, Archive.org and freesound.org.
For more specific details, please examine the README.md files in each subdirectory, which will also provide the names of the voice actors and links to the original sources.