Voice-Zero

This is a collection of open-source-compatible voice samples, mostly from public domain and Creative Commons works, suitable for use with zero-shot text-to-speech (TTS) engines.

The primary goal of this repository is to provide high-quality voice samples that are ready to use, as-is, with zero-shot TTS engines like Chatterbox and Pocket TTS.

The secondary goal is to provide a clear trail from each final voice sample back to not only the voice actor but also the original source files, to give credit where credit is due. That's not always possible, because some voice datasets appear to be anonymized or poorly tracked, but whenever it can be done, that data will be provided.

A variety of tools were used to clean up the source data as much as possible, including the following:

Audacity
- The built-in noise remover can be handy
- Very rarely, this is also used to make slight speed changes
  - Via tempo changes
Kanade Tokenizer
- Amazing tool that operates in two modes:
  - Voice Conversion
  - Voice Resynthesis
    - This analyzes the voice, producing an ideal, noise-free version
    - That is then used to voice-convert the original sample, removing noise, which even works on reverb!
  - It's extremely fast, especially compared to the Chatterbox voice converter
SoX
- Used when scripting the other tools, mostly for on-the-fly format conversion
Resemble Enhance
- AI noise remover and audio up-scaler
- This is used as the final step for each voice sample
- It does a good job removing the noise Chatterbox tends to introduce
RNNoise
- AI noise remover
- Works quite well with VAD% set to 99
  - The default of 50% is terribly ridiculous

Notes on Directories

The voices directory holds audio samples from LibriVox.org and Archive.org, which have been noise reduced and trimmed to between seven and roughly twelve seconds, a length that works well for zero-shot TTS engines. This directory will always hold CC0-licensed samples.
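
As a rough illustration of the length guideline above, a sample's duration can be checked with Python's standard `wave` module. This is only a sketch, not part of the repository's tooling: it assumes uncompressed PCM WAV files, and the function names and the 7 to 12 second bounds (taken from the description above) are illustrative.

```python
import wave

# Bounds taken from the description above; treat them as a guideline, not a hard rule.
MIN_SECONDS = 7.0
MAX_SECONDS = 12.0


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_good_length(path, min_seconds=MIN_SECONDS, max_seconds=MAX_SECONDS):
    """Report whether a sample falls inside the suggested length range."""
    return min_seconds <= wav_duration_seconds(path) <= max_seconds
```

Calling `is_good_length("sample.wav")` on a hypothetical file returns `True` when the sample is within the range, which makes it easy to flag samples that need another trim.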

The voices-emotion directory holds synthetic audio samples produced by using samples from the voices directory with Chatterbox, to produce emotional variations.

If there are ever samples under other licenses, they will be placed in other directories, to keep the licensing issues clear and easy to work with.

How Voice Samples Are Cleaned

For the curious, here's the current process used for cleaning up new voice samples, before they're fed into Chatterbox:

Import into Audacity
Trim the sample to between 1 and 3 sentences, 8-11 seconds in length
Remove the worst noise with Audacity's noise removal tools
Use RNNoise with VAD% set to 99
Adjust speed via tempo changes to fix overly slow or fast speakers
- Fast and slow voices become highly problematic for producing emotional variations with Chatterbox
Normalize to -8 dB
Save the sample
Use Kanade to resynthesize the sample, to remove reverb and other things RNNoise can't handle
Normalize again, via SoX
- Kanade is run via a script, so it was easy enough to add an automated normalization step
Upscale the sample to 44.1 kHz via Resemble Enhance
- Kanade produces 24 kHz samples

Most of the steps are optional and applied more or less as needed. For example, manual noise removal via Audacity is rarely needed, because the combination of RNNoise and Kanade generally wipes out all noise, even reverb.
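
The automated normalization step could look something like the sketch below. The actual script isn't shown here, so the helper names and the reuse of -8 dB for the SoX pass are assumptions. The code only builds a SoX command line around the `norm` effect, which takes an optional dB level, and shows the arithmetic behind a decibel target.

```python
import subprocess


def db_to_amplitude(level_db):
    """Convert a peak level in dB (relative to full scale) to a linear amplitude."""
    return 10 ** (level_db / 20)


def sox_normalize_command(src, dst, level_db=-8.0):
    """Build a SoX command that peak-normalizes src and writes the result to dst."""
    # `norm` is SoX's normalization effect; its argument is the target level in dB.
    return ["sox", str(src), str(dst), "norm", str(level_db)]


def normalize(src, dst, level_db=-8.0):
    """Run the command; requires the `sox` binary to be available on PATH."""
    subprocess.run(sox_normalize_command(src, dst, level_db), check=True)
```

For reference, a peak of -8 dB is roughly 40% of full scale: `db_to_amplitude(-8.0)` is about 0.398.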

Contributing

This repository is a one-man show, but there are two areas in which contributions are welcome:

Accent Classification
- In particular, help would be appreciated with classifying the various American accents by region
  - There may be some Canadians mixed in, which should be clarified, if possible
- Some of the various UK accents may be incorrectly listed as English, when they're from other areas
  - Sorry, but one does the best one can, with limited knowledge!
Voice Suggestions
- LibriVox voice suggestions will always be welcome
- Voice samples from other sources will also be considered
- DO NOT suggest voice samples that aren't in the public domain
- Accented voices are both desirable and useful
- Keep the following in mind, however:
  - The maintainer only speaks English and won't accept suggestions for voices in other languages
    - The noise removal steps require understanding what's been spoken

If you wish to clarify the accent of a voice or suggest a new voice, please file an issue.

When suggesting a new voice, please try to determine the accent, based on country of origin and region, because the maintainer has little direct experience with such things.

Example Uses

The Book of Newts: Starwitch - An audiobook the maintainer put together for one of his novels, with emotional reading for characters. Voice-Zero provided all voices. Some voices were voice converted, to give one voice the accent of another.

License

Unless otherwise noted, all files in this repository are under the CC0 license. Currently, everything is based on samples from LibriVox.org, Archive.org and freesound.org.

For more specific details, please examine the README.md files in each subdirectory, which will also provide the names of the voice actors and links to the original sources.
