TTS Dataset Creation Voice AI Tools
Tools and workflows for preparing, recording, processing, and organizing audio datasets specifically for training text-to-speech models. Does NOT include pre-built TTS datasets, TTS model training frameworks, or general speech datasets for ASR/voice cloning.
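The listing does not fix an output format. One common target for the "organizing" step is an LJSpeech-style layout: a `wavs/` folder plus a pipe-separated `metadata.csv` whose lines read `id|transcript|normalized transcript`. The minimal sketch below assumes that layout. `clip_wav_path`, `write_ljspeech_metadata`, and `read_ljspeech_metadata` are hypothetical helpers, not part of any tool listed here.

```python
import io
from pathlib import Path


def clip_wav_path(dataset_root, clip_id):
    """Assumed layout: audio for clip_id lives at <root>/wavs/<clip_id>.wav."""
    return Path(dataset_root) / "wavs" / f"{clip_id}.wav"


def write_ljspeech_metadata(rows, out):
    """Write (clip_id, transcript, normalized_transcript) rows as id|text|normalized lines."""
    for row in rows:
        if len(row) != 3:
            raise ValueError(f"expected 3 fields, got {len(row)}: {row!r}")
        for field in row:
            # '|' separates fields and '\n' separates rows, so neither may appear inside a field
            if "|" in field or "\n" in field:
                raise ValueError(f"field contains a separator: {field!r}")
        out.write("|".join(row) + "\n")


def read_ljspeech_metadata(lines):
    """Parse id|text|normalized lines back into tuples, skipping blank lines."""
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if line:
            clip_id, text, normalized = line.split("|")
            entries.append((clip_id, text, normalized))
    return entries


if __name__ == "__main__":
    buf = io.StringIO()
    write_ljspeech_metadata([("clip_0001", "Dr. Smith arrived.", "Doctor Smith arrived.")], buf)
    print(buf.getvalue(), end="")
    print(clip_wav_path("my_dataset", "clip_0001"))
```

Fields are joined by hand rather than with the `csv` module. Transcripts often contain quotation marks, and `csv.writer` would quote and escape those fields by default.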
There are 33 TTS dataset creation tools tracked. The highest-rated is hetpandya/youtube_tts_data_generator, with a score of 46/100, 37 stars, and 55 monthly downloads.
Get the projects as JSON (this request asks for 20 of the 33 via `limit=20`):
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=voice-ai&subcategory=tts-dataset-creation&limit=20"
Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
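To make the same call from Python, here is a minimal standard-library sketch. It rebuilds the query string from the curl request above and returns the decoded JSON unchanged, because this page does not document the response fields. `build_query_url` and `fetch_projects` are hypothetical helper names. This page also does not say how the free API key is sent, so the sketch makes anonymous requests only.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint copied from the curl example; the response schema is not documented
# on this page, so the decoded JSON is returned without assuming field names.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_query_url(domain: str, subcategory: str, limit: int = 20) -> str:
    """Rebuild the query string used by the curl example (hypothetical helper)."""
    params = {"domain": domain, "subcategory": subcategory, "limit": limit}
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_projects(domain: str = "voice-ai",
                   subcategory: str = "tts-dataset-creation",
                   limit: int = 20,
                   timeout: float = 30.0):
    """Fetch and decode the JSON response; anonymous request, no API key."""
    url = build_query_url(domain, subcategory, limit)
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    print(build_query_url("voice-ai", "tts-dataset-creation"))
```

Building the URL needs no network. Each call to `fetch_projects()` makes one live request and counts against the daily quota.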
| # | Tool | Description | Score | Tier |
|---|---|---|---|---|
| 1 | hetpandya/youtube_tts_data_generator | A Python library to generate speech datasets from YouTube videos | 46 | Emerging |
| 2 | IS2AI/Kazakh_TTS | An expanded version of the previously released Kazakh text-to-speech... | | Emerging |
| 3 | taresh18/TTSizer | 🎙️ Automatically transcribe audio/video into high-quality, speaker-specific... | | Emerging |
| 4 | Hecate2/sukasuka-vocal-dataset-builder | SukaSuka anime vocal dataset; the 1st anime vocal dataset. Extract audio (vocal) files from... | | Emerging |
| 5 | youmebangbang/TTS-dataset-tools | Automatically generates a TTS dataset from audio and associated text. Make... | | Emerging |
| 6 | stefantaubert/pronunciation-dictionary-utils | Utils to modify pronunciation dictionaries. | | Emerging |
| 7 | GuangChen2333/FindUrVoicesPJSK | One-click tool for obtaining single-character voice datasets from *Project SEKAI: Colorful Stage*; no manual labeling needed; uncompressed WAV. A simple tool for obtaining... | | Emerging |
| 8 | keonlee9420/DailyTalk | Official repository of DailyTalk: Spoken Dialogue Dataset for Conversational... | | Emerging |
| 9 | souvikg544/TTS_Data_Maker | Text-to-speech is an emerging area of AI. This repository helps to create a... | | Emerging |
| 10 | gokhaneraslan/tts-dataset-generator | With this tool you can create a custom TTS dataset from video or audio. | | Emerging |
| 11 | revsic/speechset | NumPy/librosa implementation of a speech dataset pipeline | | Emerging |
| 12 | FS-17/SpeechDataBuilder | Browser-based open-source tool for creating high-quality TTS/STT datasets.... | | Experimental |
| 13 | ShawnPi233/SynParaSpeech | Official repository of the paper "SynParaSpeech: Automated Synthesis of... | | Experimental |
| 14 | danklabs/tts_dataset_maker | A GUI to help make a text-to-speech dataset. | | Experimental |
| 15 | hollygrimm/voice-dataset-creation | Tools to create your own voice dataset for TTS training | | Experimental |
| 16 | MiniXC/phones | A collection of utilities for handling IPA phones. | | Experimental |
| 17 | soukhova/TTS2016R | A data-package including the 2016 TTS origins, TTS destinations, number of... | | Experimental |
| 18 | iuliiakr/TTS-Project-Framework | Architecture framework for building production-grade text-to-speech systems,... | | Experimental |
| 19 | IS2AI/TurkicTTS | A multilingual text-to-speech synthesis system for ten lower-resourced... | | Experimental |
| 20 | wkdrns202/TTSDataSetCleanser | TTSDataSetCleanser. This program can do the labeling work for the Raw Speech... | | Experimental |
| 21 | babua/TTSDatasetRecorder | A simple app for recording speech datasets. | | Experimental |
| 22 | nonverbalspeech38k/nonverspeech38k | The official repository for the paper “NonVerbalSpeech-38K: A Scalable... | | Experimental |
| 23 | pilot7747/VoxDIY | This repository provides data and code for "Vox Populi, Vox DIY: Benchmark... | | Experimental |
| 24 | hecko-yes/tts-dataset-prompts | Finally, some decent sample sentences | | Experimental |
| 25 | Lostenergydrink/styletts2-dataset-toolkit | Complete Windows-optimized workflow for voice cloning with StyleTTS2.... | | Experimental |
| 26 | ItsJamin/another-tts | A program to easily create datasets for training your own TTS models. | | Experimental |
| 27 | egorsmkv/qirimtatar-tts-datasets | Open-source Crimean Tatar text-to-speech datasets | | Experimental |
| 28 | quochuy242/VNAVC | Data pipeline for a text-to-speech project | | Experimental |
| 29 | deeplearningcafe/animespeechdataset | Dataset Generation for Language Model Training and Text-to-Speech Synthesis... | | Experimental |
| 30 | kdorichev/text2speech | Text-To-Speech Dataset Preparation and Architecture | | Experimental |
| 31 | willwade/TTS-Dataset | A workflow to create a dataset of all TTS voices/languages available on... | | Experimental |
| 32 | hclivess/speech-splitter | Turn any audio file into a TTS training dataset | | Experimental |
| 33 | clayton14/tts_dataset_recorder | All you have to do is ramble to make a dataset for your voice | | Experimental |
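Several entries above, such as hclivess/speech-splitter ("Turn any audio file into a TTS training dataset") and gokhaneraslan/tts-dataset-generator, describe turning long audio into training clips. The listing does not say how each tool segments audio. One simple technique, assumed here for illustration, is to cut wherever the frame RMS level stays below a dB threshold for long enough. This standard-library sketch takes mono float samples in [-1.0, 1.0]. `split_on_silence`, `frame_levels_db`, and their default thresholds are hypothetical and not taken from any listed tool.

```python
import math


def frame_levels_db(samples, frame_len):
    """RMS level (dBFS, full scale = 1.0) of each complete, non-overlapping frame."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        levels.append(20 * math.log10(max(rms, 1e-10)))  # floor avoids log10(0)
    return levels


def split_on_silence(samples, sample_rate, frame_ms=20.0, threshold_db=-40.0,
                     min_silence_ms=300.0, min_clip_ms=500.0):
    """Return (start, end) sample ranges of voiced regions.

    A region ends once at least min_silence_ms of consecutive frames fall below
    threshold_db; shorter pauses stay inside the clip. Regions shorter than
    min_clip_ms are dropped. Trailing samples that do not fill a frame are ignored.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    voiced = [db > threshold_db for db in frame_levels_db(samples, frame_len)]
    min_silence = max(1, round(min_silence_ms / frame_ms))
    min_clip = max(1, round(min_clip_ms / frame_ms))

    segments = []
    start = None    # first voiced frame of the current region
    silent_run = 0  # consecutive silent frames since the last voiced frame
    for i, is_voiced in enumerate(voiced):
        if is_voiced:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence:
                end = i - silent_run + 1  # first silent frame after the region
                if end - start >= min_clip:
                    segments.append((start * frame_len, end * frame_len))
                start, silent_run = None, 0
    if start is not None:  # audio ended while a region was still open
        end = len(voiced) - silent_run
        if end - start >= min_clip:
            segments.append((start * frame_len, end * frame_len))
    return segments
```

Each returned range can be sliced out of the original samples, written with the standard `wave` module, and paired with a transcript. A fixed threshold usually needs tuning per source, and a voice activity detector is a common alternative to plain energy thresholding.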