TTS Dataset Creation Voice AI Tools

Tools and workflows for preparing, recording, processing, and organizing audio datasets specifically for training text-to-speech models. Does NOT include pre-built TTS datasets, TTS model training frameworks, or general speech datasets for ASR/voice cloning.

33 TTS dataset creation tools are tracked. The highest-rated is hetpandya/youtube_tts_data_generator at 46/100, with 37 stars and 55 monthly downloads.

Get all 33 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=voice-ai&subcategory=tts-dataset-creation&limit=20"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
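The same request can be made from a script. The following is a minimal Python sketch, not an official client: the helper names `build_url` and `fetch_projects` are hypothetical, and because the response schema is not described on this page, the parsed JSON is returned unchanged rather than unpacked into assumed fields.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="voice-ai", subcategory="tts-dataset-creation", limit=20):
    """Return the endpoint URL with the same query parameters as the curl command."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """GET the URL and decode the body as JSON.

    The response schema is an unknown here, so the parsed value is
    returned as-is instead of being indexed by guessed field names.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_projects(build_url())` performs one request, which counts against the daily quota. The default `limit=20` is copied from the curl command; whether a larger value returns all 33 projects is not confirmed here.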

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | hetpandya/youtube_tts_data_generator | A python library to generate speech dataset from Youtube videos | 46 | Emerging |
| 2 | IS2AI/Kazakh_TTS | An expanded version of the previously released Kazakh text-to-speech... | 40 | Emerging |
| 3 | taresh18/TTSizer | 🎙️ Automatically transcribe audio/video into high-quality, speaker-specific... | 37 | Emerging |
| 4 | Hecate2/sukasuka-vocal-dataset-builder | Sukasuka anime vocal dataset. 1st anime vocal dataset. Extract audio (vocal) files from... | 36 | Emerging |
| 5 | youmebangbang/TTS-dataset-tools | Automatically generates TTS dataset using audio and associated text. Make... | 36 | Emerging |
| 6 | stefantaubert/pronunciation-dictionary-utils | Utils to modify pronunciation dictionaries. | 36 | Emerging |
| 7 | GuangChen2333/FindUrVoicesPJSK | One-click tool for fetching single-character voice datasets from Project Sekai: Colorful Stage; no manual labeling needed; uncompressed WAV. A simple tool for obtaining... | 33 | Emerging |
| 8 | keonlee9420/DailyTalk | Official repository of DailyTalk: Spoken Dialogue Dataset for Conversational... | 32 | Emerging |
| 9 | souvikg544/TTS_Data_Maker | Text to speech is an emerging zone of AI. This repository helps to create a... | 32 | Emerging |
| 10 | gokhaneraslan/tts-dataset-generator | With this tool you can create custom TTS dataset from video or audio. | 31 | Emerging |
| 11 | revsic/speechset | Numpy-librosa implementation of Speech dataset pipeline | 30 | Emerging |
| 12 | FS-17/SpeechDataBuilder | Browser-based open-source tool for creating high-quality TTS/STT datasets.... | 28 | Experimental |
| 13 | ShawnPi233/SynParaSpeech | Official Repository of Paper: "SynParaSpeech: Automated Synthesis of... | 27 | Experimental |
| 14 | danklabs/tts_dataset_maker | A gui to help make a text to speech dataset. | 24 | Experimental |
| 15 | hollygrimm/voice-dataset-creation | Tools to create your own voice dataset for TTS training | 23 | Experimental |
| 16 | MiniXC/phones | A collection of utilities for handling IPA phones. | 23 | Experimental |
| 17 | soukhova/TTS2016R | A data-package including the 2016 TTS origins, TTS destinations, number of... | 23 | Experimental |
| 18 | iuliiakr/TTS-Project-Framework | Architecture framework for building production-grade text-to-speech systems,... | 22 | Experimental |
| 19 | IS2AI/TurkicTTS | A multilingual text-to-speech synthesis system for ten lower-resourced... | 22 | Experimental |
| 20 | wkdrns202/TTSDataSetCleanser | TTSDataSetCleanser. This program can do the labeling work for the Raw Speech... | 22 | Experimental |
| 21 | babua/TTSDatasetRecorder | A simple app for recording speech datasets. | 21 | Experimental |
| 22 | nonverbalspeech38k/nonverspeech38k | The official repository for the paper “NonVerbalSpeech-38K: A Scalable... | 20 | Experimental |
| 23 | pilot7747/VoxDIY | This repository provides data and code for "Vox Populi, Vox DIY: Benchmark... | 20 | Experimental |
| 24 | hecko-yes/tts-dataset-prompts | Finally, some decent sample sentences | 19 | Experimental |
| 25 | Lostenergydrink/styletts2-dataset-toolkit | Complete Windows-optimized workflow for voice cloning with StyleTTS2.... | 17 | Experimental |
| 26 | ItsJamin/another-tts | A program to easily create datasets for training own tts models. | 16 | Experimental |
| 27 | egorsmkv/qirimtatar-tts-datasets | Open Source Crimean Tatar Text-to-Speech datasets | 14 | Experimental |
| 28 | quochuy242/VNAVC | Data Pipeline for Text to Speech Project | 13 | Experimental |
| 29 | deeplearningcafe/animespeechdataset | Dataset Generation for Language Model Training and Text-to-Speech Synthesis... | 13 | Experimental |
| 30 | kdorichev/text2speech | Text-To-Speech Dataset Preparation and Architecture | 13 | Experimental |
| 31 | willwade/TTS-Dataset | A workflow to create a dataset of all TTS voices/languages available on... | 12 | Experimental |
| 32 | hclivess/speech-splitter | Turn any audio file into a TTS training dataset | 12 | Experimental |
| 33 | clayton14/tts_dataset_recorder | All you have to do is ramble to make a dataset for your voice | 10 | Experimental |