IS2AI/Kazakh_TTS
KazakhTTS2 is an expanded version of the previously released Kazakh text-to-speech (KazakhTTS) synthesis corpus. Compared with the original release, the overall size has increased from 93 hours to 271 hours, the number of speakers has risen from two to five (three female and two male), and the topic coverage has been diversified.
Built on ESPnet, this recipe implements end-to-end Kazakh TTS using Tacotron2, Transformer, or FastSpeech acoustic models, paired with ParallelWaveGAN vocoders for waveform generation. It provides speaker-specific pretrained checkpoints for all five speakers and encodes Kazakh text at the character level. Training uses Kaldi utilities for data processing and plugs into ESPnet's modular pipeline, so it can be run in stages from feature extraction through model optimization.
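To make the character-level encoding mentioned above concrete, here is a minimal Python sketch of the general idea: each distinct character in the training text gets an integer ID, and a sentence becomes a list of those IDs. This is an illustration only, not ESPnet's tokenizer or the recipe's actual code; the function names and the `<blank>` and `<unk>` special symbols are assumptions chosen for the example.

```python
# Illustrative character-level encoder (hypothetical helper names,
# not the ESPnet or Kazakh_TTS implementation).

SPECIAL_SYMBOLS = ["<blank>", "<unk>"]  # assumed special tokens


def build_vocab(texts):
    """Map every distinct character in the corpus to an integer ID."""
    chars = sorted({ch for text in texts for ch in text.lower()})
    return {sym: idx for idx, sym in enumerate(SPECIAL_SYMBOLS + chars)}


def encode(text, vocab):
    """Turn a string into a list of character IDs; unseen characters map to <unk>."""
    unk_id = vocab["<unk>"]
    return [vocab.get(ch, unk_id) for ch in text.lower()]


def decode(ids, vocab):
    """Inverse of encode for IDs that are in the vocabulary."""
    inverse = {idx: sym for sym, idx in vocab.items()}
    return "".join(inverse[i] for i in ids)


corpus = ["сәлем әлем", "қазақ тілі"]
vocab = build_vocab(corpus)
ids = encode("сәлем", vocab)
```

Because Kazakh Cyrillic letters such as `ә` and `қ` are single Unicode code points, a character-level vocabulary of this kind stays small and needs no separate pronunciation lexicon, which is one common reason to prefer it for a first TTS recipe.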
147 stars. No commits in the last 6 months.
Stars: 147
Forks: 26
Language: Shell
License: CC-BY-4.0
Category:
Last pushed: Aug 01, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/IS2AI/Kazakh_TTS"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
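The same request can be sketched in Python using only the standard library. The URL pattern is taken from the curl command above; the helper names are hypothetical, and the assumption that the endpoint returns JSON is not confirmed by this page, so treat the parsing step as a guess to verify against a real response.

```python
# Sketch of calling the endpoint shown in the curl example.
# Helper names are hypothetical; the JSON response format is an assumption.
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, repo):
    """Build the request URL, e.g. for category 'voice-ai' and repo 'IS2AI/Kazakh_TTS'."""
    # quote() leaves '/' unescaped by default, so 'owner/name' stays a path.
    return f"{BASE_URL}/{quote(category)}/{quote(repo)}"


def fetch_quality(category, repo, timeout=10):
    """Fetch and parse the response, assuming the body is JSON."""
    with urlopen(quality_url(category, repo), timeout=timeout) as resp:
        return json.load(resp)


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("voice-ai", "IS2AI/Kazakh_TTS")
```

With the keyless limit of 100 requests per day stated above, it is worth caching the result locally rather than calling `fetch_quality` on every run.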
Higher-rated alternatives
hetpandya/youtube_tts_data_generator
A Python library for generating speech datasets from YouTube videos
Hecate2/sukasuka-vocal-dataset-builder
Sukasuka anime vocal dataset. 1st anime vocal dataset. Extract audio (vocal) files from video based on .ass...
youmebangbang/TTS-dataset-tools
Automatically generates TTS dataset using audio and associated text. Make cuts under a custom...
taresh18/TTSizer
🎙️ Automatically transcribe audio/video into high-quality, speaker-specific Text-To-Speech datasets ✨
keonlee9420/DailyTalk
Official repository of DailyTalk: Spoken Dialogue Dataset for Conversational Text-to-Speech, ICASSP 2023