candlewill/Speech-Corpus-Collection

A Collection of Speech Corpora for ASR and TTS

43 / 100 (Emerging)

Curates publicly available speech datasets across multiple scales and languages, including large-scale corpora (LibriSpeech, roughly 1,000 hours), specialized domains (TED talks), and single-speaker databases suited to voice synthesis. Provides centralized access to ASR training data such as VCTK and TED-LIUM alongside TTS corpora from CMU ARCTIC and Blizzard Challenge resources, with preprocessing notes for datasets that require manual alignment, such as the World English Bible.

113 stars. No commits in the last 6 months.

Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 9 / 25
Maturity 16 / 25
Community 18 / 25

Stars: 113
Forks: 20
Language:
License: MIT
Last pushed: Jun 19, 2017
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/candlewill/Speech-Corpus-Collection"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
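
The same request can be sketched in Python using only the standard library. This is a minimal illustration, not documented client code: the URL shape is copied from the curl example above, the helper names `quality_url` and `fetch_quality` are hypothetical, and the response is assumed (not confirmed here) to be a JSON body whose fields are not documented on this page.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL in the same shape as the curl example."""
    parts = (category, owner, repo)
    # Percent-encode each path segment so unusual names cannot break the URL.
    return BASE_URL + "/" + "/".join(quote(p, safe="") for p in parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the response.

    Assumption: the endpoint returns JSON. The schema is not documented
    here, so the decoded value is returned without interpreting any fields.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Prints the URL only; call fetch_quality(...) to make the request.
    print(quality_url("voice-ai", "candlewill", "Speech-Corpus-Collection"))
```

With the keyless limit of 100 requests/day, it is worth caching the result of `fetch_quality` locally rather than calling it in a loop.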