Speech Recognition Datasets ML Frameworks
Multilingual audio corpora for training speech recognition, synthesis, and conversational AI models. Does NOT include general audio processing tools, music datasets, or non-speech audio collections.
There are 8 speech recognition dataset projects tracked in this category. The highest-rated is Ijwi-ry-Ikirundi-AI/Kirundi_Dataset, at 36/100 with 7 stars.
Get all 8 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=speech-recognition-datasets&limit=20"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
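The same request can be made from a script. The following is a minimal sketch using only the Python standard library; the helper names `build_url` and `fetch_projects` are hypothetical, and since the response schema is not documented here, the parsed JSON is returned as-is rather than unpacked into assumed fields.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="ml-frameworks",
              subcategory="speech-recognition-datasets",
              limit=20):
    """Build the query URL (hypothetical helper, mirrors the curl example)."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(timeout=10.0, **params):
    """Fetch and parse the JSON response.

    The structure of the returned object is an assumption left to the
    caller to inspect; no field names are assumed here.
    """
    with urllib.request.urlopen(build_url(**params), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_projects()` performs one request against the keyless tier, so a script that polls should stay within the 100 requests/day limit noted above.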
| # | Framework | Score | Tier |
|---|---|---|---|
| 1 | Ijwi-ry-Ikirundi-AI/Kirundi_Dataset: 🇧🇮 The first large-scale, open-source speech and text dataset for Kirundi... | 36/100 | Emerging |
| 2 | hstsethi/in-mob-prefix: Dataset, charts, models of 4 digit mobile number prefixes in India by state,... | | Emerging |
| 3 | apple/ml-spatial-librispeech: A large synthetic dataset of spatial audio with multiple labels | | Experimental |
| 4 | Jahangirbd23/WenetSpeech-Yue: 📑 Explore WenetSpeech-Yue, a comprehensive Cantonese speech corpus with rich... | | Experimental |
| 5 | Nexdata-AI/359-Hours-Indonesian-Speech-Data-by-Mobile-Phone_Reading: Indonesian Speech Dataset | | Experimental |
| 6 | Nexdata-AI/207-Hours-Japanese-Speaking-English-Speech-Data-by-Mobile-Phone: Japanese Speaking English Speech Dataset | | Experimental |
| 7 | Nexdata-AI/338-Hours-Spanish-Speech-Data-by-Mobile-Phone: Spanish Speech Dataset | | Experimental |
| 8 | Nexdata-AI/98-Hours-Taiwan-Mandarin-Speech-Data-by-Mobile-Phone_Reading: Taiwan Speech Dataset | | Experimental |