Speech Recognition Datasets ML Frameworks

Multilingual audio corpora for training speech recognition, synthesis, and conversational AI models. This category does NOT include general audio processing tools, music datasets, or non-speech audio collections.

This category tracks 8 speech recognition dataset projects. The highest-rated is Ijwi-ry-Ikirundi-AI/Kirundi_Dataset, which scores 36/100 and has 7 stars.

Get all 8 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=speech-recognition-datasets&limit=20"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000 requests/day.
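The same request can be made from a script. The following is a minimal sketch using only the Python standard library; it assumes the endpoint returns a JSON body, as the "as JSON" label suggests. The response schema is not described on this page, so the parsed value is returned unchanged. The helper names `build_url` and `fetch_projects` are illustrative and not part of any official client.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="ml-frameworks",
              subcategory="speech-recognition-datasets",
              limit=20):
    """Build the request URL with the same query parameters as the curl call."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=30):
    """GET the URL and parse the body as JSON.

    Assumption: the body is valid JSON. Its structure is not documented
    here, so the parsed value is returned without further processing.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Example usage (performs a real network request and counts against the
# daily request quota):
# data = fetch_projects(build_url())
# print(json.dumps(data, indent=2, ensure_ascii=False))
```

Keeping URL construction separate from the network call makes it easy to change `limit` or the subcategory without touching the fetch logic.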

| # | Framework | Description | Score | Tier |
|---|-----------|-------------|-------|------|
| 1 | Ijwi-ry-Ikirundi-AI/Kirundi_Dataset | 🇧🇮 The first large-scale, open-source speech and text dataset for Kirundi... | 36 | Emerging |
| 2 | hstsethi/in-mob-prefix | Dataset, charts, models of 4 digit mobile number prefixes in India by state,... | 31 | Emerging |
| 3 | apple/ml-spatial-librispeech | A large synthetic dataset of spatial audio with multiple labels | 29 | Experimental |
| 4 | Jahangirbd23/WenetSpeech-Yue | 📑 Explore WenetSpeech-Yue, a comprehensive Cantonese speech corpus with rich... | 22 | Experimental |
| 5 | Nexdata-AI/359-Hours-Indonesian-Speech-Data-by-Mobile-Phone_Reading | Indonesian Speech Dataset | 18 | Experimental |
| 6 | Nexdata-AI/207-Hours-Japanese-Speaking-English-Speech-Data-by-Mobile-Phone | Japanese Speaking English Speech Dataset | 16 | Experimental |
| 7 | Nexdata-AI/338-Hours-Spanish-Speech-Data-by-Mobile-Phone | Spanish Speech Dataset | 14 | Experimental |
| 8 | Nexdata-AI/98-Hours-Taiwan-Mandarin-Speech-Data-by-Mobile-Phone_Reading | Taiwan Speech Dataset | 14 | Experimental |