Ijwi-ry-Ikirundi-AI/Kirundi_Dataset
🇧🇮 The first large-scale, open-source speech and text dataset for the Kirundi language. Building AI models for 12M+ Kirundi speakers through community collaboration. Includes ASR, TTS, and MT capabilities.
This project creates the first comprehensive, open-source collection of Kirundi speech and text data. It helps preserve and digitize the language for millions of speakers by providing transcribed audio and translated text. It is intended for Kirundi speakers who want to help build AI tools for their language, such as voice assistants or translation apps.
Use this if you are a Kirundi speaker or linguist wanting to contribute Kirundi sentences, translations, or audio recordings to build modern language AI.
Not ideal if you are looking for a pre-built Kirundi AI model or an application ready for end-user use, as this project focuses on data collection for model development.
Stars: 7
Forks: 2
Language: Jupyter Notebook
License: not specified
Category:
Last pushed: Feb 12, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/Ijwi-ry-Ikirundi-AI/Kirundi_Dataset"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
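The same request can be made from a script. The following is a minimal Python sketch using only the standard library. The function names `build_quality_url` and `fetch_quality` are hypothetical, and the assumption that the endpoint returns a JSON body is not confirmed by this page, so treat the parsing step as a guess to verify against a real response.

```python
import json
import urllib.parse
import urllib.request

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/voice-ai"


def build_quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper).

    Each path segment is percent-encoded so unusual characters cannot
    change the structure of the URL.
    """
    owner_part = urllib.parse.quote(owner, safe="")
    repo_part = urllib.parse.quote(repo, safe="")
    return f"{BASE_URL}/{owner_part}/{repo_part}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Request the record for one repository (hypothetical helper).

    Assumption: the response body is JSON. If it is not, json.loads
    raises a ValueError, which the caller should handle.
    """
    request = urllib.request.Request(
        build_quality_url(owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("Ijwi-ry-Ikirundi-AI", "Kirundi_Dataset")` sends the same request as the curl command above. Keyless use is limited to 100 requests/day, so a script that polls this endpoint should cache the result rather than re-fetch it on every run.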
Higher-rated alternatives
double22a/speech_dataset
The dataset of Speech Recognition
Jakobovski/free-spoken-digit-dataset
A free audio dataset of spoken digits. An audio version of MNIST.
lottev1991/Project-AIdol-Public-English-Dataset
Public female English corpus used for Project AI❤dol
Jahangirbd23/WenetSpeech-Yue
📑 Explore WenetSpeech-Yue, a comprehensive Cantonese speech corpus with rich annotations,...
Nexdata-AI/338-Hours-Spanish-Speech-Data-by-Mobile-Phone
Spanish Speech Dataset