Ijwi-ry-Ikirundi-AI/Kirundi_Dataset

🇧🇮 The first large-scale, open-source speech and text dataset for the Kirundi language. Building AI models for 12M+ Kirundi speakers through community collaboration. Includes ASR, TTS, and MT capabilities.

Emerging

This project creates the first comprehensive, open-source collection of Kirundi speech and text data. It helps preserve and digitize the language for millions of speakers by providing transcribed audio and translated text. It is aimed at Kirundi speakers who want to help build AI tools, such as voice assistants or translation apps, for their language.

Use this if you are a Kirundi speaker or linguist who wants to contribute Kirundi sentences, translations, or audio recordings to build modern language AI.

Not ideal if you are looking for a pre-built Kirundi AI model or an application ready for end-user use, as this project focuses on data collection for model development.

Kirundi-language-preservation speech-recognition text-translation language-digitization low-resource-languages

No Package No Dependents

Maintenance 10 / 25

Adoption 4 / 25

Maturity 13 / 25

Community 13 / 25

Language: Jupyter Notebook

License: not listed

Higher-rated alternatives

double22a/speech_dataset

The dataset of Speech Recognition

Jakobovski/free-spoken-digit-dataset

A free audio dataset of spoken digits. An audio version of MNIST.

lottev1991/Project-AIdol-Public-English-Dataset

Public female English corpus used for Project AI❤dol

Jahangirbd23/WenetSpeech-Yue

📑 Explore WenetSpeech-Yue, a comprehensive Cantonese speech corpus with rich annotations,...

Nexdata-AI/338-Hours-Spanish-Speech-Data-by-Mobile-Phone

Spanish Speech Dataset
