carlfm01/my-speech-datasets

My public domain speech index

/ 100

Experimental

This project provides pre-processed collections of Spanish speech audio with corresponding text transcripts. It is aimed at researchers and developers who are building or improving speech recognition systems and need ready-to-use, public domain datasets. Each collection pairs audio files with their transcripts, making it suitable for training machine learning models.

No commits in the last 6 months.

Use this if you need high-quality, aligned Spanish speech and text data to train or evaluate your automatic speech recognition (ASR) models.

Not ideal if you require speech datasets in languages other than Spanish, or if you need data for tasks like speaker identification rather than speech-to-text transcription.
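
As a rough illustration of how aligned audio and transcript pairs are typically consumed for ASR training, here is a minimal Python sketch. The manifest layout is an assumption, not something this page documents: the column names `wav_filename` and `transcript`, the sample paths, and the `load_pairs` helper are all hypothetical.

```python
import csv
import io


def load_pairs(manifest_text):
    """Parse a CSV manifest into (audio_path, transcript) pairs.

    Assumes hypothetical columns `wav_filename` and `transcript`;
    the real datasets may use a different layout.
    """
    reader = csv.DictReader(io.StringIO(manifest_text))
    pairs = []
    for row in reader:
        path = (row.get("wav_filename") or "").strip()
        text = (row.get("transcript") or "").strip()
        # Keep only rows that have both an audio path and a transcript,
        # since an unpaired row cannot be used for supervised ASR training.
        if path and text:
            pairs.append((path, text))
    return pairs


# Hypothetical sample manifest, used only to exercise the function.
sample = """wav_filename,transcript
clips/0001.wav,Hola mundo
clips/0002.wav,
clips/0003.wav,Buenos días
"""

pairs = load_pairs(sample)
for path, text in pairs:
    print(path, "->", text)
```

Rows with a missing transcript are dropped rather than repaired, which keeps the sketch simple; a real pipeline might log them instead.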

speech-recognition natural-language-processing machine-learning-training-data linguistics audio-transcription

No License, Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 5 / 25

Maturity 8 / 25

Community 0 / 25

Higher-rated alternatives

ynop/audiomate

Python library for handling audio datasets.

reazon-research/ReazonSpeech

Massive open Japanese speech corpus

common-voice/cv-dataset

Metadata and versioning details for the Common Voice dataset

davidmartinrius/speech-dataset-generator

🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset...

EgorLakomkin/KTSpeechCrawler

Automatically constructing corpus for automatic speech recognition from YouTube videos
