davidmartinrius/speech-dataset-generator

🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. 🎧👥📊 Advanced audio processing.

/ 100

Established

Uses pyannote speaker embeddings stored in a Chroma vector database to recognize the same speaker across multiple audio files, which removes the need for manual speaker labeling. Accepts input from local files, YouTube, LibriVox, and TED Talks, and supports chainable audio enhancers (DeepFilterNet, ResembleAI, MayaVoz). Output is written in standard TTS/STT dataset formats such as LJSpeech and LibriSpeech. Annotations include speaking-rate metrics (syllables per minute and words per minute), gender classification, and multilingual transcription via WhisperX.
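The cross-file speaker identification described above can be illustrated with a minimal sketch. It is not the project's implementation: the `SpeakerRegistry` class, the `speaker_N` naming, and the 0.9 threshold are hypothetical, a plain in-memory dictionary stands in for the Chroma database, and short hand-written vectors stand in for pyannote embeddings. The idea shown is only the general one: compare a new embedding against stored ones by cosine similarity, reuse the closest speaker ID if it is similar enough, and otherwise register a new speaker.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class SpeakerRegistry:
    """Hypothetical in-memory stand-in for a persistent vector store."""

    def __init__(self, threshold: float = 0.9) -> None:
        # Assumed similarity cutoff; a real system would tune this value.
        self.threshold = threshold
        self.embeddings: Dict[str, List[float]] = {}

    def identify(self, embedding: List[float]) -> str:
        """Return a known speaker ID, or register and return a new one."""
        best_id = None
        best_score = -1.0
        for speaker_id, stored in self.embeddings.items():
            score = cosine_similarity(embedding, stored)
            if score > best_score:
                best_id, best_score = speaker_id, score

        if best_id is not None and best_score >= self.threshold:
            return best_id

        # No sufficiently similar speaker found: store this one as new.
        new_id = f"speaker_{len(self.embeddings) + 1}"
        self.embeddings[new_id] = list(embedding)
        return new_id


registry = SpeakerRegistry(threshold=0.9)
first = registry.identify([1.0, 0.0, 0.0])     # first voice seen
again = registry.identify([0.98, 0.05, 0.0])   # near-identical voice
other = registry.identify([0.0, 1.0, 0.0])     # clearly different voice
print(first, again, other)
```

Because the registry outlives any single file, a speaker seen in one recording keeps the same ID in later recordings. Backing the store with a database rather than a dictionary is what would make that mapping persist between runs.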

257 stars and 241 monthly downloads. No commits in the last 6 months. Available on PyPI.


Maintenance 0 / 25

Adoption 15 / 25

Maturity 25 / 25

Community 15 / 25


Stars 257

Forks

Language Python

License MIT

Related tools

ynop/audiomate

Python library for handling audio datasets.

common-voice/cv-dataset

Metadata and versioning details for the Common Voice dataset

reazon-research/ReazonSpeech

Massive open Japanese speech corpus

EgorLakomkin/KTSpeechCrawler

Automatically constructing corpus for automatic speech recognition from YouTube videos

coqui-ai/open-speech-corpora

💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
