davidmartinrius/speech-dataset-generator
🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. 🎧👥📊 Advanced audio processing.
The tool stores pyannote speaker embeddings in a Chroma vector database so that speakers are identified consistently across multiple audio files, eliminating manual speaker labeling. It supports diverse input sources (local files, YouTube, LibriVox, TED Talks) and chainable audio enhancers (DeepFilterNet, ResembleAI, MayaVoz), and writes output in standard TTS/STT dataset formats such as LJSpeech and LibriSpeech. For annotation, it computes speaking-rate metrics (syllables per minute and words per minute), classifies speaker gender, and transcribes multilingual audio via WhisperX.
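The idea behind embedding-based speaker identification can be illustrated with a minimal sketch. This is not the project's code: the `SpeakerStore` class, the 0.75 threshold, and the three-dimensional toy vectors are all hypothetical, and a plain in-memory dictionary stands in for the Chroma database. It only shows the general technique of matching a new embedding against known speakers by cosine similarity and registering a new speaker when nothing is close enough.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class SpeakerStore:
    """Hypothetical in-memory stand-in for a persistent vector database."""

    def __init__(self, threshold=0.75):
        # Assumed threshold; a real system would tune this on its own data.
        self.threshold = threshold
        self.embeddings = {}  # speaker_id -> reference embedding

    def identify(self, embedding):
        """Return the closest known speaker, or register a new one."""
        best_id, best_score = None, -1.0
        for speaker_id, known in self.embeddings.items():
            score = cosine_similarity(embedding, known)
            if score > best_score:
                best_id, best_score = speaker_id, score
        if best_id is not None and best_score >= self.threshold:
            return best_id
        # No sufficiently similar speaker: store this embedding under a new id.
        new_id = f"speaker_{len(self.embeddings) + 1}"
        self.embeddings[new_id] = list(embedding)
        return new_id


# Toy vectors; real speaker embeddings have hundreds of dimensions.
store = SpeakerStore()
first = store.identify([0.9, 0.1, 0.0])
second = store.identify([0.88, 0.12, 0.01])  # close to the first vector
third = store.identify([0.0, 0.2, 0.95])     # clearly different direction
print(first, second, third)
```

Because the store outlives any single file, a voice that reappears in a later recording maps to the same identifier, which is what removes the need for manual labeling.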
257 stars and 241 monthly downloads. No commits in the last 6 months. Available on PyPI.
Stars: 257
Forks: 25
Language: Python
License: MIT
Category:
Last pushed: Jun 10, 2024
Monthly downloads: 241
Commits (30d): 0
Dependencies: 12
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/davidmartinrius/speech-dataset-generator"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
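For scripted access, the same request might be made from Python roughly as follows. This is a sketch under assumptions: the `category/owner/repo` path pattern is inferred from the single URL shown above, the response is assumed to be JSON (the page does not say), and the helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL; the path layout is inferred from one example."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the quality record, assuming the endpoint returns JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("voice-ai", "davidmartinrius", "speech-dataset-generator")
```

Keyless access is capped at 100 requests per day, so a script that polls many repositories would want to cache responses locally.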
Related tools
ynop/audiomate
Python library for handling audio datasets.
common-voice/cv-dataset
Metadata and versioning details for the Common Voice dataset
reazon-research/ReazonSpeech
Massive open Japanese speech corpus
EgorLakomkin/KTSpeechCrawler
Automatically constructs a corpus for automatic speech recognition from YouTube videos
coqui-ai/open-speech-corpora
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies