davidmartinrius/speech-dataset-generator

🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. 🎧👥📊 Advanced audio processing.

55 / 100 (Established)

Stores pyannote speaker embeddings in a Chroma vector database so that the same speaker is identified consistently across multiple audio files, without manual speaker labeling. Accepts input from local files, YouTube, LibriVox, and TED Talks, and supports chainable audio enhancers (DeepFilterNet, ResembleAI, MayaVoz). Output is formatted in standard TTS/STT dataset layouts such as LJSpeech and LibriSpeech. For annotation, it computes speaking-rate metrics (syllables per minute and words per minute), classifies speaker gender, and transcribes multilingual audio with WhisperX.
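To make the idea of persistent speaker identification concrete, here is a minimal sketch of matching a new embedding against previously stored ones by cosine similarity. It is not the project's code: the `SpeakerRegistry` class, the `speaker_N` naming, and the `0.75` threshold are hypothetical, and a plain in-memory dictionary stands in for the Chroma database. Real pyannote embeddings have hundreds of dimensions; short vectors are used here for readability.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class SpeakerRegistry:
    """Hypothetical in-memory stand-in for a vector database of speakers."""

    def __init__(self, threshold=0.75):
        # Assumed threshold; a real system would tune this on labeled data.
        self.threshold = threshold
        self.speakers = {}  # speaker id -> stored embedding

    def identify(self, embedding):
        """Return the id of the closest known speaker, or register a new one."""
        best_id, best_score = None, -1.0
        for speaker_id, stored in self.speakers.items():
            score = cosine_similarity(embedding, stored)
            if score > best_score:
                best_id, best_score = speaker_id, score
        if best_id is not None and best_score >= self.threshold:
            return best_id
        # No sufficiently similar speaker found: store this one under a new id.
        new_id = f"speaker_{len(self.speakers) + 1}"
        self.speakers[new_id] = list(embedding)
        return new_id
```

Because the registry outlives any single file, a voice first seen in one recording receives the same id when it reappears in a later one, which is what removes the need for manual labeling.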

257 stars and 241 monthly downloads. No commits in the last 6 months. Available on PyPI.

Stale (6 months)
Maintenance 0 / 25
Adoption 15 / 25
Maturity 25 / 25
Community 15 / 25

Stars: 257
Forks: 25
Language: Python
License: MIT
Last pushed: Jun 10, 2024
Monthly downloads: 241
Commits (30d): 0
Dependencies: 12

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/davidmartinrius/speech-dataset-generator"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
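The same request can be made from Python using only the standard library. This is a sketch under assumptions: the helper names `quality_url` and `fetch_quality` are hypothetical, the `category/owner/repo` path layout is inferred from the single URL shown above, and the response is assumed to be JSON, which the page does not state.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, repo):
    """Build the endpoint URL; the path layout is inferred from one example."""
    path = urllib.parse.quote(f"{category}/{repo}", safe="/")
    return f"{BASE_URL}/{path}"


def fetch_quality(category, repo, timeout=10.0):
    """Fetch the quality record, assuming the body is JSON (unverified)."""
    url = quality_url(category, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("voice-ai", "davidmartinrius/speech-dataset-generator")` issues the same GET request as the curl command. Keyless use counts against the 100 requests/day limit, so it is worth caching the result.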