rioharper/VocalForge

Your one-stop solution for voice dataset creation

45 / 100

Emerging

Combines Whisper transcription, PyAnnote speaker diarization, and CTC segmentation to automatically process raw audio into aligned speech datasets with minimal manual curation. The toolkit handles speaker isolation, voice activity detection, noise filtering, and text normalization across multiple audio sources, then exports in LJSpeech format. Includes VCAuditor, a browser-based verification interface for reviewing waveforms, correcting alignments, and filtering low-confidence segments before final dataset export.
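As a rough illustration of the final export step, the sketch below writes a set of aligned segments to an LJSpeech-style layout: a `wavs/` directory plus a pipe-delimited `metadata.csv` with one `id|transcription|normalized transcription` row per clip. The `Segment` class, the `write_ljspeech_metadata` function, and the `min_confidence` threshold are hypothetical names introduced here for illustration. They are not VocalForge's API, and the toolkit's actual export code may differ.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Segment:
    """Hypothetical aligned segment (illustrative, not a VocalForge type)."""
    clip_id: str       # file stem of the clip expected under wavs/
    text: str          # raw transcription
    normalized: str    # transcription with numbers and abbreviations expanded
    confidence: float  # assumed alignment score, higher is better


def _clean(field: str) -> str:
    # The pipe is the column separator, so strip it (and stray
    # newlines or repeated spaces) from the text fields.
    return " ".join(field.replace("|", " ").split())


def write_ljspeech_metadata(segments, out_dir, min_confidence=0.5):
    """Write metadata.csv in LJSpeech layout; return the ids that were kept."""
    out_dir = Path(out_dir)
    (out_dir / "wavs").mkdir(parents=True, exist_ok=True)
    # Drop segments below the (assumed) confidence threshold.
    kept = [s for s in segments if s.confidence >= min_confidence]
    with open(out_dir / "metadata.csv", "w", encoding="utf-8") as f:
        for s in kept:
            f.write(f"{s.clip_id}|{_clean(s.text)}|{_clean(s.normalized)}\n")
    return [s.clip_id for s in kept]


# Small demo with made-up segments, written to a temporary directory.
demo_segments = [
    Segment("clip_0001", "Dr. Smith paid $5.", "Doctor Smith paid five dollars.", 0.92),
    Segment("clip_0002", "uh, I mean", "uh, I mean", 0.21),
]
with tempfile.TemporaryDirectory() as tmp:
    kept_ids = write_ljspeech_metadata(demo_segments, tmp, min_confidence=0.5)
    print(kept_ids)
    print((Path(tmp) / "metadata.csv").read_text(encoding="utf-8"))
```

The sketch only writes the metadata file; copying or resampling the audio clips into `wavs/` is left out, and the confidence threshold stands in for whatever filtering the real pipeline applies.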

130 stars. No commits in the last 6 months.

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 19 / 25

Stars

130

Forks

Language

Python

License

MIT

Higher-rated alternatives

voicegain/platform

Voicegain Enterprise Speech-to-Text Platform (API, Portal, etc.)

davidamacey/OpenTranscribe

Self-hosted AI-powered transcription platform with speaker diarization, search, and...

aws-samples/amazon-transcribe-live-call-analytics

Amazon Transcribe Live Call Analytics (LCA) Sample Solution

SamirPaulb/real-time-voice-translator

A desktop application that uses AI to translate voice between languages in real time, while...

jim-schwoebel/voicebook

🗣️ A book and repo to get you started programming voice computing applications in Python (10...
