CorentinJ/Real-Time-Voice-Cloning
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Implements the three-stage SV2TTS framework, which combines a GE2E speaker encoder, a Tacotron synthesizer, and a WaveRNN vocoder to generate speech in real time from speaker embeddings. Provides both GUI and CLI interfaces that support CPU and GPU inference, with pretrained models downloaded automatically from Hugging Face. Although noted as an older reference implementation, it remains a functional open-source alternative to contemporary commercial voice cloning services.
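The data flow between the three stages can be illustrated with a toy sketch. This is not the repository's code: `stub_encoder`, `stub_synthesizer`, `stub_vocoder`, and `clone_voice` are hypothetical stand-ins, and the sizes are arbitrary toy values. It only shows how a reference clip becomes a fixed-size speaker embedding, how text plus that embedding becomes a mel spectrogram, and how the spectrogram becomes a waveform.

```python
import math
from typing import List

Vector = List[float]
Mel = List[List[float]]  # frames x mel channels


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


def stub_encoder(reference_wav: Vector, dim: int = 8) -> Vector:
    """Hypothetical stand-in for the speaker encoder.

    Folds the samples into `dim` buckets and normalizes the result, so any
    clip length maps to a fixed-size, unit-length embedding. A real encoder
    is a trained neural network.
    """
    buckets = [0.0] * dim
    for i, sample in enumerate(reference_wav):
        buckets[i % dim] += sample
    return l2_normalize(buckets)


def stub_synthesizer(text: str, embedding: Vector, n_mels: int = 4) -> Mel:
    """Hypothetical stand-in for the synthesizer.

    Emits one frame per character, shifted by a value derived from the
    speaker embedding, to show that the output depends on text and speaker.
    """
    bias = sum(embedding) / len(embedding)
    return [[(ord(ch) % 32) / 32.0 + bias] * n_mels for ch in text]


def stub_vocoder(mel: Mel, hop: int = 2) -> Vector:
    """Hypothetical stand-in for the vocoder: expands each frame to `hop` samples."""
    wav: Vector = []
    for frame in mel:
        level = sum(frame) / len(frame)
        wav.extend([level] * hop)
    return wav


def clone_voice(reference_wav: Vector, text: str) -> Vector:
    """Chain the three stages: wav -> embedding -> mel spectrogram -> wav."""
    embedding = stub_encoder(reference_wav)
    mel = stub_synthesizer(text, embedding)
    return stub_vocoder(mel)
```

Because the stages communicate only through the embedding and the spectrogram, each one can in principle be trained or replaced independently, which is the main appeal of a staged design like this.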
59,518 stars. Actively maintained with 1 commit in the last 30 days.
Stars: 59,518
Forks: 9,422
Language: Python
License: not listed
Category: not listed
Last pushed: Mar 09, 2026
Commits (30d): 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/CorentinJ/Real-Time-Voice-Cloning"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
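The same request can be made from Python. The sketch below uses only the standard library and makes two assumptions that the listing does not confirm: that the path follows a `category/owner/repo` pattern (inferred from the single URL above) and that the endpoint returns JSON. The helper names `quality_url` and `fetch_quality` are illustrative, not part of any published client.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment.

    Assumption: the path layout is <category>/<owner>/<repo>, as in the
    curl example above.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the response.

    Assumption: the body is JSON. The keyless tier allows 100 requests/day,
    so callers should cache results rather than poll.
    """
    with urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("voice-ai", "CorentinJ", "Real-Time-Voice-Cloning")` would issue the same GET request as the curl command. The shape of the returned data is not documented here, so inspect it before relying on specific fields.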
Related tools
pnnbao97/VieNeu-TTS
Vietnamese TTS with instant voice cloning • On-device • Real-time CPU inference • 24kHz audio...
r9y9/nnmnkwii
Library to build speech synthesis systems designed for easy and fast prototyping.
babysor/MockingBird
🚀Clone a voice in 5 seconds to generate arbitrary speech in real-time
Softcatala/open-dubbing
Open dubbing is an AI dubbing system which uses machine learning models to automatically...
coqui-ai/STT
🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never...