Edresson/YourTTS
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
Built on the VITS architecture, YourTTS adds a speaker encoder and multilingual training to enable zero-shot adaptation from minimal speaker data (less than one minute of speech). The model decouples speaker identity through pre-computed d-vector embeddings and a speaker consistency loss, which allows voices to transfer across languages and into low-resource scenarios. It is implemented within the Coqui TTS ecosystem, which provides command-line interfaces for both TTS and voice conversion, conditioned on speaker or reference audio.
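To make the speaker consistency loss mentioned above more concrete, here is a minimal sketch. It assumes the loss is the negative mean cosine similarity between speaker embeddings of the reference audio and of the generated audio, scaled by a weight `alpha`. The function names, the `alpha` default, and the use of plain Python lists in place of real speaker-encoder outputs are illustrative assumptions, not the repository's code.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def speaker_consistency_loss(ref_embeddings, gen_embeddings, alpha=1.0):
    """Negative mean cosine similarity over a batch of embedding pairs.

    ref_embeddings: speaker embeddings of the ground-truth audio (assumed given).
    gen_embeddings: speaker embeddings of the synthesized audio (assumed given).
    alpha: hypothetical weight controlling the loss's influence.
    """
    sims = [cosine_similarity(r, g) for r, g in zip(ref_embeddings, gen_embeddings)]
    return -alpha * sum(sims) / len(sims)
```

Under this form, minimizing the loss pushes the embedding of the generated speech toward the embedding of the reference speaker. Identical embeddings give the minimum value of `-alpha`, and orthogonal embeddings give 0.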
1,052 stars. No commits in the last 6 months.
Stars: 1,052
Forks: 97
Language: Jupyter Notebook
License: (not shown)
Category: (not shown)
Last pushed: Nov 04, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/Edresson/YourTTS"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
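The same request can be made from Python. The sketch below uses only the standard library and rests on two assumptions the listing does not confirm: that the endpoint returns JSON, and that the `category/owner/repo` path pattern seen in the curl URL generalizes to other repositories. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path taken from the curl example; everything after it is category/owner/repo.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL for one repository (path pattern is an assumption)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the record and decode it, assuming the response body is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("voice-ai", "Edresson", "YourTTS")` requests the same URL as the curl command. Keyless access is limited to 100 requests/day, so cache the result if you call it repeatedly.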
Higher-rated alternatives
index-tts/index-tts
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
lucasnewman/f5-tts-mlx
Implementation of F5-TTS in MLX
stepfun-ai/Step-Audio-EditX
A powerful 3B-parameter, LLM-based Reinforcement Learning audio edit model excels at editing...
unilight/seq2seq-vc
A sequence-to-sequence voice conversion toolkit.
FireRedTeam/FireRedTTS
An Open-Sourced LLM-empowered Foundation TTS System