Zero-Shot Voice Synthesis: Voice AI Tools
Tools for synthesizing speech with zero-shot or few-shot learning, enabling speaker cloning, emotion control, style transfer, and voice conversion without extensive training data. Does NOT include general text-to-speech engines, ASR systems, or non-zero-shot voice synthesis approaches.
This category tracks 43 zero-shot voice synthesis tools. Three score above 50 (the Established tier). The highest rated is index-tts/index-tts at 63/100, with 19,454 stars. Two of the top 10 are actively maintained.
Get the projects as JSON. The example request returns at most 20 entries (`limit=20`); raise `limit` to request all 43:
```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=voice-ai&subcategory=zero-shot-voice-synthesis&limit=20"
```
Open to everyone: up to 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
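The request above can be wrapped in a small reusable script. The following is a minimal Bash sketch, not an official client: the function names `build_quality_url` and `fetch_quality` are hypothetical, it assumes the endpoint accepts the same three query parameters (`domain`, `subcategory`, `limit`) with any non-negative `limit`, and it leaves out the API key because this page does not say how the key is sent.

```shell
#!/usr/bin/env bash
# Hypothetical helper for the quality dataset endpoint shown above.
# Assumption: the endpoint takes domain, subcategory, and limit as query
# parameters, exactly as in the documented curl example.

QUALITY_API="https://pt-edge.onrender.com/api/v1/datasets/quality"

# Print the request URL for a domain/subcategory pair (limit defaults to 20).
# The slugs are assumed to be URL-safe, so no percent-encoding is applied.
build_quality_url() {
  local domain=$1 subcategory=$2 limit=${3:-20}
  # Reject a non-numeric limit (including negative values) before building the URL.
  case $limit in
    *[!0-9]*)
      echo "limit must be a non-negative integer: '$limit'" >&2
      return 1
      ;;
  esac
  printf '%s?domain=%s&subcategory=%s&limit=%s\n' \
    "$QUALITY_API" "$domain" "$subcategory" "$limit"
}

# Download the JSON response to a file. --fail makes curl exit non-zero on
# HTTP errors (status 400 and above) instead of saving the error body.
fetch_quality() {
  local url
  url=$(build_quality_url "$1" "$2" "$3") || return 1
  curl --fail --silent --show-error --output "$4" "$url"
}
```

For example, `fetch_quality voice-ai zero-shot-voice-synthesis 50 zero-shot.json` would request up to 50 entries, enough to cover all 43, provided the service accepts that limit. Each call counts against the daily request quota, so saving the response to a file and reusing it avoids repeated requests.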
| # | Tool | Description | Score (/100) | Tier |
|---|---|---|---|---|
| 1 | index-tts/index-tts | An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System | 63 | Established |
| 2 | lucasnewman/f5-tts-mlx | Implementation of F5-TTS in MLX | | Established |
| 3 | stepfun-ai/Step-Audio-EditX | A powerful 3B-parameter, LLM-based Reinforcement Learning audio edit model... | | Established |
| 4 | unilight/seq2seq-vc | A sequence-to-sequence voice conversion toolkit. | | Emerging |
| 5 | JosefAlbers/e2tts-mlx | Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS (E2 TTS) in MLX | | Emerging |
| 6 | FireRedTeam/FireRedTTS | An Open-Sourced LLM-empowered Foundation TTS System | | Emerging |
| 7 | RaduBolbo/F5-TTS-Emotional-CFG | Zero-shot voice cloning text-to-speech (TTS) with explicit emotion class... | | Emerging |
| 8 | ubisoft/ubisoft-laforge-daft-exprt | Daft-Exprt: Robust Prosody Transfer Across Speakers for Expressive Speech Synthesis | | Emerging |
| 9 | Kyubyong/cross_vc | Cross-lingual Voice Conversion | | Emerging |
| 10 | Edresson/YourTTS | YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion... | | Emerging |
| 11 | lucasnewman/f5-tts-swift | Implementation of F5-TTS in Swift using MLX | | Emerging |
| 12 | hi-paris/Prosody-Control-French-TTS | An End-to-End Pipeline for Enhanced French Text-to-Speech with SSML Prosody Control | | Emerging |
| 13 | keonlee9420/Cross-Speaker-Emotion-Transfer | PyTorch Implementation of ByteDance's Cross-speaker Emotion Transfer Based... | | Emerging |
| 14 | uetuluk/xcodec2-infer-lib | CPU support for xcodec2 | | Emerging |
| 15 | Emotional-Text-to-Speech/hmm-for-emo-tts | A repository with comprehensive instructions for using the... | | Emerging |
| 16 | WangHelin1997/SSR-Speech | SSR-Speech: Towards Stable, Safe and Robust Zero-shot Speech Editing and Synthesis | | Emerging |
| 17 | keonlee9420/Robust_Fine_Grained_Prosody_Control | PyTorch Implementation of Robust and fine-grained prosody control of... | | Emerging |
| 18 | adelacvg/ttts | Train the next generation of TTS systems. | | Emerging |
| 19 | lucasnewman/descript-mlx | Implementation of the Descript Audio Codec in MLX | | Emerging |
| 20 | aiola-lab/drax | Drax: Speech Recognition with Discrete Flow Matching | | Emerging |
| 21 | hcy71o/SC-CNN | SC-CNN: Effective Speaker Conditioning Method for Zero-Shot Multi-Speaker... | | Emerging |
| 22 | WelkinYang/Learn2Sing2.0 | Diffusion and Mutual Information-Based Target Speaker SVS by Learning from... | | Experimental |
| 23 | ddlBoJack/MT4SSL | [INTERSPEECH 2023 Best Paper Shortlist] Official implementation for MT4SSL:... | | Experimental |
| 24 | NN-Project-2/Emotion-TTS-Emebddings | This project explores zero-shot emotional speech synthesis using EMOD, a... | | Experimental |
| 25 | ictnlp/ComSpeech | Code for ACL 2024 main conference paper "Can We Achieve High-quality Direct... | | Experimental |
| 26 | rishikksh20/Zero-Shot-TTS | Unofficial Implementation of Zero-Shot Text-to-Speech for Text-Based... | | Experimental |
| 27 | adelacvg/detail_tts | All generative models in one for a better TTS model | | Experimental |
| 28 | lordzuko/cross-text-PT | Improving the Appropriateness in Cross-Text Prosody Transfer using Human Supervision | | Experimental |
| 29 | CMsmartvoice/Unet-TTS | One-shot TTS with Improved Unseen Speaker and Style Transfer | | Experimental |
| 30 | xuan3986/UDDETTS | The first LLM that unifies discrete and dimensional emotions for... | | Experimental |
| 31 | zhenye234/FlashSpeech | ACM MM 2024 FlashSpeech: Efficient Zero-Shot Speech Synthesis | | Experimental |
| 32 | jishengpeng/ControlSpeech | [ACL 2025 Main] ControlSpeech: Towards Simultaneous Zero-shot Speaker... | | Experimental |
| 33 | fmiotello/fastVC | A simple voice conversion tool | | Experimental |
| 34 | NassimaOULDOUALI/Prosody-Control-French-TTS | An End-to-End Pipeline for Enhanced French Text-to-Speech with SSML Prosody Control | | Experimental |
| 35 | WelkinYang/EMPHASIS-pytorch | EMPHASIS: An Emotional Phoneme-based Acoustic Model for Speech Synthesis System | | Experimental |
| 36 | ORI-Muchim/Grad-TTS | 'Grad-TTS' with Multilingual Cleaners | | Experimental |
| 37 | Rumeysakeskin/Turkish-Text-to-Speech | Speech synthesis (TTS) in low-resource languages by training from scratch... | | Experimental |
| 38 | jzmzhong/Automatic-Prosody-Annotator-with-SSWP-CLAP | An automatic prosodic boundary annotation tool for Text-to-Speech Synthesis (TTS). | | Experimental |
| 39 | MotivationalSpeechSynthesis/motivational-speech-synthesis | Artistic research deconstructing the performative excess of motivational... | | Experimental |
| 40 | the-bird-F/Expressive-Vectors | [ICASSP 2026] Task Vector in TTS: Toward Emotionally Expressive Dialectal... | | Experimental |
| 41 | adelacvg/DPTTS | An AR+AR TTS attempt. | | Experimental |
| 42 | Wonbin-Jung/e3-vits | Official GitHub page of E3-VITS | | Experimental |
| 43 | wenhuahuo/Cross-Device-Acoustic-Communication-Python-Implementation | Digital acoustic communication tools using QFSK and convolutional encoding (cross-device acoustic communication). | | Experimental |