Zero-Shot Voice Synthesis Voice AI Tools

Tools for synthesizing speech with zero-shot or few-shot learning, enabling speaker cloning, emotion control, style transfer, and voice conversion without extensive training data. Does NOT include general text-to-speech engines, ASR systems, or non-zero-shot voice synthesis approaches.

There are 43 zero-shot voice synthesis tools tracked. 3 score 50 or higher (Established tier). The highest-rated is index-tts/index-tts at 63/100 with 19,454 stars. 2 of the top 10 are actively maintained.

Get all 43 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=voice-ai&subcategory=zero-shot-voice-synthesis&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
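The curl request above can also be issued from a script. The following is a minimal Python sketch, not an official client: the helper names `build_url` and `fetch` are hypothetical, and the response schema is not documented here, so the parsed JSON is returned untouched. The query parameters and the default `limit=20` are copied from the curl example; whether a larger `limit` returns all 43 projects in one call is an assumption that should be checked against the API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example in this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="voice-ai", subcategory="zero-shot-voice-synthesis", limit=20):
    """Build the dataset URL with the same query parameters as the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch(url, timeout=10):
    """Perform the GET request and parse the body as JSON.

    The structure of the returned object is an assumption left to the caller,
    since the response schema is not described in this page.
    """
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Printing the URL only; call fetch(build_url()) to hit the network.
    print(build_url())
```

Keeping URL construction separate from the network call makes the request easy to inspect and stays within the unauthenticated 100 requests/day limit while experimenting.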

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | index-tts/index-tts | An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System | 63 | Established |
| 2 | lucasnewman/f5-tts-mlx | Implementation of F5-TTS in MLX | 55 | Established |
| 3 | stepfun-ai/Step-Audio-EditX | A powerful 3B-parameter, LLM-based Reinforcement Learning audio edit model... | 50 | Established |
| 4 | unilight/seq2seq-vc | A sequence-to-sequence voice conversion toolkit. | 46 | Emerging |
| 5 | JosefAlbers/e2tts-mlx | Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS (E2 TTS) in MLX | 41 | Emerging |
| 6 | FireRedTeam/FireRedTTS | An Open-Sourced LLM-empowered Foundation TTS System | 39 | Emerging |
| 7 | RaduBolbo/F5-TTS-Emotional-CFG | Zero-shot voice cloning text-to-speech (TTS) with explicit emotion class... | 39 | Emerging |
| 8 | ubisoft/ubisoft-laforge-daft-exprt | Daft-Exprt: Robust Prosody Transfer Across Speakers for Expressive Speech Synthesis | 38 | Emerging |
| 9 | Kyubyong/cross_vc | Cross-lingual Voice Conversion | 38 | Emerging |
| 10 | Edresson/YourTTS | YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion... | 38 | Emerging |
| 11 | lucasnewman/f5-tts-swift | Implementation of F5-TTS in Swift using MLX | 37 | Emerging |
| 12 | hi-paris/Prosody-Control-French-TTS | An End-to-End Pipeline for Enhanced French Text-to-Speech with SSML Prosody Control | 37 | Emerging |
| 13 | keonlee9420/Cross-Speaker-Emotion-Transfer | PyTorch Implementation of ByteDance's Cross-speaker Emotion Transfer Based... | 36 | Emerging |
| 14 | uetuluk/xcodec2-infer-lib | CPU support for xcodec2 | 35 | Emerging |
| 15 | Emotional-Text-to-Speech/hmm-for-emo-tts | :computer: A repository with comprehensive instructions for using the... | 34 | Emerging |
| 16 | WangHelin1997/SSR-Speech | SSR-Speech: Towards Stable, Safe and Robust Zero-shot Speech Editing and Synthesis | 34 | Emerging |
| 17 | keonlee9420/Robust_Fine_Grained_Prosody_Control | PyTorch Implementation of Robust and fine-grained prosody control of... | 33 | Emerging |
| 18 | adelacvg/ttts | Train the next generation of TTS systems. | 33 | Emerging |
| 19 | lucasnewman/descript-mlx | Implementation of the Descript Audio Codec in MLX | 33 | Emerging |
| 20 | aiola-lab/drax | Drax: Speech Recognition with Discrete Flow Matching | 32 | Emerging |
| 21 | hcy71o/SC-CNN | SC-CNN: Effective Speaker Conditioning Method for Zero-Shot Multi-Speaker... | 31 | Emerging |
| 22 | WelkinYang/Learn2Sing2.0 | Diffusion and Mutual Information-Based Target Speaker SVS by Learning from... | 28 | Experimental |
| 23 | ddlBoJack/MT4SSL | [INTERSPEECH 2023 Best Paper Shortlist] Official implementation for MT4SSL:... | 26 | Experimental |
| 24 | NN-Project-2/Emotion-TTS-Emebddings | This project explores zero-shot emotional speech synthesis using EMOD, a... | 25 | Experimental |
| 25 | ictnlp/ComSpeech | Code for ACL 2024 main conference paper "Can We Achieve High-quality Direct... | 24 | Experimental |
| 26 | rishikksh20/Zero-Shot-TTS | Unofficial Implementation of Zero-Shot Text-to-Speech for Text-Based... | 24 | Experimental |
| 27 | adelacvg/detail_tts | All generative model in one for better TTS model | 23 | Experimental |
| 28 | lordzuko/cross-text-PT | Improving the Appropriateness in Cross-Text Prosody Transfer using Human Supervision | 23 | Experimental |
| 29 | CMsmartvoice/Unet-TTS | One-shot TTS with Improved Unseen Speaker and Style Transfer | 23 | Experimental |
| 30 | xuan3986/UDDETTS | The first LLM that unifies discrete and dimensional emotions for... | 23 | Experimental |
| 31 | zhenye234/FlashSpeech | ACM MM 2024 FlashSpeech: Efficient Zero-Shot Speech Synthesis | 22 | Experimental |
| 32 | jishengpeng/ControlSpeech | [ACL 2025 Main] ControlSpeech: Towards Simultaneous Zero-shot Speaker... | 22 | Experimental |
| 33 | fmiotello/fastVC | A simple voice conversion tool | 22 | Experimental |
| 34 | NassimaOULDOUALI/Prosody-Control-French-TTS | An End-to-End Pipeline for Enhanced French Text-to-Speech with SSML Prosody Control | 21 | Experimental |
| 35 | WelkinYang/EMPHASIS-pytorch | EMPHASIS: An Emotional Phoneme-based Acoustic Model for Speech Synthesis System | 21 | Experimental |
| 36 | ORI-Muchim/Grad-TTS | 'Grad-TTS' with Multilingual Cleaners | 21 | Experimental |
| 37 | Rumeysakeskin/Turkish-Text-to-Speech | Speech synthesis (TTS) in low-resource languages by training from scratch... | 20 | Experimental |
| 38 | jzmzhong/Automatic-Prosody-Annotator-with-SSWP-CLAP | An automatic prosodic boundary annotation tool for Text-to-Speech Synthesis (TTS). | 20 | Experimental |
| 39 | MotivationalSpeechSynthesis/motivational-speech-synthesis | Artistic research deconstructing the performative excess of motivational... | 17 | Experimental |
| 40 | the-bird-F/Expressive-Vectors | [ICASSP 2026] Task Vector in TTS: Toward Emotionally Expressive Dialectal... | 17 | Experimental |
| 41 | adelacvg/DPTTS | An AR+AR TTS attempt. | 16 | Experimental |
| 42 | Wonbin-Jung/e3-vits | Official GitHub page of E3-VITS | 14 | Experimental |
| 43 | wenhuahuo/Cross-Device-Acoustic-Communication-Python-Implementation | Digital acoustic communication tools using QFSK and Convolutional Encode. 跨设备声学通信。 | 14 | Experimental |
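To cross-check the summary figures against the ranking, the scores and tiers listed above can be tallied directly. The sketch below is illustrative only: the (score, tier) pairs are copied from the ranking in rank order, and the helper names `tier_counts` and `tier_ranges` are hypothetical. The observed score ranges describe this snapshot; the actual tier cut-offs are not stated here, so the ranges should not be read as the scoring rule.

```python
from collections import Counter

# (score, tier) pairs copied from the ranking above, in rank order.
ROWS = (
    [(63, "Established"), (55, "Established"), (50, "Established")]
    + [(s, "Emerging") for s in (46, 41, 39, 39, 38, 38, 38, 37, 37,
                                 36, 35, 34, 34, 33, 33, 33, 32, 31)]
    + [(s, "Experimental") for s in (28, 26, 25, 24, 24, 23, 23, 23, 23, 22, 22,
                                     22, 21, 21, 21, 20, 20, 17, 17, 16, 14, 14)]
)


def tier_counts(rows):
    """Count how many tools fall into each tier."""
    return Counter(tier for _, tier in rows)


def tier_ranges(rows):
    """Return the observed (min, max) score per tier for this snapshot."""
    ranges = {}
    for score, tier in rows:
        lo, hi = ranges.get(tier, (score, score))
        ranges[tier] = (min(lo, score), max(hi, score))
    return ranges


if __name__ == "__main__":
    print(len(ROWS), dict(tier_counts(ROWS)))
    print(tier_ranges(ROWS))
```

In this snapshot the lowest Established score is exactly 50, which is why the tier is better described as "50 or higher" than "above 50".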

Comparisons in this category