FunAudioLLM/CosyVoice
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
An LLM-based architecture using flow matching and repetition-aware sampling for robust multilingual zero-shot voice synthesis, with streaming support at 150 ms latency. It supports pronunciation control via Pinyin/CMU phonemes and instruction-based voice manipulation (emotion, speed, dialect), and it integrates with vLLM and NVIDIA TensorRT-LLM for optimized inference, plus FastAPI server deployment.
19,991 stars. Actively maintained with 6 commits in the last 30 days.
Stars: 19,991
Forks: 2,270
Language: Python
License: Apache-2.0
Category:
Last pushed: Feb 11, 2026
Commits (30d): 6
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/FunAudioLLM/CosyVoice"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
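The curl endpoint above returns JSON. A minimal sketch of consuming it from Python follows; note that the response schema is not documented on this page, so the field names `stars` and `commits_30d` (and the `summarize` helper) are assumptions for illustration, not the API's actual contract:

```python
import json
import urllib.request

API_URL = "https://pt-edge.onrender.com/api/v1/quality/voice-ai/FunAudioLLM/CosyVoice"

def fetch_repo_stats(url: str = API_URL) -> dict:
    """Fetch the quality payload for a repo (100 requests/day without a key)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

def summarize(payload: dict) -> str:
    """One-line summary; the keys used here are assumed, not documented."""
    stars = payload.get("stars", "?")
    commits = payload.get("commits_30d", "?")
    return f"{stars} stars, {commits} commits in the last 30 days"

# Offline usage with the numbers shown on this page, so no network call is needed:
sample = {"stars": 19991, "commits_30d": 6}
print(summarize(sample))
```

In practice you would call `fetch_repo_stats()` instead of using `sample`, and inspect the real payload once to confirm the actual key names before relying on them.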
Related tools
fishaudio/Bert-VITS2
vits2 backbone with multilingual-bert
travisvn/chatterbox-tts-api
Local, OpenAI-compatible text-to-speech (TTS) API using Chatterbox, enabling users to generate...
OpenMOSS/MOSS-TTS
MOSS‑TTS Family is an open‑source speech and sound generation model family from MOSI.AI and the...
OpenMOSS/MOSS-TTSD
MOSS-TTSD is a spoken dialogue generation model designed for expressive multi-speaker synthesis....
FunAudioLLM/SenseVoice
Multilingual Voice Understanding Model