FunAudioLLM/CosyVoice

Multilingual large voice generation model with full-stack support for inference, training, and deployment.

64 / 100 (Established)

LLM-based architecture using flow matching and repetition-aware sampling for robust multilingual zero-shot voice synthesis, with streaming support at 150 ms latency. Supports pronunciation control via Pinyin/CMU phonemes and instruction-based voice manipulation (emotion, speed, dialect). Integrates with vLLM and NVIDIA TRT-LLM for optimized inference, and can be deployed as a FastAPI server.

19,991 stars. Actively maintained with 6 commits in the last 30 days.

No Package · No Dependents
Maintenance 17 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 21 / 25
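
The four category scores above are each out of 25, and they add up to the overall 64 / 100. That suggests the overall score is a plain sum of the categories, although the page does not state the formula. A minimal sketch under that assumption (the `sub_scores` name is illustrative):

```python
# Category scores as listed on this page, each out of 25.
sub_scores = {
    "Maintenance": 17,
    "Adoption": 10,
    "Maturity": 16,
    "Community": 21,
}

# Assumption: the overall score is the unweighted sum of the four categories.
overall = sum(sub_scores.values())
max_score = 25 * len(sub_scores)

print(f"{overall} / {max_score}")  # prints "64 / 100"
```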


Stars: 19,991
Forks: 2,270
Language: Python
License: Apache-2.0
Last pushed: Feb 11, 2026
Commits (30d): 6

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/FunAudioLLM/CosyVoice"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
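
The same request can be made from Python using only the standard library. This is a sketch, not a documented client: the URL pattern is taken from the curl command above, while the helper names `build_quality_url` and `fetch_quality` are hypothetical, and the response is assumed to be JSON (the page does not show the response format).

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    parts = [quote(p, safe="") for p in (category, owner, repo)]
    return f"{BASE_URL}/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record. Assumes the body is JSON, which is unverified."""
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("voice-ai", "FunAudioLLM", "CosyVoice")` issues the same GET request as the curl command. With the keyless limit of 100 requests/day, caching the result locally is a sensible precaution for repeated lookups.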