FunAudioLLM/CosyVoice
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
An LLM-based architecture using flow matching and repetition-aware sampling for robust multilingual zero-shot voice synthesis, with streaming support at 150 ms latency. It supports pronunciation control via Pinyin/CMU phonemes and instruction-based voice manipulation (emotion, speed, dialect), and it integrates with vLLM and NVIDIA TensorRT-LLM for optimized inference, plus FastAPI server deployment.
19,991 stars. Actively maintained with 6 commits in the last 30 days.
Stars: 19,991
Forks: 2,270
Language: Python
License: Apache-2.0
Category:
Last pushed: Feb 11, 2026
Commits (30d): 6
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/FunAudioLLM/CosyVoice"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
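The curl endpoint above returns JSON. A minimal sketch of consuming it from Python follows; note that the response schema is not documented on this page, so the field names `stars` and `commits_30d` (and the `summarize` helper) are assumptions for illustration, not the API's actual contract:

```python
import json
import urllib.request

API_URL = "https://pt-edge.onrender.com/api/v1/quality/voice-ai/FunAudioLLM/CosyVoice"

def fetch_repo_stats(url: str = API_URL) -> dict:
    """Fetch the quality payload for a repo (100 requests/day without a key)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

def summarize(payload: dict) -> str:
    """One-line summary; the keys used here are assumed, not documented."""
    stars = payload.get("stars", "?")
    commits = payload.get("commits_30d", "?")
    return f"{stars} stars, {commits} commits in the last 30 days"

# Offline usage with the numbers shown on this page, so no network call is needed:
sample = {"stars": 19991, "commits_30d": 6}
print(summarize(sample))
```

In practice you would call `fetch_repo_stats()` instead of using `sample`, and inspect the real payload once to confirm the actual key names before relying on them.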
Related tools
fishaudio/Bert-VITS2
vits2 backbone with multilingual-bert
travisvn/chatterbox-tts-api
Local, OpenAI-compatible text-to-speech (TTS) API using Chatterbox, enabling users to generate...
OpenMOSS/MOSS-TTS
MOSS‑TTS Family is an open‑source speech and sound generation model family from MOSI.AI and the...
OpenMOSS/MOSS-TTSD
MOSS-TTSD is a spoken dialogue generation model designed for expressive multi-speaker synthesis....
FunAudioLLM/SenseVoice
Multilingual Voice Understanding Model