OpenMOSS/MOSS-TTSD
MOSS-TTSD is a spoken dialogue generation model designed for expressive multi-speaker synthesis. It features long-context modeling, flexible speaker control, and multilingual support, while enabling zero-shot voice cloning from short audio references.
Built on a transformer-based architecture with audio tokenization via XY-Tokenizer, the model uses a continuation-based workflow: speaker reference audio plus a dialogue script drive seamless multi-speaker synthesis over extended contexts. Optimized for the SGLang inference engine (up to 16x speedup), it supports streaming generation and fine-tuning via LoRA or full-parameter training, and integrates with the Hugging Face model hub and Spaces for easy deployment across 20 languages.
1,202 stars. Actively maintained with 4 commits in the last 30 days.
Stars: 1,202
Forks: 116
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 06, 2026
Commits (30d): 4
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/OpenMOSS/MOSS-TTSD"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
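The curl command above can be sketched in Python using only the standard library. The URL-building helper below is a hypothetical convenience function (the path pattern `category/owner/repo` is inferred from the single example URL, and the JSON response schema is not documented here, so the fetch step just pretty-prints whatever comes back):

```python
import json
from urllib.request import urlopen

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for a repo.

    Path pattern inferred from the documented example:
    /api/v1/quality/voice-ai/OpenMOSS/MOSS-TTSD
    """
    return f"https://pt-edge.onrender.com/api/v1/quality/{category}/{owner}/{repo}"

url = quality_url("voice-ai", "OpenMOSS", "MOSS-TTSD")

# Fetch and pretty-print the response. Field names are whatever the API
# returns; nothing beyond "it is JSON" is assumed here.
# with urlopen(url, timeout=10) as resp:
#     print(json.dumps(json.load(resp), indent=2))
```

No key is needed within the free tier; how a paid key is passed (header vs. query parameter) is not documented here, so it is left out rather than guessed.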
Related tools
travisvn/chatterbox-tts-api
Local, OpenAI-compatible text-to-speech (TTS) API using Chatterbox, enabling users to generate...
FunAudioLLM/CosyVoice
Multi-lingual large voice generation model, providing inference, training and deployment...
fishaudio/Bert-VITS2
vits2 backbone with multilingual-bert
OpenMOSS/MOSS-TTS
MOSS‑TTS Family is an open‑source speech and sound generation model family from MOSI.AI and the...
sfortis/openai_tts
Custom TTS component for Home Assistant. Utilizes the OpenAI speech engine or any compatible...