OpenMOSS/MOSS-TTSD

MOSS-TTSD is a spoken dialogue generation model designed for expressive multi-speaker synthesis. It features long-context modeling, flexible speaker control, and multilingual support, while enabling zero-shot voice cloning from short audio references.

Score: 57 / 100 (Established)

Built on a transformer-based architecture with audio tokenization via XY-Tokenizer, the model uses a continuation-based workflow in which speaker reference audio and a dialogue script drive seamless multi-speaker synthesis over extended contexts. Inference can be accelerated with the SGLang engine (up to a 16x speedup). It supports streaming generation and fine-tuning via LoRA or full-parameter training. It integrates with the Hugging Face Hub and Spaces for easy deployment and covers 20 languages.

1,202 stars. Actively maintained with 4 commits in the last 30 days.

No package · No dependents
Maintenance 13 / 25
Adoption 10 / 25
Maturity 15 / 25
Community 19 / 25
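The four category scores, each out of 25, add up to the overall score shown above (13 + 10 + 15 + 19 = 57, out of 4 × 25 = 100). The minimal sketch below reproduces that arithmetic. It assumes the overall score is a plain unweighted sum of the categories. The listing does not state its formula, and `overall_score` is a hypothetical helper.

```python
# Category scores as listed above; each category is scored out of 25.
# Assumption: the overall score is the unweighted sum of the four categories.
SUBSCORES = {
    "Maintenance": 13,
    "Adoption": 10,
    "Maturity": 15,
    "Community": 19,
}
MAX_PER_CATEGORY = 25


def overall_score(subscores: dict[str, int]) -> tuple[int, int]:
    """Hypothetical helper: return (total, maximum) for a set of category scores."""
    total = sum(subscores.values())
    maximum = MAX_PER_CATEGORY * len(subscores)
    return total, maximum


total, maximum = overall_score(SUBSCORES)
print(f"{total} / {maximum}")  # 57 / 100
```

Under this reading, Adoption (10 / 25) is the category pulling the total down the most. That is consistent with the "No package · No dependents" flags.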


Stars: 1,202
Forks: 116
Language: Python
License: Apache-2.0
Last pushed: Mar 06, 2026
Commits (30d): 4

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/OpenMOSS/MOSS-TTSD"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
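The same request can be made from Python with only the standard library. The sketch below is a minimal version of that call. It has the following assumptions:

- The hypothetical `quality_url` helper generalizes the single URL above into a `<category>/<owner>/<repo>` pattern.
- The response body is JSON, which the listing does not state.
- Requests are anonymous and fall under the 100/day limit. The page does not say how a free key is passed, so none is used here.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Hypothetical helper: rebuild the endpoint path seen in the curl example."""
    # Percent-encode each path segment so unusual repo names stay valid.
    segments = (urllib.parse.quote(part, safe="") for part in (category, owner, repo))
    return BASE_URL + "/" + "/".join(segments)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0) -> dict:
    """Anonymous GET (no key); assumes the endpoint returns a JSON object."""
    with urllib.request.urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example (performs a network request, counted against the daily limit):
# data = fetch_quality("voice-ai", "OpenMOSS", "MOSS-TTSD")
```

Using `urllib` keeps the sketch dependency-free. A client such as `requests` would work equally well.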