OpenMOSS/MOSS-TTSD

MOSS-TTSD is a spoken dialogue generation model designed for expressive multi-speaker synthesis. It features long-context modeling, flexible speaker control, and multilingual support, while enabling zero-shot voice cloning from short audio references.


Established

Built on a transformer-based architecture with audio tokenization via XY-Tokenizer, the model uses a continuation-based workflow in which speaker reference audio and a dialogue script together drive multi-speaker synthesis over extended contexts. It is optimized for acceleration with the SGLang inference engine (up to 16x speedup), supports streaming generation, and can be fine-tuned with LoRA or full-parameter training. It integrates with the Hugging Face model hub and Spaces for deployment and supports 20 languages.
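The continuation-based workflow described above can be pictured as joining each speaker's reference transcript with the new dialogue script into one tagged text, paired with the reference clips. The following is a minimal sketch under stated assumptions, not the project's actual API: the `SpeakerReference` class, the `build_continuation_prompt` helper, the `[S1]`/`[S2]` tag format, and the file names are all hypothetical and used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class SpeakerReference:
    """Hypothetical container for one speaker's reference clip."""
    tag: str          # speaker tag such as "S1" (tag format is an assumption)
    audio_path: str   # path to a short reference recording
    transcript: str   # text spoken in the reference recording


def build_continuation_prompt(references, turns):
    """Join reference transcripts and new turns into one tagged script.

    references: list of SpeakerReference, one per speaker
    turns: list of (tag, text) pairs for the dialogue to synthesize
    Returns (prompt_text, audio_paths).
    """
    known = {ref.tag for ref in references}
    for tag, _ in turns:
        if tag not in known:
            raise ValueError(f"no reference audio for speaker {tag!r}")
    # Reference transcripts come first; the new script continues from them.
    prefix = "".join(f"[{ref.tag}]{ref.transcript}" for ref in references)
    script = "".join(f"[{tag}]{text}" for tag, text in turns)
    return prefix + script, [ref.audio_path for ref in references]


# Illustrative inputs only; the paths and text are made up.
refs = [
    SpeakerReference("S1", "alice.wav", "Hi, I'm Alice."),
    SpeakerReference("S2", "bob.wav", "And I'm Bob."),
]
prompt, audio_paths = build_continuation_prompt(
    refs, [("S1", "Ready to start?"), ("S2", "Let's go.")]
)
print(prompt)
```

In this sketch, a turn whose speaker has no reference clip is rejected up front, since a continuation-style model would have no voice to continue from. How the real model consumes the text and audio should be checked against the project's own documentation.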

1,202 stars. Actively maintained with 4 commits in the last 30 days.

No package. No dependents.

Maintenance 13 / 25

Adoption 10 / 25

Maturity 15 / 25

Community 19 / 25


Stars: 1,202

Forks: 116

Language: Python

License: Apache-2.0


Related tools

travisvn/chatterbox-tts-api

Local, OpenAI-compatible text-to-speech (TTS) API using Chatterbox, enabling users to generate...

FunAudioLLM/CosyVoice

Multi-lingual large voice generation model, providing inference, training and deployment...

fishaudio/Bert-VITS2

vits2 backbone with multilingual-bert

OpenMOSS/MOSS-TTS

MOSS‑TTS Family is an open‑source speech and sound generation model family from MOSI.AI and the...

sfortis/openai_tts

Custom TTS component for Home Assistant. Utilizes the OpenAI speech engine or any compatible...
