MOSS-TTSD and MOSS-Speech

MOSS-TTSD handles speech synthesis (generating expressive dialogue audio from text), while MOSS-Speech is an end-to-end speech-to-speech model (understanding and generating speech directly); together they cover complementary parts of a voice conversation pipeline.

                 MOSS-TTSD          MOSS-Speech
Overall score    57 (Established)   44 (Emerging)
Maintenance      13/25              10/25
Adoption         10/25              10/25
Maturity         15/25              15/25
Community        19/25              9/25
Stars            1,202              127
Forks            116                7
Commits (30d)    4                  0
Language         Python             Python
License          Apache-2.0         Apache-2.0
Package          none               none
Dependents       none               none

About MOSS-TTSD

OpenMOSS/MOSS-TTSD

MOSS-TTSD is a spoken dialogue generation model designed for expressive multi-speaker synthesis. It features long-context modeling, flexible speaker control, and multilingual support, while enabling zero-shot voice cloning from short audio references.

Built on a transformer-based architecture with audio tokenization via XY-Tokenizer, the model uses a continuation-based workflow in which speaker reference audio plus a dialogue script drive seamless multi-speaker synthesis over extended contexts. It is optimized for the SGLang inference engine (up to 16x speedup), supports streaming generation as well as fine-tuning via LoRA or full-parameter training, and integrates with the Hugging Face model hub and Spaces for easy deployment across 20 languages.
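The continuation-based workflow above takes a short reference clip per speaker plus a tagged dialogue script. A minimal sketch of assembling and chunking such a script, assuming a hypothetical `[S1]`/`[S2]` speaker-tag convention (the tag format and function names here are illustrative; consult the MOSS-TTSD README for the exact input format):

```python
# Sketch: assemble a multi-speaker dialogue script for a continuation-style
# TTS workflow. The [S1]/[S2] speaker-tag convention is an assumption here,
# not the model's documented interface.

def format_dialogue(turns):
    """Join (speaker, text) turns into a single tagged script string."""
    return "".join(f"[S{speaker}]{text.strip()}" for speaker, text in turns)

def chunk_script(turns, max_turns=8):
    """Split a long dialogue into windows so each synthesis call stays
    within the model's context; the continuation workflow then carries
    speaker identity across windows via the reference audio."""
    return [turns[i:i + max_turns] for i in range(0, len(turns), max_turns)]

turns = [
    (1, "Hi, have you tried the new release?"),
    (2, "Not yet, does it support streaming?"),
    (1, "It does, and it is noticeably faster."),
]
script = format_dialogue(turns)
```

Each chunk would be passed to the model together with the per-speaker reference audio, so voices stay consistent across windows.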

About MOSS-Speech

OpenMOSS/MOSS-Speech

MOSS-Speech is a true speech-to-speech large language model: it understands and generates speech directly, without routing through intermediate text guidance.
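To make "without text guidance" concrete: a cascaded voice assistant transcribes speech to text, reasons over the text, and re-synthesizes audio, whereas a direct speech-to-speech model maps input speech to output speech in a single model. A toy contrast using stub functions (all names are hypothetical; real systems operate on audio tensors and discrete speech tokens):

```python
# Toy contrast between a cascaded pipeline and a direct speech-to-speech
# model. All functions are illustrative stubs.

def cascaded_reply(audio: str) -> str:
    """ASR -> text LLM -> TTS: text is an explicit intermediate, so
    prosody and emotion in the input are discarded at the ASR step."""
    text = f"transcript({audio})"       # ASR
    response_text = f"llm({text})"      # text-only reasoning
    return f"tts({response_text})"      # re-synthesized audio

def direct_reply(audio: str) -> str:
    """A single model maps input speech to output speech, with no
    text bottleneck in between."""
    return f"speech_lm({audio})"
```

The design trade-off: the cascaded route can reuse mature text LLMs, while the direct route preserves paralinguistic information (tone, emotion, timing) end to end.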

Scores updated daily from GitHub, PyPI, and npm data.