nari-labs/dia

A TTS model capable of generating ultra-realistic dialogue in one pass.

50 / 100

Established

Built on a 1.6B-parameter architecture, Dia synthesizes multi-speaker dialogue directly from a transcript, with audio conditioning for voice cloning and emotion control and support for nonverbal tags such as laughter and coughing. It integrates with Hugging Face Transformers and offers inference through Python APIs, a CLI, and a Gradio UI, with a real-time factor of about 0.9x to 2.2x on an RTX 4090 depending on precision. It uses the Descript Audio Codec for audio generation and keeps speaker voices consistent via seed fixing or audio prompts.
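To make the real-time factor range concrete, here is a minimal sketch of the arithmetic. It assumes the factor is defined as audio duration divided by generation time, so values above 1.0 mean faster than real time; the listing does not state the definition, and the helper name `generation_seconds` is hypothetical rather than part of Dia.

```python
def generation_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Estimate wall-clock time needed to synthesize `audio_seconds` of audio.

    Assumption: realtime_factor = audio duration / generation time,
    so a factor above 1.0 means faster than real time.
    """
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


# The two endpoints quoted for an RTX 4090.
SLOWEST, FASTEST = 0.9, 2.2

clip = 30.0  # seconds of dialogue to synthesize (arbitrary example length)
for factor in (SLOWEST, FASTEST):
    estimate = generation_seconds(clip, factor)
    print(f"{factor}x: about {estimate:.1f} s to generate {clip:.0f} s of audio")
```

Under that assumption, a 30-second clip would take roughly 33 seconds at 0.9x and roughly 14 seconds at 2.2x.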

19,202 stars.

No Package

No Dependents

Maintenance 6 / 25

Adoption 10 / 25

Maturity 15 / 25

Community 19 / 25


Stars: 19,202

Forks: 1,683

Language: Python

License: Apache-2.0


Related tools

devnen/Chatterbox-TTS-Server

Self-host the powerful Chatterbox TTS model. This server offers a user-friendly Web UI, flexible...

daswer123/xtts-api-server

A simple FastAPI Server to run XTTSv2

jamiepine/voicebox

The open-source voice synthesis studio

Aivis-Project/AivisSpeech-Engine

AivisSpeech Engine: AI Voice Imitation System - Text to Speech Engine

jianchang512/ChatTTS-ui

A simple local web interface that uses ChatTTS to synthesize text into speech, with support for exposing an external API.
