dia and dia2
Dia2 is the successor to Dia: an evolutionary improvement that adds streaming and real-time generation, rather than a parallel alternative.
About dia
nari-labs/dia
A TTS model capable of generating ultra-realistic dialogue in one pass.
Built on a 1.6B-parameter architecture, Dia directly synthesizes multi-speaker dialogue from transcripts, with audio conditioning for voice cloning and emotion control and support for nonverbal tags like laughter and coughing. It integrates with Hugging Face Transformers and provides inference through Python APIs, a CLI, and a Gradio UI, with a real-time factor of 0.9x–2.2x on an RTX 4090 depending on precision. It uses the Descript Audio Codec for audio generation and supports speaker consistency via seed fixing or audio prompts.
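Because Dia generates a whole dialogue in one pass, speaker turns and nonverbal cues are inline markup in a single transcript string. A minimal sketch of assembling such a transcript, assuming Dia's `[S1]`/`[S2]` speaker tags and parenthesized nonverbal cues; the `format_dialogue` helper is illustrative, not part of Dia's API:

```python
# Hypothetical helper illustrating Dia's transcript convention:
# speaker turns tagged [S1]/[S2], nonverbal cues inline in parentheses.
# This is a sketch for illustration, not the Dia API itself.

def format_dialogue(turns):
    """Join (speaker_tag, text) pairs into a single-pass transcript.

    turns: list of tuples like ("S1", "Hello!").
    """
    return " ".join(f"[{tag}] {text}" for tag, text in turns)

transcript = format_dialogue([
    ("S1", "Did you hear the news?"),
    ("S2", "No, tell me! (laughs)"),
])
print(transcript)
# → [S1] Did you hear the news? [S2] No, tell me! (laughs)
```

The resulting string would then be handed to the model's generation call, optionally alongside an audio prompt for voice cloning.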
About dia2
nari-labs/dia2
A TTS model capable of streaming conversational audio in real time.
Dia2 builds on the Kyutai Mimi codec to generate dialogue with speaker conditioning, enabling natural back-and-forth conversations by accepting audio context as input. It supports incremental generation from partial text without waiting for complete input, with 1B and 2B model variants optimized for CUDA inference using bfloat16 precision and optional CUDA graph acceleration. Audio conditioning via Whisper transcription yields stable voice output when generation is prefixed with speaker examples, supporting up to 2 minutes of English generation with word-level timestamps.
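The key difference from Dia is the incremental loop: text fragments arrive over time and audio chunks are emitted before the full input is known. A control-flow sketch under stated assumptions; `stream_tts` is a stand-in stub, not the dia2 API, and the placeholder payloads substitute for real Mimi codec frames:

```python
# Hypothetical sketch of incremental TTS: text arrives in fragments and
# audio chunks are yielded as soon as each fragment is processed.
# stream_tts is an illustrative stub, not the dia2 API.

from typing import Iterable, Iterator

def stream_tts(text_fragments: Iterable[str]) -> Iterator[bytes]:
    """Yield one (placeholder) audio chunk per incoming text fragment."""
    for fragment in text_fragments:
        # A real model would emit Mimi codec frames here; we emit a
        # labeled placeholder so the streaming control flow is visible.
        yield f"<audio:{fragment}>".encode()

def partial_text() -> Iterator[str]:
    # Simulates text arriving incrementally, e.g. from an LLM stream.
    yield "Hello, "
    yield "how are you?"

for chunk in stream_tts(partial_text()):
    # In practice each chunk would be written to an audio sink
    # immediately, rather than buffered until the input completes.
    print(chunk)
```

The point of the structure is that the consumer loop never waits for the complete transcript, which is what makes real-time back-and-forth conversation possible.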