dia and dia2

Dia2 is the successor to Dia, offering streaming capabilities and real-time generation as an evolutionary improvement rather than a parallel alternative.

dia — score 50 (Established)
Maintenance 6/25 · Adoption 10/25 · Maturity 15/25 · Community 19/25
Stars: 19,202 · Forks: 1,683 · Commits (30d): 0 · Language: Python · License: Apache-2.0
No package published · no dependents

dia2 — score 47 (Emerging)
Maintenance 6/25 · Adoption 10/25 · Maturity 13/25 · Community 18/25
Stars: 1,100 · Forks: 91 · Commits (30d): 0 · Language: Python · License: Apache-2.0
No package published · no dependents

About dia

nari-labs/dia

A TTS model capable of generating ultra-realistic dialogue in one pass.

Built on a 1.6B-parameter architecture, Dia synthesizes multi-speaker dialogue directly from transcripts, with audio conditioning for voice cloning and emotion control and support for nonverbal tags such as laughter and coughing. It integrates with Hugging Face Transformers and provides inference through Python APIs, a CLI, and a Gradio UI, with a realtime factor of 0.9x–2.2x on an RTX 4090 depending on precision. It uses the Descript Audio Codec for audio generation and supports speaker consistency via seed fixing or audio prompts.
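Dia's transcripts mark speaker turns with bracketed tags ([S1], [S2]) and embed nonverbal cues inline, e.g. (laughs). A minimal sketch of splitting such a transcript into per-speaker turns — the `split_dialogue` helper is illustrative only and is not part of Dia's API:

```python
import re

# Speaker tags like [S1] / [S2]; parenthesized nonverbal cues such as
# (laughs) are left inside the utterance text untouched.
SPEAKER_TAG = re.compile(r"\[S(\d+)\]")

def split_dialogue(transcript: str) -> list[tuple[str, str]]:
    """Split a Dia-style transcript into (speaker, utterance) pairs.

    Illustrative helper, not part of Dia's API.
    """
    parts = SPEAKER_TAG.split(transcript)
    # Example: ["", "1", " Hello. ", "2", " Hi! (laughs)"]
    turns = []
    for i in range(1, len(parts) - 1, 2):
        speaker = f"S{parts[i]}"
        utterance = parts[i + 1].strip()
        if utterance:
            turns.append((speaker, utterance))
    return turns

turns = split_dialogue("[S1] Hello there. [S2] Hi! (laughs)")
# turns == [("S1", "Hello there."), ("S2", "Hi! (laughs)")]
```

Keeping the nonverbal cues inside each utterance preserves them for the model, which renders tags like (laughs) as audible events rather than speech.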

About dia2

nari-labs/dia2

A TTS model capable of streaming conversational audio in real time.

It builds on the Kyutai Mimi codec to generate dialogue with speaker conditioning, accepting audio context as input to enable natural back-and-forth conversation. It supports incremental generation from partial text without waiting for complete input, with 1B and 2B model variants optimized for CUDA inference using bfloat16 precision and optional CUDA graph acceleration. Audio conditioning via Whisper transcription allows stable voice output when the input is prefixed with speaker examples, supporting up to 2 minutes of English generation with word-level timestamps.
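Incremental generation means the synthesizer can start on a partial transcript while the rest is still arriving. A minimal sketch of the buffering side of that pipeline, assuming words arrive one at a time — `incremental_chunks` is a hypothetical helper, not dia2's actual streaming API:

```python
from typing import Iterator

def incremental_chunks(stream: Iterator[str], min_words: int = 3) -> Iterator[str]:
    """Buffer an incoming word stream and emit text chunks large enough
    for a streaming TTS model to begin synthesizing before the full
    transcript exists.

    Illustrative only; dia2's real streaming interface may differ.
    """
    buffer: list[str] = []
    for word in stream:
        buffer.append(word)
        if len(buffer) >= min_words:
            yield " ".join(buffer)
            buffer = []
    if buffer:  # flush whatever remains when the stream ends
        yield " ".join(buffer)

chunks = list(incremental_chunks(iter("the quick brown fox jumps over".split())))
# chunks == ["the quick brown", "fox jumps over"]
```

The chunk size trades latency against prosody: smaller chunks start audio sooner, while larger ones give the model more context per step.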

Scores updated daily from GitHub, PyPI, and npm data.