nari-labs/dia2

A TTS model capable of streaming conversational audio in real time.

Quality score: 47 / 100 (Emerging)

Builds on the Kyutai Mimi codec to generate dialogue with speaker conditioning, enabling natural back-and-forth conversations by accepting audio context as input. Supports incremental generation from partial text without waiting for complete input, with 1B/2B model variants optimized for CUDA inference using bfloat16 precision and optional CUDA graph acceleration. Audio conditioning via Whisper transcription allows stable voice output when prefixed with speaker examples, supporting up to 2 minutes of English generation with word-level timestamps.


No package published; no known dependents.

Score breakdown:
- Maintenance: 6 / 25
- Adoption: 10 / 25
- Maturity: 13 / 25
- Community: 18 / 25


Stars: 1,100
Forks: 91
Language: Python
License: Apache-2.0
Last pushed: Nov 29, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/nari-labs/dia2"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
