nari-labs/dia2
TTS model capable of streaming conversational audio in real time.
Builds on the Kyutai Mimi codec to generate dialogue with speaker conditioning, enabling natural back-and-forth conversations by accepting audio context as input. Supports incremental generation from partial text without waiting for complete input, with 1B/2B model variants optimized for CUDA inference using bfloat16 precision and optional CUDA graph acceleration. Audio conditioning via Whisper transcription allows stable voice output when prefixed with speaker examples, supporting up to 2 minutes of English generation with word-level timestamps.
Stars: 1,100
Forks: 91
Language: Python
License: Apache-2.0
Category: voice-ai
Last pushed: Nov 29, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/nari-labs/dia2"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
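For programmatic use, the curl call above can be wrapped in a small Python helper. This is a minimal sketch: the endpoint path comes from the example above, but the shape of the JSON response is an assumption, so the helper just returns the decoded payload as a dict.

```python
# Sketch of a client for the pt-edge quality API shown above.
# The URL structure is taken from the curl example; response fields are not documented here.
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, repo: str) -> str:
    """Build the per-repo endpoint URL, e.g. for voice-ai/nari-labs/dia2."""
    return f"{API_BASE}/{category}/{repo}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0) -> dict:
    """Fetch and decode the JSON payload for one repo.

    Raises urllib.error.URLError on network failure; subject to the
    100 requests/day limit without an API key.
    """
    with urllib.request.urlopen(quality_url(category, repo), timeout=timeout) as resp:
        return json.load(resp)


# Example (performs a live request):
# data = fetch_quality("voice-ai", "nari-labs/dia2")
```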
Higher-rated alternatives
devnen/Chatterbox-TTS-Server
Self-host the powerful Chatterbox TTS model. This server offers a user-friendly Web UI, flexible...
daswer123/xtts-api-server
A simple FastAPI Server to run XTTSv2
jamiepine/voicebox
The open-source voice synthesis studio
Aivis-Project/AivisSpeech-Engine
AivisSpeech Engine: AI Voice Imitation System - Text to Speech Engine
jianchang512/ChatTTS-ui
A simple local web interface that uses ChatTTS to synthesize text into speech, and also provides an external API.