herimor/voxtream
VoXtream is a full-stream, zero-shot TTS model with extremely low latency and speaking-rate control.
Built on the Mimi streaming audio codec, the model generates speech in real-time chunks with a first-packet latency of 74 ms and uses classifier-free guidance to adjust speaking rate dynamically. It supports multilingual acoustic prompts via prompt text masking, integrates with Gradio for web demos and WebSocket for streaming inference, and runs 4x faster than real time on consumer GPUs with optional CUDA graph compilation.
210 stars and 361 monthly downloads. Available on PyPI.
Stars: 210
Forks: 24
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 17, 2026
Monthly downloads: 361
Commits (30d): 0
Dependencies: 19
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/herimor/voxtream"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
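The same request can be made from Python. The sketch below is a minimal illustration using only the standard library. It assumes the endpoint returns a JSON body and that the path follows a category/owner/repo pattern, which is inferred from the single URL above rather than from any documented schema. The helper names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example; everything after it is assumed
# to be <category>/<owner>/<repo>, inferred from that one URL.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the request URL, percent-encoding each path segment."""
    segments = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(segments)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the listing data (hypothetical helper).

    Assumes the response body is JSON; the field names are not documented
    here, so the parsed object is returned without further interpretation.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("voice-ai", "herimor", "voxtream")` issues the same GET request as the curl command. Unauthenticated calls count against the 100 requests/day limit, so caching the result locally is worth considering.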
Related tools
EveryVoiceTTS/EveryVoice
The EveryVoice TTS Toolkit - Text To Speech for your language
kadirnar/VoiceHub
VoiceHub: A Unified Inference Interface for TTS Models
NeonGeckoCom/neon-tts-plugin-coqui
Coqui AI TTS plugin
Atm4x/tts-with-rvc
TTS with RVC-module to generate .wav audios
thorstenMueller/Thorsten-Voice
Thorsten-Voice: A free to use, offline working, high quality German TTS voice should be...