vox-serve and vox
These are competitors: both provide speech-model inference serving, with vox-serve optimized for streaming-native TTS/STS deployment and vox taking a broader STT/TTS toolkit approach. They are alternative choices for the same use case rather than tools designed to work together.
About vox-serve
vox-serve/vox-serve
A Streaming-Native Serving Engine for TTS/STS Models
About vox
rtk-ai/vox
A universal AI toolkit for high-performance Speech-to-Text (STT) and Text-to-Speech (TTS) processing, designed for low-latency and easy model integration.
Supports five pluggable TTS backends (macOS `say`, ONNX-based `kokoro`, Rust/Candle `qwen-native`, PyTorch `voxtream`, and MLX `qwen`), with zero-shot voice cloning on three of them, and reports 2–3 s warm latency on Apple Silicon and 19 s on CUDA. Built in Rust with Python interop, it exposes a daemon mode for persistent model loading and integrates as an MCP server or CLI tool with 14+ AI coding assistants (Claude Code, Cursor, VS Code, Zed). It also includes SQLite state tracking, an interactive TUI for configuration, and fully offline voice recording and cloning workflows.
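Integrating an MCP server into an assistant such as Claude Code or Cursor is usually done by adding an entry to the client's MCP configuration file. The sketch below uses the standard `mcpServers` config shape; the `vox` command and `mcp` subcommand are assumptions for illustration, not taken from the project's documentation:

```json
{
  "mcpServers": {
    "vox": {
      "command": "vox",
      "args": ["mcp"]
    }
  }
}
```

The client launches the listed command as a subprocess and speaks the MCP protocol to it over stdio, so a persistent daemon (as vox provides) keeps model-load latency out of each request.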