rtk-ai/vox

A universal AI toolkit for high-performance Speech-to-Text (STT) and Text-to-Speech (TTS) processing, designed for low-latency and easy model integration.

/ 100

Emerging

Supports five pluggable TTS backends (macOS `say`, ONNX-based `kokoro`, Rust/Candle `qwen-native`, PyTorch `voxtream`, and MLX `qwen`), three of which offer zero-shot voice cloning. Reported warm latency is 2–3s on Apple Silicon and 19s on CUDA. Built in Rust with Python interop, it exposes a daemon mode that keeps models loaded between requests and integrates as an MCP server or CLI tool with 14+ AI coding assistants, including Claude Code, Cursor, VS Code, and Zed. It also includes SQLite state tracking, an interactive TUI for configuration, and voice recording and cloning workflows that run entirely offline.

No Package · No Dependents

Maintenance 13 / 25

Adoption 7 / 25

Maturity 11 / 25

Community 3 / 25



Language

Rust

License

—


Higher-rated alternatives

alphacep/vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node

huggingface/speech-to-speech

Build local voice agents with open-source models

linto-ai/WebVoiceSDK

Building blocks for voice-enabled applications in the browser

Picovoice/speech-to-text-benchmark

Speech-to-text benchmark framework

vox-serve/vox-serve

A Streaming-Native Serving Engine for TTS/STS Models
