voicebox and vox-box

These are **competitors**: both are self-hosted speech synthesis tools built on open-source backends. voicebox is a desktop studio with a timeline editor and voice cloning, while vox-box is a text-to-speech and speech-to-text server that emphasizes OpenAI API compatibility across multiple engines.

| | voicebox | vox-box |
|---|---|---|
| Status | Established | Established |
| Maintenance | 25/25 | 6/25 |
| Adoption | 10/25 | 10/25 |
| Maturity | 11/25 | 16/25 |
| Community | 21/25 | 18/25 |
| Stars | 13,404 | 200 |
| Forks | 1,562 | 32 |
| Downloads | — | — |
| Commits (30d) | 244 | 0 |
| Language | TypeScript | Python |
| License | MIT | Apache-2.0 |

No Package No Dependents
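The four category scores above are each out of 25, which suggests an overall score out of 100. The sketch below adds them up for each project. It assumes a plain unweighted sum; the page does not state how the overall score is derived, so treat the totals as illustrative only. The names `SCORES` and `total_score` are hypothetical.

```python
# Category scores copied from the comparison above (each out of 25).
SCORES = {
    "voicebox": {"maintenance": 25, "adoption": 10, "maturity": 11, "community": 21},
    "vox-box": {"maintenance": 6, "adoption": 10, "maturity": 16, "community": 18},
}


def total_score(categories):
    """Return the unweighted sum of the category scores (assumed, not documented)."""
    return sum(categories.values())


for name, categories in SCORES.items():
    print(f"{name}: {total_score(categories)}/100")
```

Under that assumption voicebox totals 67 and vox-box totals 50, with the gap coming almost entirely from the maintenance category.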

About voicebox

jamiepine/voicebox

The open-source voice synthesis studio

Supports voice cloning from short audio samples and offers 5 interchangeable TTS engines covering 23 languages with paralinguistic expression tags. Built on Tauri (Rust) with a timeline editor for multi-voice composition, post-processing effects (pitch, reverb, compression, filters), and a REST API for integration. Runs entirely locally with hardware acceleration across macOS (Metal/MLX), Windows (CUDA), Linux, AMD ROCm, and Docker.

About vox-box

gpustack/vox-box

A text-to-speech and speech-to-text server compatible with the OpenAI API, supporting Whisper, FunASR, Bark, and CosyVoice backends.

Sources models from HuggingFace and ModelScope repositories, with GPU acceleration via CUDA, and deploys on Linux, Windows, and macOS with configurable model sizes from tiny to large variants. Implements a stateless server architecture that auto-downloads and caches models, supporting both streaming (Paraformer-zh-streaming) and batch processing pipelines, with CLI options for device binding and data directory management.
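Because vox-box is described as compatible with the OpenAI API, a client can in principle talk to it the same way it would talk to OpenAI's speech endpoint, `POST /v1/audio/speech` with `model`, `input`, and `voice` fields. The following is a minimal sketch using only the Python standard library. The base URL, model name, and voice name are placeholders chosen for illustration, not documented vox-box defaults, and the helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: host and port are placeholders; use whatever your server binds to.
BASE_URL = "http://localhost:8000"


def build_speech_request(text, model, voice, base_url=BASE_URL):
    """Build (but do not send) a POST request in the OpenAI speech-endpoint shape."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, model, voice, out_path, base_url=BASE_URL):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, model, voice, base_url)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())
    return out_path


if __name__ == "__main__":
    # Hypothetical model and voice names; substitute the ones your server exposes.
    synthesize("Hello from a local server.", "example-tts-model", "default", "hello.audio")
```

Keeping request construction separate from sending makes the payload easy to inspect before pointing the client at a running server.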

Scores updated daily from GitHub, PyPI, and npm data.