voicebox and vox-box

These are **competitors**: both provide self-hosted TTS with open-source backends. voicebox offers a broader, studio-style visual interface, while vox-box emphasizes OpenAI API compatibility across multiple synthesis engines.

| | voicebox | vox-box |
|---|---|---|
| Score | 67 | 50 |
| Status | Established | Established |
| Maintenance | 25/25 | 6/25 |
| Adoption | 10/25 | 10/25 |
| Maturity | 11/25 | 16/25 |
| Community | 21/25 | 18/25 |
| Stars | 13,404 | 200 |
| Forks | 1,562 | 32 |
| Downloads | not listed | not listed |
| Commits (30d) | 244 | 0 |
| Language | TypeScript | Python |
| License | MIT | Apache-2.0 |
| Package | No Package | No Package |
| Dependents | No Dependents | No Dependents |

About voicebox

jamiepine/voicebox

The open-source voice synthesis studio

Supports voice cloning from short audio samples and offers 5 interchangeable TTS engines covering 23 languages with paralinguistic expression tags. Built on Tauri (Rust) with a timeline editor for multi-voice composition, post-processing effects (pitch, reverb, compression, filters), and a REST API for integration. Runs entirely locally with hardware acceleration across macOS (Metal/MLX), Windows (CUDA), Linux, AMD ROCm, and Docker.

About vox-box

gpustack/vox-box

A text-to-speech and speech-to-text server compatible with the OpenAI API, supporting Whisper, FunASR, Bark, and CosyVoice backends.

Provides flexible model sourcing through HuggingFace and ModelScope repositories with GPU acceleration via CUDA, enabling deployment across Linux, Windows, and macOS with configurable model sizes from tiny to large variants. Implements a stateless server architecture that auto-downloads and caches models, supporting both streaming (Paraformer-zh-streaming) and batch processing pipelines with CLI configuration for device binding and data directory management.
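Since vox-box is described as compatible with the OpenAI API, a client can in principle target it with the OpenAI-style `POST /v1/audio/speech` request shape. The following is a minimal sketch under stated assumptions, not vox-box's documented usage: the base URL `http://localhost:8080` and the helper names `build_speech_request` and `synthesize` are hypothetical, and the model and voice values must match whatever the running server actually exposes.

```python
import json
import urllib.request

# Assumption: a vox-box server is reachable here; adjust host and port.
BASE_URL = "http://localhost:8080"


def build_speech_request(text, model, voice,
                         base_url=BASE_URL, response_format="mp3"):
    """Build (but do not send) an OpenAI-style text-to-speech request.

    The JSON fields follow the OpenAI audio speech endpoint; `model`
    and `voice` are placeholders for names the server really serves.
    """
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, model, voice, out_path, **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    req = build_speech_request(text, model, voice, **kwargs)
    with urllib.request.urlopen(req, timeout=60) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)  # number of bytes written
```

With a server running, a call such as `synthesize("Hello", "<model-name>", "<voice-name>", "hello.mp3")` would write the response body to disk. Keeping request construction separate from sending makes the payload easy to inspect without a live server.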

Scores updated daily from GitHub, PyPI, and npm data.