huggingface/speech-to-speech

Build local voice agents with open-source models

72 / 100 (Verified)

Implements a cascaded four-stage pipeline (VAD → STT → LLM → TTS) with pluggable components at every stage, supporting models from the Hugging Face Hub, MLX for Apple Silicon acceleration, and external libraries such as Whisper, MeloTTS, and ChatTTS. It can be deployed locally, over TCP sockets, or over WebSockets, with multi-language support and streaming capabilities. It is optimized for low-latency inference on consumer hardware, including CUDA GPUs and Apple Silicon, with torch.compile support.
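To make the cascaded design concrete, here is a minimal sketch of four stages chained by queues, each running in its own thread so that a downstream stage can start work while an upstream one is still busy. It is not the project's code: `run_stage`, `build_pipeline`, `run_once`, and the four stage functions are hypothetical stand-ins that operate on strings instead of audio.

```python
import queue
import threading
from typing import Callable, List, Optional

_STOP = object()  # sentinel that tells a stage to shut down


def run_stage(handler: Callable, inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Pull items from inbox, apply handler, push non-None results to outbox."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)  # propagate shutdown to the next stage
            return
        result = handler(item)
        if result is not None:  # a stage may drop input (e.g. VAD on silence)
            outbox.put(result)


# Placeholder stages (hypothetical; real stages would wrap actual models).
def vad(chunk: str) -> Optional[str]:
    return chunk if chunk.strip() else None  # treat blank chunks as silence


def stt(speech: str) -> str:
    return speech.lower()  # stands in for transcription


def llm(text: str) -> str:
    return f"you said: {text}"  # stands in for response generation


def tts(reply: str) -> bytes:
    return reply.encode("utf-8")  # stands in for audio synthesis


def build_pipeline(stages: List[Callable]):
    """Chain stages with queues; return (input queue, output queue, threads)."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(
            target=run_stage, args=(fn, queues[i], queues[i + 1]), daemon=True
        )
        for i, fn in enumerate(stages)
    ]
    for t in threads:
        t.start()
    return queues[0], queues[-1], threads


def run_once(chunks: List[str]) -> List[bytes]:
    """Feed chunks through VAD -> STT -> LLM -> TTS and collect the outputs."""
    source, sink, threads = build_pipeline([vad, stt, llm, tts])
    for chunk in chunks:
        source.put(chunk)
    source.put(_STOP)
    outputs = []
    while True:
        item = sink.get()
        if item is _STOP:
            break
        outputs.append(item)
    for t in threads:
        t.join()
    return outputs
```

Because every stage only sees a handler, an inbox, and an outbox, swapping one component for another means passing a different function to `build_pipeline`, which is the property the "pluggable components" claim describes.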

4,541 stars. Actively maintained with 90 commits in the last 30 days.

No Package, No Dependents
Maintenance 25 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 21 / 25
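The four sub-scores listed above add up to the headline figure. The short check below only confirms that arithmetic; treating the overall score as a plain sum of four 25-point categories is an inference from the numbers shown here, not a documented scoring method.

```python
# Sub-scores as listed on this page (each out of 25).
sub_scores = {"Maintenance": 25, "Adoption": 10, "Maturity": 16, "Community": 21}
MAX_PER_CATEGORY = 25  # assumption: every category shares the same ceiling

total = sum(sub_scores.values())
maximum = MAX_PER_CATEGORY * len(sub_scores)
print(f"{total} / {maximum}")
```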


Stars: 4,541
Forks: 518
Language: Python
License: Apache-2.0
Last pushed: Mar 12, 2026
Commits (30d): 90

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/huggingface/speech-to-speech"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
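The same request can be made from Python with only the standard library. This is a sketch under two assumptions the page does not confirm: that the path follows a `category/owner/repo` pattern (inferred from the single URL above) and that the response body is JSON. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    parts = (category, owner, repo)
    return BASE_URL + "/" + "/".join(quote(part, safe="") for part in parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the response (assumes a JSON body; not confirmed here)."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_quality("voice-ai", "huggingface", "speech-to-speech")` issues the same GET request as the curl command. Since unauthenticated use is capped at 100 requests/day, caching the result locally is a sensible precaution.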