canopyai/Orpheus-TTS
Towards Human-Sounding Speech
Builds on a Llama-3B backbone to bring emergent LLM capabilities to speech synthesis, including zero-shot voice cloning and emotion/intonation control via simple text tags. Achieves roughly 200 ms streaming latency through vLLM-based inference with token-level audio generation, supports 7 language pairs, and can be fine-tuned with standard transformer training pipelines. Models are distributed through Hugging Face, and Baseten offers production inference at fp8/fp16 precision.
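As a rough illustration of what "control via simple text tags" can look like in practice, the sketch below assembles a tagged prompt string and rejects tags outside a known set. Everything specific here is an assumption made for illustration and is not taken from this page: the tag names in `ASSUMED_TAGS`, the `voice: text` prompt layout, the voice name, and the helper names `find_unknown_tags` and `build_prompt`. Check the project's own documentation for the tags and prompt format the model actually expects.

```python
import re

# Assumed tag vocabulary, for illustration only; the real list may differ.
ASSUMED_TAGS = {"laugh", "sigh", "gasp"}


def find_unknown_tags(text, allowed=ASSUMED_TAGS):
    """Return the sorted names of <tag> markers in text that are not in allowed."""
    found = set(re.findall(r"<([a-z_]+)>", text))
    return sorted(found - set(allowed))


def build_prompt(text, voice=None):
    """Validate inline tags, then prepend an optional voice name.

    The "voice: text" layout is a hypothetical convention used for this sketch.
    """
    unknown = find_unknown_tags(text)
    if unknown:
        raise ValueError("unsupported tags: " + ", ".join(unknown))
    return f"{voice}: {text}" if voice else text


# "narrator" is a placeholder voice name, not one confirmed by this page.
prompt = build_prompt("Well, that was unexpected <laugh> but fine.", voice="narrator")
```

Validating tags before synthesis is a design choice for the sketch: an unrecognized tag would otherwise likely be read aloud or silently ignored, which is harder to debug than an early error.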
Stars
6,000
Forks
511
Language
Python
License
Apache-2.0
Category
Last pushed
Dec 05, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/canopyai/Orpheus-TTS"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
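The same request can be made from Python using only the standard library. This is a minimal sketch: the URL layout is copied from the curl command above, while the helper names (`build_quality_url`, `fetch_quality`), the reading of the path segments as category/owner/repo, and the assumption that the endpoint returns JSON are mine and are not confirmed by this page.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category, owner, repo):
    """Build the endpoint URL, percent-encoding each path segment."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch and decode the response; assumes the body is JSON."""
    url = build_quality_url(category, owner, repo)
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


url = build_quality_url("voice-ai", "canopyai", "Orpheus-TTS")
# To perform the request (uses one of the 100 keyless requests per day):
# data = fetch_quality("voice-ai", "canopyai", "Orpheus-TTS")
```

Since the keyless tier is limited to 100 requests per day, caching the decoded response locally is advisable when polling more than one repository.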
Related tools
lifeiteng/vall-e
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech), Reproduced Demo...
Plachtaa/VALL-E-X
An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available in...
umbertocappellazzo/Omni-AVSR
Official Pytorch implementation of "Omni-AVSR: Towards Unified Multimodal Speech Recognition...
primepake/learnable-speech
Text-to-speech with a learnable audio encoder that needs no alignment with a reference transcript.
ExplainableML/ZerAuCap
[NeurIPS 2023 - ML for Audio Workshop (Oral)] Zero-shot audio captioning with audio-language...