Camb-ai/MARS5-TTS
MARS5 speech model (TTS) from CAMB.AI
Employs a two-stage autoregressive/non-autoregressive (AR-NAR) pipeline with multinomial DDPM refinement for high-fidelity prosody control, and needs only 5 seconds of reference audio for speaker cloning. Punctuation and capitalization in the transcript steer prosody at a fine-grained level, and an optional "deep clone" mode uses the reference transcript for higher cloning quality. Distributed via torch.hub and HuggingFace, with Docker support. Inference is configurable through temperature, top-k sampling, and frequency penalty settings.
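To make the three inference settings concrete, here is a minimal, generic sketch of how temperature, top-k, and a frequency penalty are commonly applied when sampling the next token from a set of logits. This is not MARS5's code: the function names `adjust_logits` and `sample_token` are hypothetical, the default values are illustrative only, and the penalty follows the common "subtract penalty times occurrence count" convention, which may differ from how MARS5 defines it.

```python
import math
import random
from collections import Counter


def adjust_logits(logits, history, temperature=0.7, top_k=100, freq_penalty=3.0):
    """Turn raw logits into sampling probabilities (illustrative, not MARS5's code).

    logits:  list of floats, one per token id
    history: list of token ids generated so far
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Frequency penalty: lower the logit of each token in proportion to
    # how many times it already appeared in the history.
    counts = Counter(history)
    penalized = [l - freq_penalty * counts.get(i, 0) for i, l in enumerate(logits)]

    # Temperature: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = [l / temperature for l in penalized]

    # Top-k: keep only the k highest-scoring tokens.
    k = min(top_k, len(scaled))
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    keep = set(order[:k])

    # Numerically stable softmax over the kept tokens; the rest get probability 0.
    peak = max(scaled[i] for i in keep)
    weights = [math.exp(s - peak) if i in keep else 0.0 for i, s in enumerate(scaled)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_token(logits, history, rng=None, **kwargs):
    """Draw one token id from the adjusted distribution."""
    rng = rng or random.Random()
    probs = adjust_logits(logits, history, **kwargs)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

In this sketch, a lower temperature or smaller top-k makes output more deterministic, while a larger frequency penalty discourages repeating tokens that have already been emitted.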
2,814 stars. No commits in the last 6 months.
Stars: 2,814
Forks: 246
Language: Jupyter Notebook
License: AGPL-3.0
Category:
Last pushed: Aug 01, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/Camb-ai/MARS5-TTS"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
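The same request can be made from Python with only the standard library. This is a sketch under stated assumptions: the path layout (category, then owner, then repository) is inferred from the single URL shown above, the response is assumed to be JSON (the page does not say), and the helper names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category, owner, repo):
    """Build the endpoint URL; the path layout is inferred from one example."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the record for one repository.

    Assumes the endpoint returns JSON; adjust the parsing if it does not.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("voice-ai", "Camb-ai", "MARS5-TTS")` issues the same GET request as the curl command. Unauthenticated use counts against the 100 requests/day limit.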
Higher-rated alternatives
OpenBMB/VoxCPM
VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
IAHispano/Applio
A simple, high-quality voice conversion tool focused on ease of use and performance.
myshell-ai/OpenVoice
Instant voice cloning by MIT and MyShell. Audio foundation model.
codename0og/codename-rvc-fork-4
Codename's rvc fork version 4, based on Applio.
JackismyShephard/ultimate-rvc
An app for creating audio-based content such as song covers and speech using Retrieval-based...