LSimon95/megatts2
Unofficial implementation of Megatts2
Implements a multi-stage zero-shot TTS pipeline: a VQ-GAN for discrete speech representation, an ADM (auto-regressive duration model) for phoneme durations, and a PLM (prosody language model) for prompt-based synthesis from arbitrary-length speech prompts. Integrates the Montreal Forced Aligner for automatic phoneme-level alignment during dataset preparation. Currently supports Mandarin training only; multilingual support and a switch to the BigVGAN vocoder are planned.
288 stars. No commits in the last 6 months.
Stars: 288
Forks: 38
Language: Python
License: MIT
Last pushed: Mar 23, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/LSimon95/megatts2"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
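For scripted use, the same request can be made from Python. The following is a minimal sketch using only the standard library, not an official client. It assumes the endpoint returns JSON and that the URL path generalizes as category/owner/repo, neither of which is confirmed on this page; the helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL in the same shape as the curl example.

    Assumption: other repositories follow the same
    category/owner/repo path layout.
    """
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send a keyless GET request and decode the body.

    Assumption: the response body is JSON. The page does not
    document the response schema, so the result is returned as-is.
    """
    request = urllib.request.Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("voice-ai", "LSimon95", "megatts2")` issues the same request as the curl command. Keyless access is limited to 100 requests/day, so cache results when polling many repositories.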
Higher-rated alternatives
OpenBMB/VoxCPM
VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
IAHispano/Applio
A simple, high-quality voice conversion tool focused on ease of use and performance.
myshell-ai/OpenVoice
Instant voice cloning by MIT and MyShell. Audio foundation model.
codename0og/codename-rvc-fork-4
Codename's rvc fork version 4, based on Applio.
JackismyShephard/ultimate-rvc
An app for creating audio-based content such as song covers and speech using Retrieval-based...