xtts-webui and xtts2-ui
These are competing tools: both provide web UIs for the same XTTS voice cloning model. The main difference is that xtts2-ui targets XTTS-v2 specifically and advertises cloning from as little as 10 seconds of reference audio, while xtts-webui takes a broader approach that also covers fine-tuning and post-processing.
About xtts-webui
daswer123/xtts-webui
Webui for using XTTS and for finetuning it
Integrates XTTSv2 with modular voice processing pipelines supporting RVC, OpenVoice, and Resemble Enhance for post-processing synthesis results. Provides batch audio dubbing with automatic translation while preserving speaker identity, plus fine-tuning capabilities with custom model selection and optimized export. Runs locally on NVIDIA GPUs (6GB+ VRAM) via PyTorch/CUDA, with optional deepspeed acceleration and low-VRAM mode for resource-constrained setups.
About xtts2-ui
BoltzmannEntropy/xtts2-ui
A User Interface for XTTS-2 Text-Based Voice Cloning using only 10 seconds of speech
Built on Coqui's XTTS-v2 multilingual model, this project provides both a web UI (Streamlit) and terminal interface for voice cloning across 16 languages with integrated recording and file upload capabilities. The architecture supports GPU acceleration via PyTorch CUDA and automatically downloads pretrained models on first run, with the cloning process requiring only a 10-second 24kHz WAV reference sample to generate speech in the target voice and language.
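Since the cloning workflow hinges on the reference sample meeting the stated format (a roughly 10-second WAV at 24 kHz), a minimal pre-flight check with Python's standard-library `wave` module can catch bad inputs before synthesis. This is an illustrative sketch, not code from either repository; the `check_reference` helper and its thresholds are assumptions based on the description above.

```python
import math
import struct
import wave

def check_reference(path, min_seconds=10.0, expected_rate=24000):
    """Hypothetical pre-flight check: verify a reference WAV matches the
    requirements described for XTTS-2 (24 kHz, at least ~10 s of audio)."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        seconds = wf.getnframes() / rate
    return rate == expected_rate and seconds >= min_seconds

# Build a synthetic 12-second, 24 kHz mono 16-bit WAV to exercise the check.
with wave.open("ref.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)        # 16-bit PCM
    wf.setframerate(24000)
    tone = b"".join(
        struct.pack("<h", int(3000 * math.sin(2 * math.pi * 440 * t / 24000)))
        for t in range(24000 * 12)
    )
    wf.writeframes(tone)

print(check_reference("ref.wav"))  # True: 24 kHz and >= 10 seconds
```

A clip that is too short or resampled at 44.1 kHz would fail this check, which is usually cheaper to discover before the model download and GPU inference step.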