vall-e and VALL-E-X
Both tools are independent PyTorch implementations of Microsoft's VALL-E line of zero-shot text-to-speech research, making them direct alternatives: lifeiteng/vall-e reproduces the original VALL-E model, while Plachtaa/VALL-E-X reproduces its multilingual successor, VALL-E X.
About vall-e
lifeiteng/vall-e
PyTorch implementation of VALL-E (zero-shot text-to-speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html
This project generates realistic, human-like speech from text. Given written text and a short audio sample of a speaker's voice, it synthesizes the text in that speaker's voice. This is useful for content creators, audiobook producers, or anyone who needs to generate speech with a specific speaker identity.
About VALL-E-X
Plachtaa/VALL-E-X
An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/
Supports multilingual synthesis across English, Chinese, and Japanese, with emotion and accent control from short acoustic prompts. Uses an autoregressive architecture that predicts acoustic tokens and reconstructs high-quality audio with the Vocos neural vocoder. Integrates OpenAI's Whisper to transcribe acoustic prompts when enrolling new voices, and exposes a Python API compatible with PyTorch 2.0+ on CUDA platforms.
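The Python API mentioned above can be sketched roughly as follows. This is a hedged example assuming the module layout shown in the VALL-E-X README (a `utils.generation` module exposing `preload_models`, `generate_audio`, and `SAMPLE_RATE`); the `synthesize` wrapper is a hypothetical helper, not part of the library.

```python
def synthesize(text: str, out_path: str = "vallex_out.wav") -> bool:
    """Try to synthesize `text` with VALL-E-X; return True on success.

    Hypothetical wrapper: assumes the import paths from the VALL-E-X
    README. Returns False when the repo's modules are not importable
    (i.e. when run outside a cloned VALL-E-X checkout).
    """
    try:
        # These imports only resolve inside a VALL-E-X checkout with
        # its requirements installed.
        from utils.generation import SAMPLE_RATE, generate_audio, preload_models
        from scipy.io.wavfile import write as write_wav
    except ImportError:
        return False

    preload_models()                 # loads (and may download) model checkpoints
    audio = generate_audio(text)     # array of PCM samples at SAMPLE_RATE
    write_wav(out_path, SAMPLE_RATE, audio)
    return True
```

Run this from inside a cloned VALL-E-X checkout with its requirements installed; `preload_models()` fetches the checkpoints on first use.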