VALL-E-X and vall-e
Both tools are independent PyTorch implementations of Microsoft's VALL-E family of zero-shot text-to-speech models, making them alternative open-source reproductions of closely related research: VALL-E-X targets the cross-lingual VALL-E X variant, while vall-e reproduces the original VALL-E.
About VALL-E-X
Plachtaa/VALL-E-X
An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/
Supports multilingual synthesis across English, Chinese, and Japanese with emotion and accent control from short acoustic prompts. Uses an autoregressive architecture combining acoustic token prediction with Vocos neural vocoding for high-quality audio reconstruction. Integrates OpenAI's Whisper for speaker embedding extraction and includes Python APIs compatible with PyTorch 2.0+ on CUDA platforms.
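The pipeline described above (prompt encoding, autoregressive acoustic-token prediction, then vocoder reconstruction) can be sketched as a toy Python program. Every function here is an illustrative placeholder, not VALL-E-X's actual API; the real repo extracts prompt features with Whisper and decodes tokens with Vocos.

```python
# Toy sketch of a VALL-E-X-style synthesis pipeline.
# All functions are illustrative stand-ins, NOT the repo's real API.

def encode_prompt(wav):
    """Stand-in for extracting speaker/acoustic context from a short clip
    (the repo derives this from Whisper features)."""
    return [sum(wav) % 7]  # dummy one-element "embedding"

def predict_acoustic_tokens(text, lang, prompt_embedding, n=8):
    """Stand-in for the autoregressive decoder. A language tag steers
    accent, which is how cross-lingual prompting is typically expressed."""
    seed = hash((text, lang, tuple(prompt_embedding))) % 1000
    return [(seed + i) % 1024 for i in range(n)]  # fake codec token ids

def vocode(tokens):
    """Stand-in for Vocos: codec tokens -> waveform samples."""
    return [t / 1024.0 for t in tokens]

prompt = encode_prompt([0.1, -0.2, 0.05])       # short acoustic prompt
tokens = predict_acoustic_tokens("hello", lang="en", prompt_embedding=prompt)
audio = vocode(tokens)
assert len(audio) == len(tokens)
```

The key design point mirrored here is that the speaker identity enters only through the prompt embedding, so any short enrollment clip yields zero-shot voice cloning.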
About vall-e
lifeiteng/vall-e
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html
Implements a two-stage autoregressive-then-non-autoregressive decoder architecture that leverages neural codec tokens for speaker-preserving synthesis, trainable on a single GPU. Integrates with the Lhotse dataset framework and k2/Icefall speech processing toolkit for phoneme tokenization, feature extraction, and dataset preparation. Supports both English (LibriTTS) and Mandarin Chinese (AISHELL-1) with configurable acoustic prompt strategies during training.
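The two-stage decoding described above can be illustrated with a minimal sketch: an autoregressive stage predicts the first codebook's tokens frame by frame, then a non-autoregressive stage fills in the remaining codebooks in parallel. These functions are toy stand-ins, not the repo's code; the codebook count and vocabulary size follow EnCodec's common 8 x 1024 configuration.

```python
# Toy sketch of VALL-E's AR-then-NAR decoding over neural codec tokens.
# All functions are illustrative stand-ins, NOT lifeiteng/vall-e's API.
import random

N_CODEBOOKS = 8   # EnCodec-style residual codebooks (assumed config)
VOCAB = 1024      # entries per codebook (assumed config)

def ar_stage(phonemes, prompt_tokens, max_len=16):
    """Stage 1: autoregressively predict the FIRST codebook's tokens,
    conditioned on phonemes and the acoustic prompt."""
    rng = random.Random(0)
    tokens = []
    for _ in range(max_len):
        # a real model samples from p(c_t | phonemes, prompt, c_<t)
        tokens.append(rng.randrange(VOCAB))
    return tokens

def nar_stage(phonemes, prompt_tokens, first_codebook):
    """Stage 2: non-autoregressively predict codebooks 2..8, each layer
    conditioned on all previously predicted layers at once."""
    rng = random.Random(1)
    codebooks = [first_codebook]
    for _ in range(N_CODEBOOKS - 1):
        codebooks.append([rng.randrange(VOCAB) for _ in first_codebook])
    return codebooks  # [N_CODEBOOKS][T], ready for codec decoding

phonemes = ["HH", "AH", "L", "OW"]
prompt = [[3, 14, 159]] * N_CODEBOOKS  # 3 prompt frames per codebook
first = ar_stage(phonemes, prompt)
all_codebooks = nar_stage(phonemes, prompt, first)
assert len(all_codebooks) == N_CODEBOOKS
```

The AR stage gives variable-length, prosody-flexible generation, while the NAR stage keeps total decoding cost low since seven of the eight codebooks are produced without a per-frame loop.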