Plachtaa/VALL-E-X

An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/

Archived


Emerging

Supports multilingual synthesis across English, Chinese, and Japanese with emotion and accent control from short acoustic prompts. Uses an autoregressive architecture combining acoustic token prediction with Vocos neural vocoding for high-quality audio reconstruction. Integrates OpenAI's Whisper for speaker embedding extraction and includes Python APIs compatible with PyTorch 2.0+ on CUDA platforms.
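The stated requirement of PyTorch 2.0+ with CUDA can be checked before trying the project. The following is a minimal sketch, not part of the repository: the helper names `parse_version`, `meets_minimum`, and `check_environment` are hypothetical, and the only PyTorch calls it assumes are `torch.__version__` and `torch.cuda.is_available()`.

```python
import importlib.util
import re


def parse_version(version: str) -> tuple:
    """Return (major, minor) from a version string such as '2.1.0+cu118'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version: str, minimum: tuple = (2, 0)) -> bool:
    """True if the version is at least `minimum` (default 2.0, per the listing)."""
    return parse_version(version) >= minimum


def check_environment() -> dict:
    """Report whether PyTorch is installed, recent enough, and sees a CUDA device."""
    report = {"torch_installed": False, "torch_ok": False, "cuda_available": False}
    # Skip the import entirely when PyTorch is absent so the check never crashes.
    if importlib.util.find_spec("torch") is None:
        return report
    import torch

    report["torch_installed"] = True
    report["torch_ok"] = meets_minimum(str(torch.__version__))
    report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    print(check_environment())
```

If `torch_ok` or `cuda_available` comes back false, the environment likely does not match what the listing describes, and the repository's own setup instructions should be consulted.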

7,954 stars. No commits in the last 6 months.

Archived, Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 20 / 25


Stars: 7,954

Forks: 781

Language: Python

License: MIT

Higher-rated alternatives

canopyai/Orpheus-TTS

Towards Human-Sounding Speech

lifeiteng/vall-e

PyTorch implementation of VALL-E(Zero-Shot Text-To-Speech), Reproduced Demo...

umbertocappellazzo/Omni-AVSR

Official Pytorch implementation of "Omni-AVSR: Towards Unified Multimodal Speech Recognition...

primepake/learnable-speech

This repo is text to speech with learnable audio encoder without alignment with transcript reference

ExplainableML/ZerAuCap

[NeurIPS 2023 - ML for Audio Workshop (Oral)] Zero-shot audio captioning with audio-language...
