jaywalnut310/vits

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

Quality score: 50 / 100 (Established)

VITS combines normalizing flows with adversarial training for parallel, single-stage synthesis that matches the quality of two-stage TTS systems, while a stochastic duration predictor models natural variation in speech rhythm. The repository implements monotonic alignment search (Cython-optimized) for unsupervised duration learning and provides PyTorch training pipelines for single-speaker (LJ Speech) and multi-speaker (VCTK) setups, with phoneme preprocessing via g2p.
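To illustrate the idea behind monotonic alignment search: it is a dynamic program that assigns each spectrogram frame to exactly one text token, with token indices never decreasing over time, so as to maximize the total log-likelihood. The sketch below is a toy NumPy version of that dynamic program, not the repository's Cython kernel; the function name and the assumption that frames outnumber tokens (T >= S) are illustrative.

```python
import numpy as np

def monotonic_alignment_search(log_p: np.ndarray) -> np.ndarray:
    """Toy monotonic alignment search (illustrative sketch, not the
    repo's Cython implementation). log_p[t, s] is the log-likelihood
    of frame t under text token s. Returns a hard 0/1 alignment of
    shape [T, S]: each frame maps to one token, monotonically.
    Assumes T >= S so a full monotonic path exists."""
    T, S = log_p.shape
    # Q[t, s] = best cumulative log-likelihood ending at (t, s).
    Q = np.full((T, S), -np.inf)
    Q[0, 0] = log_p[0, 0]
    for t in range(1, T):
        for s in range(S):
            stay = Q[t - 1, s]                      # keep the same token
            move = Q[t - 1, s - 1] if s > 0 else -np.inf  # advance one token
            Q[t, s] = log_p[t, s] + max(stay, move)
    # Backtrack from the last token at the last frame.
    path = np.zeros((T, S), dtype=np.int64)
    s = S - 1
    for t in range(T - 1, -1, -1):
        path[t, s] = 1
        if t > 0 and s > 0 and Q[t - 1, s - 1] >= Q[t - 1, s]:
            s -= 1
    return path
```

Summing the path over the time axis (`path.sum(0)`) yields per-token durations, which is the training signal the stochastic duration predictor learns to reproduce.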

7,837 stars. No commits in the last 6 months.

Flags: Stale (6 months) · No Package · No Dependents

Maintenance: 0 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 24 / 25


Stars: 7,837
Forks: 1,386
Language: Python
License: MIT
Last pushed: Dec 06, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/jaywalnut310/vits"

Open to everyone: 100 requests/day with no key. A free key raises the limit to 1,000/day.