keonlee9420/STYLER

Official repository of STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech, INTERSPEECH 2021

Score: 48 / 100 (Emerging)

Decomposes speech into disentangled style factors (prosody, speaker identity, noise) using supervised learning and domain adversarial training, enabling fine-grained style control during synthesis. Employs a non-autoregressive architecture with a novel Mel Calibrator for audio-text alignment and Residual Decoding for noise-robust style transfer. Integrates HiFi-GAN vocoding, Montreal Forced Aligner for phoneme alignment, and DeepSpeaker embeddings, with support for both VCTK and WHAM! datasets for clean and noisy speech training.

160 stars. No commits in the last 6 months.

Stale 6m, No Package, No Dependents
Maintenance 2 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 20 / 25

Stars: 160
Forks: 31
Language: Python
License: MIT
Last pushed: Jun 05, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/keonlee9420/STYLER"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
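
The same request can be sketched in Python using only the standard library. This is a minimal sketch, not a documented client: the helper names `build_quality_url` and `fetch_quality` are hypothetical, the path layout is inferred solely from the curl example above, and the response is assumed (not confirmed) to be JSON.

```python
import json
import urllib.request

# Base path copied from the curl example above; treating the remaining path
# segments as category/owner/repo is an assumption based on that one URL.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Assemble the endpoint URL in the same shape as the curl example."""
    return f"{API_BASE}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send a GET request and parse the body.

    Assumption: the endpoint returns JSON. The response schema is not
    described in the source, so the parsed value is returned as-is.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the URL; no network request is made here.
    print(build_quality_url("voice-ai", "keonlee9420", "STYLER"))
```

Calling `fetch_quality("voice-ai", "keonlee9420", "STYLER")` performs a real request, so each call counts toward the 100 requests/day keyless limit mentioned above.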