keonlee9420/STYLER
Official repository of STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech, INTERSPEECH 2021
Decomposes speech into disentangled style factors (prosody, speaker identity, noise) using supervised learning and domain adversarial training, enabling fine-grained style control during synthesis. Employs a non-autoregressive architecture with a novel Mel Calibrator for audio-text alignment and Residual Decoding for noise-robust style transfer. Integrates HiFi-GAN vocoding, Montreal Forced Aligner for phoneme alignment, and DeepSpeaker embeddings, with support for both VCTK and WHAM! datasets for clean and noisy speech training.
160 stars. No commits in the last 6 months.
Stars: 160
Forks: 31
Language: Python
License: MIT
Category:
Last pushed: Jun 05, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/keonlee9420/STYLER"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
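For scripted use, the same request can be made from Python. The following is a minimal sketch using only the standard library. It assumes the path pattern `quality/<category>/<owner>/<repo>` generalizes from the single curl example above and that the response body is JSON; neither is confirmed by this page, and the response schema is not documented here. The names `build_quality_url` and `fetch_quality` are hypothetical helpers, not part of any published client.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example; everything after it is assumed
# to follow the pattern <category>/<owner>/<repo>.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (hypothetical helper; path pattern is an assumption)."""
    # Percent-encode each segment so unusual characters cannot break the path.
    path = "/".join(quote(part, safe="") for part in (category, owner, repo))
    return BASE_URL + "/" + path


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send a GET request and decode the body as JSON (assumed format)."""
    url = build_quality_url(category, owner, repo)
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the URL; call fetch_quality(...) to actually hit the network.
    print(build_quality_url("voice-ai", "keonlee9420", "STYLER"))
```

Calling `fetch_quality("voice-ai", "keonlee9420", "STYLER")` would issue the same request as the curl command. Unauthenticated calls count against the 100 requests/day limit, so caching the result locally is advisable.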
Higher-rated alternatives
TensorSpeech/TensorFlowTTS
TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for...
lucasnewman/nanospeech
A simple, hackable text-to-speech system in PyTorch and MLX
Tomiinek/Multilingual_Text_to_Speech
An implementation of Tacotron 2 that supports multilingual experiments with parameter-sharing,...
jxzhanggg/nonparaSeq2seqVC_code
Implementation code of non-parallel sequence-to-sequence VC
yl4579/PL-BERT
Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions