FastSpeech2 and Expressive-FastSpeech2
Expressive-FastSpeech2 extends the base FastSpeech2 architecture with emotional and conversational synthesis capabilities; the two projects are ecosystem siblings, with the latter building directly on the former's foundational model.
About FastSpeech2
rishikksh20/FastSpeech2
PyTorch Implementation of FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
Builds on ESPnet's FastSpeech architecture with explicit duration, pitch, and energy prediction modules for fine-grained prosody control. Integrates NVIDIA's Tacotron 2 preprocessing pipeline with MelGAN vocoding, and supports Montreal Forced Aligner for dataset phoneme alignment without manual text-audio synchronization. Includes TorchScript export capability and pre-aligned LJSpeech filelists for immediate training.
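The duration, pitch, and energy modules described above follow a common FastSpeech2 pattern: a small convolutional predictor estimates one value per phoneme, and a length regulator expands each phoneme's hidden state to its predicted number of mel frames. The sketch below illustrates that pattern in PyTorch; class names, layer sizes, and the log-duration convention are illustrative assumptions, not the repo's actual API.

```python
# Hypothetical sketch of FastSpeech2-style variance prediction and length
# regulation. Module names and dimensions are assumptions for illustration.
import torch
import torch.nn as nn

class VariancePredictor(nn.Module):
    """Predicts one scalar per phoneme (e.g. log-duration, pitch, or energy)."""
    def __init__(self, hidden=256, kernel=3):
        super().__init__()
        self.conv1 = nn.Conv1d(hidden, hidden, kernel, padding=kernel // 2)
        self.conv2 = nn.Conv1d(hidden, hidden, kernel, padding=kernel // 2)
        self.proj = nn.Linear(hidden, 1)

    def forward(self, x):             # x: (batch, phonemes, hidden)
        h = x.transpose(1, 2)         # Conv1d expects (batch, hidden, phonemes)
        h = torch.relu(self.conv1(h))
        h = torch.relu(self.conv2(h))
        return self.proj(h.transpose(1, 2)).squeeze(-1)  # (batch, phonemes)

def length_regulate(x, durations):
    """Expand each phoneme's hidden vector by its predicted frame count."""
    # x: (batch, phonemes, hidden); durations: (batch, phonemes) integer frames
    expanded = [xi.repeat_interleave(di, dim=0) for xi, di in zip(x, durations)]
    return nn.utils.rnn.pad_sequence(expanded, batch_first=True)

# Usage: expand a toy encoder output to frame level.
enc = torch.randn(2, 5, 256)          # 2 utterances, 5 phonemes each
predictor = VariancePredictor()
log_dur = predictor(enc)              # trained against MFA-derived log-durations
dur = torch.clamp(torch.round(torch.exp(log_dur)), min=1).long()
frames = length_regulate(enc, dur)    # (2, max_total_frames, 256)
```

In training, the duration target would come from the Montreal Forced Aligner alignments mentioned above; pitch and energy predictors reuse the same predictor shape with frame- or phoneme-level targets.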
About Expressive-FastSpeech2
keonlee9420/Expressive-FastSpeech2
PyTorch Implementation of Non-autoregressive Expressive (emotional, conversational) TTS based on FastSpeech2, supporting English, Korean, and custom languages.
Enables fast inference via a non-autoregressive architecture while conditioning on categorical or continuous emotion descriptors and conversational context through separate branch implementations. Includes annotated datasets (IEMOCAP for English, AIHub Multimodal for Korean) and language-specific text processing pipelines with Montreal Forced Aligner integration for adapting to new languages. Provides multi-speaker synthesis with emotion- and conversation-aware prosody control as a PyTorch framework extending FastSpeech2's base architecture.
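One common way to condition a non-autoregressive TTS backbone on categorical emotion and speaker identity, in the spirit of the categorical branch described above, is to add utterance-level embeddings to the encoder output before variance prediction. The sketch below shows that idea; the class name, embedding sizes, and emotion inventory are assumptions, not Expressive-FastSpeech2's actual interface.

```python
# Illustrative categorical emotion/speaker conditioning for a FastSpeech2-style
# encoder. Names and dimensions are assumptions, not the repo's API.
import torch
import torch.nn as nn

class EmotionConditioner(nn.Module):
    def __init__(self, n_emotions=7, n_speakers=10, hidden=256):
        super().__init__()
        # One learned vector per emotion category and per speaker.
        self.emotion_emb = nn.Embedding(n_emotions, hidden)
        self.speaker_emb = nn.Embedding(n_speakers, hidden)

    def forward(self, enc, emotion_id, speaker_id):
        # Broadcast the utterance-level embeddings across all phoneme positions
        # so downstream duration/pitch/energy predictors see the conditioning.
        e = self.emotion_emb(emotion_id).unsqueeze(1)   # (batch, 1, hidden)
        s = self.speaker_emb(speaker_id).unsqueeze(1)   # (batch, 1, hidden)
        return enc + e + s                              # (batch, phonemes, hidden)

# Usage: condition a toy encoder output on emotion and speaker labels.
enc = torch.randn(3, 7, 256)
cond = EmotionConditioner()
out = cond(enc, torch.tensor([0, 1, 2]), torch.tensor([0, 0, 1]))
```

Continuous emotion descriptors (e.g. arousal/valence) could replace the categorical embedding with a linear projection of the descriptor vector; the additive-conditioning structure stays the same.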