Tacotron and Tacotron-pytorch
These projects are alternatives rather than collaborators: each is an independent PyTorch implementation of the Tacotron text-to-speech architecture, offering a separate codebase for the same synthesis task with no technical dependency between them.
About Tacotron
bshall/Tacotron
A PyTorch implementation of Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis
Implements location-relative attention with dynamic convolution to improve alignment robustness in text-to-mel-spectrogram synthesis, enabling stable training on single GPUs with mixed precision. Integrates with the UniversalVocoder for end-to-end audio generation from text via CMUDict phoneme conversion. Provides pretrained LJSpeech weights and preprocessing utilities for dataset training, with architectural optimizations including gradient clipping and modified learning schedules for efficient single-GPU convergence.
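The core idea behind the location-relative mechanism is dynamic convolution attention: the next alignment is computed from the previous one using both fixed location filters and per-step filters predicted from the decoder state. Below is a minimal sketch of that mechanism under stated assumptions; the class name, hyperparameters, and layer structure are illustrative and not taken from the bshall/Tacotron source.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConvolutionAttention(nn.Module):
    """Sketch of location-relative dynamic convolution attention.
    Hyperparameter values here are illustrative assumptions."""
    def __init__(self, query_dim=256, static_channels=8, static_kernel=21,
                 dynamic_channels=8, dynamic_kernel=21, attn_dim=128):
        super().__init__()
        self.dynamic_channels = dynamic_channels
        self.dynamic_kernel = dynamic_kernel
        # static location filters applied to the previous alignment
        self.static_conv = nn.Conv1d(1, static_channels, static_kernel,
                                     padding=static_kernel // 2, bias=False)
        # predicts dynamic filter taps from the current decoder query
        self.dynamic_fc = nn.Linear(query_dim, dynamic_channels * dynamic_kernel)
        self.static_proj = nn.Linear(static_channels, attn_dim, bias=False)
        self.dynamic_proj = nn.Linear(dynamic_channels, attn_dim, bias=False)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, query, prev_alignment):
        # query: (B, query_dim); prev_alignment: (B, T)
        B, T = prev_alignment.shape
        # static location features: (B, T, static_channels)
        s = self.static_conv(prev_alignment.unsqueeze(1)).transpose(1, 2)
        # per-example dynamic filters, applied via a grouped convolution
        filters = self.dynamic_fc(query).view(
            B * self.dynamic_channels, 1, self.dynamic_kernel)
        d = F.conv1d(prev_alignment.view(1, B, T), filters,
                     padding=self.dynamic_kernel // 2, groups=B)
        d = d.view(B, self.dynamic_channels, T).transpose(1, 2)
        # energies depend only on the previous alignment, not on content
        energies = self.v(torch.tanh(self.static_proj(s) +
                                     self.dynamic_proj(d))).squeeze(-1)
        return torch.softmax(energies, dim=-1)  # new alignment, (B, T)
```

Because the energies are computed purely from the previous alignment rather than from encoder content, attention can only move relative to its last position, which is what makes this family of mechanisms robust on long utterances.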
About Tacotron-pytorch
soobinseo/Tacotron-pytorch
Pytorch implementation of Tacotron
Implements the full Tacotron architecture with encoder-decoder attention, CBHG modules, and mel-spectrogram generation for end-to-end text-to-speech synthesis. Preprocesses text into phoneme indices and audio into spectrograms, supporting the LJSpeech dataset pipeline. Includes separate training and inference scripts for model optimization and TTS sample generation.
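The CBHG module mentioned above is Tacotron's sequence encoder: a bank of 1-D convolutions with kernel sizes 1 through K, max-pooling, projection convolutions with a residual connection, highway layers, and a bidirectional GRU. A minimal sketch follows; the dimensions and layer counts are illustrative assumptions, not the values used in soobinseo/Tacotron-pytorch.

```python
import torch
import torch.nn as nn

class CBHG(nn.Module):
    """Sketch of Tacotron's CBHG module (conv bank + highway net + GRU).
    Dimensions are illustrative assumptions."""
    def __init__(self, dim=128, K=8, highway_layers=4):
        super().__init__()
        # bank of 1-D convolutions with kernel sizes 1..K
        self.conv_bank = nn.ModuleList(
            nn.Conv1d(dim, dim, k, padding=k // 2) for k in range(1, K + 1))
        self.pool = nn.MaxPool1d(2, stride=1, padding=1)
        # projections back down to the input width, with a residual add
        self.proj1 = nn.Conv1d(K * dim, dim, 3, padding=1)
        self.proj2 = nn.Conv1d(dim, dim, 3, padding=1)
        # each highway layer emits a transform half and a gate half
        self.highways = nn.ModuleList(
            nn.Linear(dim, 2 * dim) for _ in range(highway_layers))
        self.gru = nn.GRU(dim, dim, batch_first=True, bidirectional=True)

    def forward(self, x):  # x: (B, T, dim)
        T = x.size(1)
        y = x.transpose(1, 2)  # (B, dim, T)
        # even kernels pad to T+1, so trim every branch back to T
        bank = torch.cat([conv(y)[:, :, :T] for conv in self.conv_bank], dim=1)
        y = self.pool(torch.relu(bank))[:, :, :T]
        y = self.proj2(torch.relu(self.proj1(y))).transpose(1, 2) + x
        for hw in self.highways:
            h, g = hw(y).chunk(2, dim=-1)
            g = torch.sigmoid(g)
            y = g * torch.relu(h) + (1 - g) * y  # gated highway update
        out, _ = self.gru(y)  # (B, T, 2*dim) from the bidirectional GRU
        return out
```

The multi-kernel bank lets the encoder capture n-gram-like patterns of varying width in one pass, while the highway layers and bidirectional GRU mix local and long-range context before attention is applied.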