Tacotron and tacotron
These are **competitors**: independent implementations of the same Tacotron text-to-speech architecture in different deep learning frameworks (PyTorch vs. TensorFlow). They serve the same use case and have no dependency on each other.
About Tacotron
bshall/Tacotron
A PyTorch implementation of Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis
Implements location-relative attention with dynamic convolution to improve alignment robustness in text-to-mel-spectrogram synthesis, enabling stable training on single GPUs with mixed precision. Integrates with the UniversalVocoder for end-to-end audio generation from text via CMUDict phoneme conversion. Provides pretrained LJSpeech weights and preprocessing utilities for dataset training, with architectural optimizations including gradient clipping and modified learning schedules for efficient single-GPU convergence.
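The core idea behind location-relative attention with dynamic convolution is that the next alignment is computed from the *previous* alignment (convolved with static and per-step dynamic filters) rather than from content similarity, which keeps attention moving monotonically over long utterances. A minimal NumPy sketch of one attention step, with hypothetical filter shapes and a made-up projection vector `v` (the actual bshall/Tacotron implementation differs in detail):

```python
import numpy as np

def dca_step(prev_align, static_filters, dynamic_filters, v, b):
    """One dynamic-convolution attention step (illustrative sketch).

    prev_align:      (T,) previous attention weights over encoder steps
    static_filters:  (n_static, k) fixed location filters
    dynamic_filters: (n_dyn, k) filters predicted from the decoder state
    v, b:            projection vector (n_static + n_dyn,) and scalar bias
    """
    feats = []
    for f in np.vstack([static_filters, dynamic_filters]):
        # Convolving the old alignment with each filter yields purely
        # location-relative features -- no content comparison involved.
        feats.append(np.convolve(prev_align, f, mode="same"))
    F = np.stack(feats, axis=-1)          # (T, n_filters)
    energies = np.tanh(F) @ v + b        # (T,)
    a = np.exp(energies - energies.max())  # stable softmax
    return a / a.sum()
```

Because the energies depend only on where attention was last step, the mechanism cannot skip or repeat text the way content-based attention sometimes does on long inputs.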
About tacotron
Kyubyong/tacotron
A TensorFlow Implementation of Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model
Implements encoder-decoder architecture with attention mechanism and Griffin-Lim vocoder for mel-spectrogram-to-waveform conversion, trained on multiple public datasets (LJ Speech, audiobooks, Bible recordings). Includes heavily documented training pipeline with bucketed batches, Noam learning rate scheduling, and gradient clipping, plus pre-trained checkpoints and attention visualization tools for monitoring alignment quality during training.
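Noam scheduling, mentioned above, warms the learning rate up linearly and then decays it with the inverse square root of the step count. A minimal sketch of the schedule (the `d_model` and `warmup` values here are illustrative defaults, not necessarily the repo's settings):

```python
def noam_lr(step, d_model=256, warmup=4000):
    """Noam learning-rate schedule from the Transformer paper:
    linear warmup for `warmup` steps, then ~1/sqrt(step) decay."""
    step = max(step, 1)  # avoid division by zero at step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)
```

The rate peaks exactly at the warmup step, which makes early training stable (small updates while attention alignment is still forming) without permanently capping the learning rate.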