bshall/Tacotron
A PyTorch implementation of Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis
Implements location-relative attention with dynamic convolution to make alignment more robust in text-to-mel-spectrogram synthesis, and trains stably on a single GPU with mixed precision. Integrates with the UniversalVocoder to generate audio from text end to end, using CMUDict for phoneme conversion. Provides pretrained LJSpeech weights and preprocessing utilities for training on a dataset, along with training optimizations such as gradient clipping and modified learning-rate schedules for efficient single-GPU convergence.
115 stars and 32 monthly downloads. No commits in the last 6 months. Available on PyPI.
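The description mentions gradient clipping among the training optimizations. As a rough illustration of the general technique only (not the repository's code), clipping by global L2 norm can be sketched in plain Python. The function name `clip_by_global_norm` and the `max_norm` value are hypothetical, and whether the project clips by norm or by value is an assumption here.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Rescale gradient vectors so their joint L2 norm is at most max_norm.

    Hypothetical helper for illustration; returns (clipped_grads, original_norm).
    """
    # Joint L2 norm over every element of every gradient vector.
    total = math.sqrt(sum(g * g for vec in grads for g in vec))
    if total == 0.0 or total <= max_norm:
        # Already within the limit: return copies unchanged.
        return [list(vec) for vec in grads], total
    scale = max_norm / total
    return [[g * scale for g in vec] for vec in grads], total

# Two toy "parameter gradients" with a joint norm of 5.0.
clipped, norm = clip_by_global_norm([[3.0, 0.0], [0.0, 4.0]], max_norm=1.0)
```

Every gradient is scaled by the same factor, so the update keeps its direction while its magnitude is bounded, which is what makes this form of clipping useful for stabilizing training.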
Stars: 115
Forks: 26
Language: Python
License: MIT
Category:
Last pushed: Dec 02, 2020
Monthly downloads: 32
Commits (30d): 0
Dependencies: 6
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/bshall/Tacotron"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
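The same request as the curl command above can be sketched in Python with only the standard library. The reading of the path segments as category, owner, and repository is an assumption inferred from the URL, the helper names `quality_url` and `fetch_quality` are hypothetical, and the response is assumed to be JSON because its fields are not documented on this page.

```python
import json
import urllib.request

# Base path taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category, owner, repo):
    """Build the endpoint URL (segment meaning is an assumption)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"

def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the record and decode it, assuming a JSON response body."""
    with urllib.request.urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

url = quality_url("voice-ai", "bshall", "Tacotron")
```

Calling `fetch_quality("voice-ai", "bshall", "Tacotron")` performs the actual network request; without a key it counts toward the daily request limit stated above.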
Related tools
Kyubyong/tacotron
A TensorFlow Implementation of Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model
Rayhane-mamah/Tacotron-2
DeepMind's Tacotron-2 Tensorflow implementation
DemisEom/SpecAugment
An Implementation of SpecAugment with TensorFlow & PyTorch, introduced by Google Brain
Kyubyong/dc_tts
A TensorFlow Implementation of DC-TTS: yet another text-to-speech model
vlomme/Multi-Tacotron-Voice-Cloning
Phoneme multilingual(Russian-English) voice cloning based on