tacotron and tacotron_asr
These are ecosystem siblings: tacotron implements the model for text-to-speech (TTS) synthesis, while tacotron_asr adapts the same encoder-decoder design to the reverse task, automatic speech recognition (ASR).
About tacotron
Kyubyong/tacotron
A TensorFlow Implementation of Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model
Implements an encoder-decoder architecture with an attention mechanism and a Griffin-Lim vocoder for spectrogram-to-waveform conversion, trained on multiple public datasets (LJ Speech, audiobooks, Bible recordings). Includes a heavily documented training pipeline with bucketed batches, Noam learning rate scheduling, and gradient clipping, plus pre-trained checkpoints and attention visualization tools for monitoring alignment quality during training.
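The Griffin-Lim step recovers a waveform from a magnitude spectrogram by iteratively re-estimating phase. A minimal NumPy sketch of the idea (not the repo's implementation; frame sizes, hop length, and the zero-phase initialization here are illustrative assumptions):

```python
import numpy as np

def stft(x, n_fft=256, hop=64):
    """Windowed short-time Fourier transform: (frames, n_fft//2 + 1)."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    return np.stack([np.fft.rfft(win * x[i * hop:i * hop + n_fft])
                     for i in range(n_frames)])

def istft(S, n_fft=256, hop=64):
    """Inverse STFT via overlap-add with window-sum normalization."""
    win = np.hanning(n_fft)
    out = np.zeros(n_fft + hop * (S.shape[0] - 1))
    wsum = np.zeros_like(out)
    for i, frame in enumerate(S):
        out[i * hop:i * hop + n_fft] += np.fft.irfft(frame, n=n_fft) * win
        wsum[i * hop:i * hop + n_fft] += win ** 2
    nz = wsum > 1e-8
    out[nz] /= wsum[nz]
    return out

def griffin_lim(mag, n_iter=30, n_fft=256, hop=64):
    """Iteratively refine phase so the synthesized signal's STFT
    magnitude approaches the target magnitude `mag`."""
    phase = np.ones_like(mag, dtype=complex)  # zero-phase init (assumption)
    for _ in range(n_iter):
        x = istft(mag * phase, n_fft, hop)
        phase = np.exp(1j * np.angle(stft(x, n_fft, hop)))
    return istft(mag * phase, n_fft, hop)
```

Each iteration projects onto the target magnitude and back onto the set of consistent spectrograms, so reconstruction error decreases monotonically; production systems typically run 30 to 100 iterations.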
About tacotron_asr
Kyubyong/tacotron_asr
Speech Recognition Using Tacotron
Adapts the Tacotron text-to-speech architecture for automatic speech recognition by reversing the task flow: mel-spectrogram and linear-spectrogram inputs are mapped to character-level text output. Built on TensorFlow 1.1 with attention-based encoder-decoder networks and trained on the World English Bible dataset (audio paired with verse-level text transcriptions). Demonstrates competitive results on long-form speech recognition while showcasing the flexibility of the original Tacotron architecture for the inverse task.
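Reversing the task flow means the decoder emits character indices instead of spectrogram frames, so transcripts must be mapped to and from integer targets. A minimal sketch of such a character vocabulary (the `P`/`E` padding and end-of-sequence symbols are illustrative assumptions, not the repo's exact preprocessing):

```python
PAD, EOS = "P", "E"  # hypothetical reserved symbols; real text must not contain them

def build_vocab(texts):
    """Build char<->index maps over a corpus of transcripts."""
    chars = sorted(set("".join(texts)))
    idx2char = [PAD, EOS] + chars
    char2idx = {c: i for i, c in enumerate(idx2char)}
    return char2idx, idx2char

def encode(text, char2idx):
    """Transcript -> integer targets, terminated with EOS."""
    return [char2idx[c] for c in text] + [char2idx[EOS]]

def decode(ids, idx2char):
    """Integer predictions -> text, dropping reserved symbols."""
    return "".join(idx2char[i] for i in ids if idx2char[i] not in (PAD, EOS))
```

During training the encoded sequence is the decoder's target; at inference, greedy or beam decoding over these indices is stopped when EOS is emitted.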