TTS and glow-tts
Coqui TTS is a comprehensive, production-ready framework that includes Glow-TTS as one of several supported model architectures; the two are complements, with Glow-TTS serving as a flow-based acoustic model (text to mel spectrogram) within the broader toolkit.
About TTS
coqui-ai/TTS
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Supports multiple model architectures spanning spectrogram-based (Tacotron2, Glow-TTS, FastSpeech2) and end-to-end approaches (VITS, XTTS), with a built-in speaker encoder for multi-speaker synthesis and voice cloning. Enables sub-200ms streaming inference, fine-tuning on custom datasets, and integrates ~1100 Fairseq models alongside modular vocoder support (MelGAN, ParallelWaveGAN, WaveGrad). Training infrastructure includes dataset curation tools, Tensorboard logging, and a lightweight Trainer API optimized for efficient multi-GPU training.
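The spectrogram-based models above follow a two-stage design: an acoustic model maps text to a mel spectrogram, and a separate vocoder maps the spectrogram to a waveform. A minimal sketch of that composition, where `text_to_mel` and `mel_to_wave` are hypothetical stand-ins (not Coqui TTS APIs) for, say, Glow-TTS and a MelGAN-style vocoder:

```python
# Sketch of a two-stage TTS pipeline: acoustic model -> vocoder.
# text_to_mel and mel_to_wave are hypothetical stand-ins for the real
# stages (e.g. Glow-TTS followed by MelGAN/HiFi-GAN).

def text_to_mel(text):
    """Acoustic-model stage: map text to a (frames x mel_bins) spectrogram."""
    n_frames = 10 * len(text)   # toy duration model: 10 frames per character
    n_mels = 80                 # a common mel-bin count
    return [[0.0] * n_mels for _ in range(n_frames)]

def mel_to_wave(mel, hop_length=256):
    """Vocoder stage: upsample each mel frame to hop_length audio samples."""
    return [0.0] * (len(mel) * hop_length)

def synthesize(text):
    return mel_to_wave(text_to_mel(text))

audio = synthesize("Hello")
print(len(audio))  # 5 chars * 10 frames * 256 samples = 12800
```

With the actual `TTS` package installed, the analogous high-level call is, if memory serves, `TTS(model_name="tts_models/en/ljspeech/glow-tts").tts_to_file(text=..., file_path=...)`, which wires both stages together internally.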
About glow-tts
jaywalnut310/glow-tts
A Generative Flow for Text-to-Speech via Monotonic Alignment Search
Combines normalizing flows with dynamic programming-based monotonic alignment search to enable parallel mel-spectrogram generation without requiring external aligners, eliminating the dependency on autoregressive teacher models. Integrates with HiFi-GAN vocoder for improved audio quality and supports multi-speaker synthesis through conditional generation. Achieves order-of-magnitude speedup over Tacotron 2 while maintaining comparable speech quality with controllable and diverse output.
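The monotonic alignment search mentioned above is a dynamic program over a text-by-frame log-likelihood matrix: each mel frame is assigned to exactly one text token, assignments never move backward, and the total log-likelihood is maximized. A self-contained sketch of that DP (pure Python, toy inputs; the real implementation operates on flow-model likelihoods):

```python
# Monotonic alignment search (MAS), sketched as a plain dynamic program.
# log_p[i][j] is the log-likelihood of mel frame j under text token i.
NEG = float("-inf")

def monotonic_alignment_search(log_p):
    """Return, for each mel frame, the index of its aligned text token."""
    n_text, n_mel = len(log_p), len(log_p[0])
    # Q[i][j]: best total log-likelihood ending at token i, frame j.
    Q = [[NEG] * n_mel for _ in range(n_text)]
    Q[0][0] = log_p[0][0]
    for j in range(1, n_mel):
        for i in range(min(j + 1, n_text)):   # token i needs at least i+1 frames
            stay = Q[i][j - 1]                 # frame j keeps the same token
            move = Q[i - 1][j - 1] if i > 0 else NEG  # or advances one token
            Q[i][j] = log_p[i][j] + max(stay, move)
    # Backtrack from the last token at the last frame.
    align = [0] * n_mel
    i = n_text - 1
    for j in range(n_mel - 1, -1, -1):
        align[j] = i
        if j > 0 and i > 0 and Q[i - 1][j - 1] >= Q[i][j - 1]:
            i -= 1
    return align

# Toy example: two tokens, four frames, likelihoods favoring a 2+2 split.
print(monotonic_alignment_search([[0, 0, -1, -1],
                                  [-1, -1, 0, 0]]))  # -> [0, 0, 1, 1]
```

Because the search is a hard Viterbi-style assignment rather than soft attention, the model gets explicit per-token durations for free, which is what enables parallel generation at inference time.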