TensorFlowTTS and FCH-TTS
These are competitors: both are standalone FastSpeech-based TTS synthesis frameworks offering similar functionality across multiple languages, with TensorFlowTTS being the more mature and widely adopted option.
About TensorFlowTTS
TensorSpeech/TensorFlowTTS
TensorFlowTTS: Real-time state-of-the-art speech synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German, and is easy to adapt to other languages)
Implements modular encoder-decoder architectures (Tacotron-2, FastSpeech/2) paired with neural vocoders (MelGAN, HiFi-GAN, Parallel WaveGAN) for end-to-end text-to-speech, enabling faster-than-real-time inference through TensorFlow 2 optimization techniques such as quantization-aware training and pruning. Supports deployment across diverse platforms, including TFLite for mobile/embedded systems, C++ inference, and iOS, with pretrained models integrated into the Hugging Face Hub. Provides a language-agnostic processor pipeline for fine-tuning on new languages, demonstrated with Chinese, Korean, French, and German examples.
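The "real-time" claim above is usually quantified as a real-time factor (RTF): synthesis time divided by the duration of the audio produced, with RTF below 1.0 meaning faster than real time. A minimal sketch of the metric, using hypothetical timing numbers rather than measured TensorFlowTTS results:

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = wall-clock time spent synthesizing / duration of audio produced.

    RTF < 1.0 means the system generates speech faster than it plays back,
    which is what "real-time synthesis" refers to in these READMEs.
    """
    return synthesis_seconds / audio_seconds

# Hypothetical example: 0.5 s of compute producing 2.0 s of audio.
print(real_time_factor(0.5, 2.0))  # 0.25, i.e. 4x faster than real time
```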
About FCH-TTS
atomicoo/FCH-TTS
A fast Text-to-Speech (TTS) model. Works well for English, Mandarin/Chinese, Japanese, Korean, Russian, and Tibetan (tested so far).
Implements a parallel text-to-mel-spectrogram architecture with separate duration-prediction and acoustic models, using a MelGAN vocoder for waveform generation. Supports multiple training configurations via YAML, includes pretrained models for LJSpeech, and integrates optional Weights & Biases logging for experiment tracking. Achieves real-time synthesis speeds on CPU/GPU with batch-processing capabilities.
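The key idea behind parallel (non-autoregressive) models like FCH-TTS is a FastSpeech-style length regulator: a separate duration predictor says how many mel frames each phoneme should occupy, and the phoneme features are expanded accordingly so all frames can be generated at once. A minimal sketch of that expansion step (illustrative only, not FCH-TTS's actual code):

```python
import numpy as np

def length_regulate(phoneme_feats: np.ndarray, durations: np.ndarray) -> np.ndarray:
    """Expand per-phoneme feature vectors to frame level by repeating each
    phoneme's vector durations[i] times, yielding a sequence whose length
    matches the target mel-spectrogram.

    phoneme_feats: (num_phonemes, feat_dim)
    durations:     (num_phonemes,) integer frame counts from the duration predictor
    returns:       (sum(durations), feat_dim)
    """
    return np.repeat(phoneme_feats, durations, axis=0)

# 3 phonemes with 2-dim features and predicted durations [2, 1, 3] -> 6 frames.
feats = np.arange(6, dtype=np.float32).reshape(3, 2)
frames = length_regulate(feats, np.array([2, 1, 3]))
print(frames.shape)  # (6, 2)
```

Because every frame's input is known up front, the acoustic model can decode all frames in parallel, which is what makes these models fast relative to autoregressive ones like Tacotron-2.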