TTS-Audio-Suite and ComfyUI-EdgeTTS

These projects compete, offering overlapping text-to-speech functionality for ComfyUI. The first provides broader multi-engine support (RVC, Qwen3-TTS, and others), while the second is built around Microsoft Edge TTS integration.

TTS-Audio-Suite
Score: 68 (Established)
Maintenance: 25/25
Adoption: 10/25
Maturity: 15/25
Community: 18/25
Stars: 774
Forks: 71
Downloads:
Commits (30d): 55
Language: Python
License:
No Package, No Dependents

ComfyUI-EdgeTTS
Score: 47 (Emerging)
Maintenance: 10/25
Adoption: 8/25
Maturity: 16/25
Community: 13/25
Stars: 66
Forks: 8
Downloads:
Commits (30d): 0
Language: Python
License: GPL-3.0
No Package, No Dependents

About TTS-Audio-Suite

diodiogod/TTS-Audio-Suite

A ComfyUI custom node integration for multi-engine multi-language Text-to-Speech and Voice Conversion. Supports: RVC, Echo-TTS, Qwen3-TTS, Cozy Voice 3, Step Audio EditX, IndexTTS-2, Chatterbox (classic and multilingual 23-lang), F5-TTS, Higgs Audio 2 and VibeVoice with unlimited text length, SRT timing, Character support, and many audio tools

Implements a modular node-based architecture within ComfyUI that abstracts 12 TTS/voice conversion engines behind unified interfaces, so workflows can swap engines or chain operations (transcription → subtitle timing → synthesis → voice conversion) without restructuring the graph. Provides subtitle authoring through SRT generation from plain text using readability algorithms, per-segment parameter switching via inline tags like `[seed:24]`, and character/language switching within a single text block, linking traditional NLP workflows with real-time audio generation at unlimited text lengths.
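To illustrate per-segment parameter switching, here is a minimal sketch of how inline tags might be split out of a text block. It is not the suite's actual parser: the `[key:value]` grammar is assumed from the single `[seed:24]` example, the rule that a tag applies until a later tag overrides the same key is also an assumption, and `TAG_PATTERN` and `split_segments` are hypothetical names.

```python
import re
from typing import Dict, List, Tuple

# Assumed tag grammar: "[key:value]" with a word-like key and a value
# containing no whitespace or closing bracket. The real syntax may differ.
TAG_PATTERN = re.compile(r"\[(\w+):([^\]\s]+)\]")


def split_segments(text: str) -> List[Tuple[Dict[str, str], str]]:
    """Split text into (params, segment_text) pairs.

    Each tag updates the parameters applied to the text that follows it.
    A parameter stays active until a later tag overrides the same key
    (an assumption made for this sketch).
    """
    segments: List[Tuple[Dict[str, str], str]] = []
    params: Dict[str, str] = {}
    cursor = 0
    for match in TAG_PATTERN.finditer(text):
        # Emit the text before this tag under the parameters seen so far.
        chunk = text[cursor:match.start()].strip()
        if chunk:
            segments.append((dict(params), chunk))
        params[match.group(1)] = match.group(2)
        cursor = match.end()
    # Emit whatever follows the last tag.
    tail = text[cursor:].strip()
    if tail:
        segments.append((dict(params), tail))
    return segments
```

Each resulting pair could then be handed to a synthesis call with its own settings, which is one way a single text block can drive several differently configured generations.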

About ComfyUI-EdgeTTS

1038lab/ComfyUI-EdgeTTS

ComfyUI-EdgeTTS is a text-to-speech node for ComfyUI built on Microsoft's Edge TTS. It converts text into natural-sounding speech in multiple languages and voices, and is designed to be easy to integrate and customize.

Provides complementary speech-to-text capabilities via OpenAI's Whisper with multiple model sizes and automatic language detection, alongside audio export nodes supporting WAV/MP3/FLAC formats with quality presets. The implementation uses lazy loading and caching to optimize performance and memory usage within ComfyUI's node-based workflow system. Integrates FFmpeg for audio codec handling and supports GPU acceleration via CUDA for faster Whisper inference.
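The lazy loading and caching pattern mentioned above can be sketched in a few lines. This is an illustration of the general technique, not the project's code: `StubModel`, `get_model`, and `LOAD_LOG` are hypothetical names, and the stub stands in for an expensive model load such as a Whisper checkpoint.

```python
from functools import lru_cache
from typing import List

# Records every real load so the caching effect is observable.
LOAD_LOG: List[str] = []


class StubModel:
    """Stand-in for a heavyweight model object (hypothetical)."""

    def __init__(self, size: str) -> None:
        self.size = size


@lru_cache(maxsize=2)
def get_model(size: str) -> StubModel:
    """Load a model on first request and reuse it afterwards.

    Nothing is loaded at import time (lazy loading). Repeated requests
    for the same size return the cached instance. With maxsize=2, the
    least recently used model is evicted when a third size is requested,
    which bounds memory use.
    """
    LOAD_LOG.append(size)
    return StubModel(size)
```

A node would call `get_model("base")` inside its execution method rather than at import time, so a workflow that never runs the node pays no loading cost.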

Scores updated daily from GitHub, PyPI, and npm data.