TTS-Audio-Suite and VibeVoice-ComfyUI

These projects are complementary rather than competing: both run as custom nodes in the same ComfyUI workflow but serve different TTS engine preferences. TTS-Audio-Suite is a multi-engine aggregator supporting RVC, Echo-TTS, Qwen3-TTS, and others (VibeVoice among them), while VibeVoice-ComfyUI focuses exclusively on Microsoft's VibeVoice model, for users who prioritize that architecture's multi-speaker synthesis.

| | TTS-Audio-Suite | VibeVoice-ComfyUI |
| --- | --- | --- |
| Score | 68 | 58 |
| Status | Established | Established |
| Maintenance | 25/25 | 10/25 |
| Adoption | 10/25 | 10/25 |
| Maturity | 15/25 | 15/25 |
| Community | 18/25 | 23/25 |
| Stars | 774 | 1,391 |
| Forks | 71 | 219 |
| Downloads | (not listed) | (not listed) |
| Commits (30d) | 55 | 0 |
| Language | Python | Python |
| License | (not listed) | MIT |
| Package | No Package | No Package |
| Dependents | No Dependents | No Dependents |

About TTS-Audio-Suite

diodiogod/TTS-Audio-Suite

A ComfyUI custom node integration for multi-engine multi-language Text-to-Speech and Voice Conversion. Supports: RVC, Echo-TTS, Qwen3-TTS, Cozy Voice 3, Step Audio EditX, IndexTTS-2, Chatterbox (classic and multilingual 23-lang), F5-TTS, Higgs Audio 2 and VibeVoice with unlimited text length, SRT timing, Character support, and many audio tools

Implements a modular node-based architecture within ComfyUI that abstracts 12 TTS/voice conversion engines behind unified interfaces, so workflows can swap engines or chain operations (transcription → subtitle timing → synthesis → voice conversion) without restructuring the graph. Provides advanced subtitle authoring through SRT generation from plain text using readability algorithms, per-segment parameter switching via inline tags like `[seed:24]`, and character/language switching within a single text block, bridging traditional NLP workflows with real-time audio generation at unlimited text lengths.
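To make the inline-tag idea concrete, here is a minimal sketch of how per-segment parameters could be pulled out of a text block. It is not TTS-Audio-Suite's parser: the `[name:value]` grammar is generalized from the single `[seed:24]` example, the function name `split_by_inline_tags` is hypothetical, and the rule that a tag stays in effect until overridden is an assumption.

```python
import re
from typing import Dict, List, Tuple

# Assumed tag grammar: [name:value]. Only `[seed:24]` is attested in the
# project description; everything else here is illustrative.
TAG_RE = re.compile(r"\[(\w+):([^\]\s]+)\]")


def split_by_inline_tags(text: str) -> List[Tuple[Dict[str, str], str]]:
    """Split text into (params, segment) pairs.

    Assumption: a tag applies from its position onward, until a later
    tag with the same name overrides it.
    """
    segments: List[Tuple[Dict[str, str], str]] = []
    params: Dict[str, str] = {}
    pos = 0
    for match in TAG_RE.finditer(text):
        chunk = text[pos:match.start()].strip()
        if chunk:
            # Copy the dict so later tags do not mutate earlier segments.
            segments.append((dict(params), chunk))
        params[match.group(1)] = match.group(2)
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append((dict(params), tail))
    return segments


if __name__ == "__main__":
    for seg_params, seg_text in split_by_inline_tags(
        "Hello there. [seed:24] Second part. [seed:7] Third."
    ):
        print(seg_params, seg_text)
```

Each resulting segment could then be handed to whichever engine node is active, together with its own parameter set, which is what lets one text block carry several settings.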

About VibeVoice-ComfyUI

Enemyx-net/VibeVoice-ComfyUI

A comprehensive ComfyUI integration for Microsoft's VibeVoice text-to-speech model, enabling high-quality single and multi-speaker voice synthesis directly within your ComfyUI workflows.

Supports voice cloning from audio samples, LoRA fine-tuning adapters, and multi-speaker conversations with up to 4 distinct voices using speaker labels. The implementation features embedded VibeVoice code with adaptive transformer compatibility, configurable quantization (4-bit/8-bit) for VRAM optimization, and cross-platform GPU support including Apple Silicon via MPS. Operates as a self-contained ComfyUI custom node with automatic text chunking, pause tag insertion, and memory management controls for complex generative workflows.
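As an illustration of the speaker-label and text-chunking features, the sketch below parses a labeled script and splits long text at sentence boundaries. It is not the node's actual code: the `[N]: text` label syntax, the function names `parse_dialogue` and `chunk_text`, and the 250-character default are assumptions; only the four-speaker limit comes from the description above.

```python
import re
from typing import List, Tuple

MAX_SPEAKERS = 4  # limit stated in the project description
# Assumed label syntax "[N]: text"; the description does not specify it.
LABEL_RE = re.compile(r"^\[(\d+)\]:\s*(.*)$")


def parse_dialogue(script: str) -> List[Tuple[int, str]]:
    """Turn a labeled script into (speaker_id, text) turns."""
    turns: List[Tuple[int, str]] = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = LABEL_RE.match(line)
        if match:
            turns.append((int(match.group(1)), match.group(2)))
        elif turns:
            # Unlabeled line: treat it as a continuation of the previous turn.
            speaker, text = turns[-1]
            turns[-1] = (speaker, f"{text} {line}")
        else:
            raise ValueError("script must start with a speaker label")
    if len({speaker for speaker, _ in turns}) > MAX_SPEAKERS:
        raise ValueError(f"at most {MAX_SPEAKERS} distinct speakers supported")
    return turns


def chunk_text(text: str, max_chars: int = 250) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars is kept intact as its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}" if current else sentence
    if current:
        chunks.append(current)
    return chunks
```

Splitting on sentence boundaries rather than at a fixed character offset keeps each chunk prosodically complete, which usually matters more for synthesized speech than filling every chunk to the limit.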

Scores updated daily from GitHub, PyPI, and npm data.