TTS-Audio-Suite and ComfyUI-VoxCPM
These are complements: TTS-Audio-Suite provides multiple text-to-speech engines and voice conversion options, while VoxCPM specializes in zero-shot voice cloning. Used together, they let a single ComfyUI workflow combine multi-engine TTS with zero-shot voice cloning.
About TTS-Audio-Suite
diodiogod/TTS-Audio-Suite
A ComfyUI custom node integration for multi-engine multi-language Text-to-Speech and Voice Conversion. Supports: RVC, Echo-TTS, Qwen3-TTS, Cozy Voice 3, Step Audio EditX, IndexTTS-2, Chatterbox (classic and multilingual 23-lang), F5-TTS, Higgs Audio 2 and VibeVoice with unlimited text length, SRT timing, Character support, and many audio tools
Implements a modular node-based architecture within ComfyUI that abstracts 12 TTS/voice conversion engines behind unified interfaces, enabling workflows to swap engines or chain operations (transcription → subtitle timing → synthesis → voice conversion) without graph restructuring. Provides advanced subtitle authoring through SRT generation from plain text using readability algorithms, per-segment parameter switching via inline tags like `[seed:24]` or …
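To illustrate what per-segment parameter switching with inline tags could look like, here is a minimal Python sketch. It is not the suite's actual parser: the `[key:value]` grammar is inferred only from the `[seed:24]` example above, and `split_by_tags` is a hypothetical helper name. Each tag is assumed to change a parameter for all text that follows it.

```python
import re

# Assumed tag grammar: [key:value], inferred from the `[seed:24]` example.
# The real TTS-Audio-Suite syntax may differ.
TAG_RE = re.compile(r"\[(\w+):([^\]]+)\]")


def split_by_tags(text, defaults=None):
    """Split text into (params, segment) pairs.

    Each tag updates the running parameter set, which then applies
    to the text that follows it (hypothetical behavior).
    """
    params = dict(defaults or {})
    segments = []
    pos = 0
    for match in TAG_RE.finditer(text):
        chunk = text[pos:match.start()].strip()
        if chunk:
            # Snapshot the parameters in effect for this chunk.
            segments.append((dict(params), chunk))
        params[match.group(1)] = match.group(2).strip()
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append((dict(params), tail))
    return segments


if __name__ == "__main__":
    for params, segment in split_by_tags(
        "Hello there. [seed:24] Second line.", {"seed": "1"}
    ):
        print(params, segment)
```

Each resulting segment could then be synthesized separately with its own parameters, which is the general idea behind switching settings mid-script without splitting the text across several nodes.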
About ComfyUI-VoxCPM
wildminder/ComfyUI-VoxCPM
ComfyUI node for highly expressive speech and realistic zero-shot voice cloning
Implements a tokenizer-free, diffusion-based TTS architecture built on MiniCPM-4 that models speech in continuous space rather than as discrete tokens, enabling context-aware prosody generation. Includes native LoRA fine-tuning support within ComfyUI for custom voice style training and automatic model management with VRAM offloading, and operates at a 6.25 Hz token rate for faster synthesis on consumer hardware. Integrates with ComfyUI's node workflow system, accepts optional reference audio for voice cloning, and is compatible with multiple inference backends (CUDA, CPU, MPS, DirectML).
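As a rough illustration of why a low token rate matters, the arithmetic can be sketched in Python. This assumes the 6.25 Hz figure means one sequence step per 1/6.25 s of audio, so the step count grows linearly with duration; the helper names and the 50 Hz comparison rate are hypothetical and not taken from VoxCPM.

```python
import math

TOKEN_RATE_HZ = 6.25  # figure quoted in the description above


def steps_for_duration(seconds, rate_hz=TOKEN_RATE_HZ):
    """Sequence steps needed to cover `seconds` of audio, rounded up."""
    return math.ceil(seconds * rate_hz)


def seconds_per_step(rate_hz=TOKEN_RATE_HZ):
    """Audio duration represented by a single step."""
    return 1.0 / rate_hz


if __name__ == "__main__":
    print(steps_for_duration(10))        # steps for 10 s at 6.25 Hz
    print(steps_for_duration(10, 50.0))  # same clip at a hypothetical 50 Hz
    print(seconds_per_step())            # seconds of audio per step
```

Under these assumptions, a 10-second clip needs 63 steps at 6.25 Hz versus 500 at the hypothetical 50 Hz rate, so the language-model backbone has far fewer positions to generate per second of speech.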