TTS-Audio-Suite and ComfyUI-VoxCPM

These projects complement each other: TTS-Audio-Suite provides multiple text-to-speech engines and voice conversion options, while VoxCPM specializes in zero-shot voice cloning, so users can combine traditional TTS synthesis with advanced voice cloning in a single ComfyUI workflow.

| Metric | TTS-Audio-Suite | ComfyUI-VoxCPM |
| --- | --- | --- |
| Overall score | 68 (Established) | 47 (Emerging) |
| Maintenance | 25/25 | 6/25 |
| Adoption | 10/25 | 10/25 |
| Maturity | 15/25 | 15/25 |
| Community | 18/25 | 16/25 |
| Stars | 774 | 390 |
| Forks | 71 | 42 |
| Downloads | | |
| Commits (30d) | 55 | 0 |
| Language | Python | Python |
| License | | Apache-2.0 |
| Package | none | none |
| Dependents | none | none |

About TTS-Audio-Suite

diodiogod/TTS-Audio-Suite

A ComfyUI custom-node integration for multi-engine, multi-language text-to-speech and voice conversion. Supports RVC, Echo-TTS, Qwen3-TTS, Cozy Voice 3, Step Audio EditX, IndexTTS-2, Chatterbox (classic and 23-language multilingual), F5-TTS, Higgs Audio 2, and VibeVoice, with unlimited text length, SRT timing, character support, and many audio tools.

Implements a modular node-based architecture within ComfyUI that abstracts 12 TTS/voice-conversion engines behind unified interfaces, so workflows can swap engines or chain operations (transcription → subtitle timing → synthesis → voice conversion) without restructuring the graph. Provides advanced subtitle authoring through SRT generation from plain text using readability algorithms, per-segment parameter switching via inline tags such as `[seed:24]`, and character/language switching within a single text block, bridging traditional NLP workflows with real-time audio generation at unlimited text lengths.
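The per-segment parameter switching described above can be illustrated with a small tag parser. This is a hedged sketch, not the suite's actual implementation: it splits a text block wherever an inline tag like `[seed:24]` appears and carries the accumulated parameters forward into each following segment (`parse_segments` and its output shape are hypothetical).

```python
import re

# Matches inline tags of the form [key:value], e.g. [seed:24] or [lang:de].
TAG_RE = re.compile(r"\[(\w+):([^\]]+)\]")

def parse_segments(text):
    """Split text into segments, each tagged with the parameters in effect."""
    segments = []
    params = {}
    pos = 0
    for m in TAG_RE.finditer(text):
        chunk = text[pos:m.start()].strip()
        if chunk:
            segments.append({"text": chunk, "params": dict(params)})
        params[m.group(1)] = m.group(2)  # later tags override earlier ones
        pos = m.end()
    tail = text[pos:].strip()
    if tail:
        segments.append({"text": tail, "params": dict(params)})
    return segments

print(parse_segments("Hello there. [seed:24] Same voice, fixed seed."))
```

Each segment can then be handed to the active TTS engine with its own parameter set, which is the essence of swapping settings mid-text without rebuilding the node graph.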

About ComfyUI-VoxCPM

wildminder/ComfyUI-VoxCPM

ComfyUI node for highly expressive speech and realistic zero-shot voice cloning

Implements a tokenizer-free, diffusion-based TTS architecture built on MiniCPM-4 that models speech in continuous space rather than as discrete tokens, enabling context-aware prosody generation. Includes native LoRA fine-tuning support within ComfyUI for training custom voice styles, automatic model management with efficient VRAM offloading, and a low 6.25 Hz token rate for faster synthesis on consumer hardware. Integrates with ComfyUI's node workflow system, supports optional reference audio for voice cloning, and is compatible with multiple inference backends (CUDA, CPU, MPS, DirectML).
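Multi-backend support typically reduces to a fixed preference order with fallback. A minimal sketch of that selection logic, under the assumption that the node probes availability at load time (`pick_backend` is a hypothetical helper, not the repository's actual code; in practice the flags would come from checks such as `torch.cuda.is_available()`):

```python
def pick_backend(cuda=False, mps=False, directml=False):
    """Return the first available backend in preference order, else CPU."""
    for name, available in (("cuda", cuda), ("mps", mps), ("directml", directml)):
        if available:
            return name
    return "cpu"  # CPU is always available as the last resort

# Example: on an Apple Silicon machine only MPS would report available.
print(pick_backend(mps=True))
```

The same preference-order pattern extends naturally to VRAM offloading: when the preferred device runs out of memory, model weights can be moved to the next device down the list.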

Scores are updated daily from GitHub, PyPI, and npm data.