VibeVoice-ComfyUI and ComfyUI-VoxCPM
These two nodes complement each other by addressing different synthesis approaches: VibeVoice excels at multi-speaker synthesis from text, while VoxCPM specializes in zero-shot voice cloning from reference audio. Combined in a single ComfyUI workflow, they cover a wide range of voice generation needs.
About VibeVoice-ComfyUI
Enemyx-net/VibeVoice-ComfyUI
A comprehensive ComfyUI integration for Microsoft's VibeVoice text-to-speech model, enabling high-quality single and multi-speaker voice synthesis directly within your ComfyUI workflows.
Supports voice cloning from audio samples, LoRA fine-tuning adapters, and multi-speaker conversations with up to 4 distinct voices using speaker labels. The implementation features embedded VibeVoice code with adaptive transformer compatibility, configurable quantization (4-bit/8-bit) for VRAM optimization, and cross-platform GPU support including Apple Silicon via MPS. Operates as a self-contained ComfyUI custom node with automatic text chunking, pause tag insertion, and memory management controls for complex generative workflows.
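As a rough illustration of the multi-speaker workflow described above, the sketch below assembles a labeled conversation script and enforces the four-voice limit. The "Speaker N:" label syntax and the "[pause]" tag are assumptions for illustration; consult the node's documentation for its exact script format.

```python
# Hypothetical sketch: build a multi-speaker script for a TTS node.
# "Speaker N:" labels and "[pause]" are assumed syntax, not confirmed.

def build_script(turns, max_speakers=4):
    """Join (speaker_id, text) turns into a labeled script,
    enforcing the 4-distinct-voice limit described for VibeVoice."""
    speakers = {speaker for speaker, _ in turns}
    if len(speakers) > max_speakers:
        raise ValueError(f"at most {max_speakers} distinct voices supported")
    return "\n".join(f"Speaker {speaker}: {text}" for speaker, text in turns)

script = build_script([
    (1, "Welcome to the show."),
    (2, "Thanks for having me. [pause]"),
    (1, "Let's get started."),
])
print(script)
```

The script string would then be fed to the node's text input, where automatic chunking splits long passages before synthesis.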
About ComfyUI-VoxCPM
wildminder/ComfyUI-VoxCPM
ComfyUI node for highly expressive speech and realistic zero-shot voice cloning
Implements a tokenizer-free diffusion-based TTS architecture built on MiniCPM-4 that models speech in continuous space rather than discrete tokens, enabling context-aware prosody generation. Includes native LoRA fine-tuning support within ComfyUI for custom voice style training and automatic model management with efficient VRAM offloading, and operates at a 6.25 Hz token rate for faster synthesis on consumer hardware. It integrates seamlessly with ComfyUI's node workflow system, supports optional reference audio for voice cloning, and runs on multiple inference backends (CUDA, CPU, MPS, DirectML).
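The low 6.25 Hz token rate is the main lever behind the speed claim: fewer latent frames per second of audio means fewer generation steps per clip. The back-of-envelope comparison below uses a 50 Hz rate as a contrast point, which is a typical figure for discrete speech codecs and an illustrative assumption here, not a measured VoxCPM baseline.

```python
# Back-of-envelope: frames generated per clip at different token rates.
# 6.25 Hz is from the description above; 50 Hz is an assumed typical
# discrete-codec rate used only for contrast.

def frames_for(duration_s: float, rate_hz: float) -> float:
    """Number of latent frames/tokens needed for a clip of given length."""
    return duration_s * rate_hz

ten_sec_low_rate = frames_for(10, 6.25)   # frames at 6.25 Hz
ten_sec_codec = frames_for(10, 50)        # frames at an assumed 50 Hz
reduction = ten_sec_codec / ten_sec_low_rate
print(ten_sec_low_rate, ten_sec_codec, reduction)
```

Under these assumptions a 10-second clip needs roughly an eighth as many generation steps, which is where the consumer-hardware speedup comes from.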