VibeVoice-ComfyUI and ComfyUI-VibeVoice
These are competing implementations of the same VibeVoice TTS model for ComfyUI. They differ in architecture and feature completeness, so users typically choose one based on their workflow needs rather than installing both together.
About VibeVoice-ComfyUI
Enemyx-net/VibeVoice-ComfyUI
A comprehensive ComfyUI integration for Microsoft's VibeVoice text-to-speech model, enabling high-quality single and multi-speaker voice synthesis directly within your ComfyUI workflows.
Supports voice cloning from audio samples, LoRA fine-tuning adapters, and multi-speaker conversations with up to 4 distinct voices using speaker labels. The implementation features embedded VibeVoice code with adaptive transformer compatibility, configurable quantization (4-bit/8-bit) for VRAM optimization, and cross-platform GPU support including Apple Silicon via MPS. Operates as a self-contained ComfyUI custom node with automatic text chunking, pause tag insertion, and memory management controls for complex generative workflows.
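To illustrate the multi-speaker input described above, a script addressed to up to four voices uses per-line speaker labels. The exact label and pause-tag syntax shown here is an assumption based on common conventions; consult the repository's documentation for the authoritative format:

```
[1]: Welcome back to the show. Today we have two guests with us.
[2]: Thanks for having me, it's great to be here.
[3]: Likewise. Shall we get started?
[1]: Let's dive in. [pause]
[1]: First question goes to you.
```

Each `[N]:` line is synthesized with the voice assigned to speaker N, and long scripts are split by the node's automatic text chunking before generation.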
About ComfyUI-VibeVoice
wildminder/ComfyUI-VibeVoice
A ComfyUI custom node for VibeVoice TTS, producing expressive, long-form, multi-speaker conversational audio.
Integrates Microsoft's VibeVoice model directly into ComfyUI workflows for multi-speaker dialogue generation, supporting voice cloning via reference audio and hybrid zero-shot voice generation. Features 4-bit LLM quantization, multiple attention backends (eager/SDPA/Flash Attention/SageAttention), and automatic model management with configurable diffusion parameters for fine-grained control over speech synthesis.
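For context on the attention-backend options, a minimal sketch of how a loader might map those user-facing names onto the `attn_implementation` argument accepted by Hugging Face `transformers` `from_pretrained`. The mapping table, the fallback policy, and the SageAttention handling are assumptions for illustration, not this repository's actual code:

```python
# Hypothetical helper: translate a node's attention-backend choice into the
# attn_implementation string that transformers' from_pretrained accepts.
# SageAttention is not a transformers backend; a node would typically load
# with SDPA and patch attention separately (assumption).
BACKENDS = {
    "eager": "eager",
    "sdpa": "sdpa",
    "flash_attention": "flash_attention_2",
    "sage": "sdpa",  # loaded as SDPA, then patched externally (assumption)
}

def resolve_attention_backend(choice: str, flash_available: bool) -> str:
    """Return a valid attn_implementation, falling back when flash-attn is absent."""
    impl = BACKENDS.get(choice.lower())
    if impl is None:
        raise ValueError(f"unknown attention backend: {choice!r}")
    if impl == "flash_attention_2" and not flash_available:
        return "sdpa"  # graceful fallback if the flash-attn package isn't installed
    return impl
```

The resolved string would then be passed as `attn_implementation=` when loading the model, which is why a fallback matters: requesting Flash Attention on a machine without the package would otherwise fail at load time.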