VibeVoice-ComfyUI and ComfyUI-EdgeTTS

These are **competitors**: both provide ComfyUI nodes for Microsoft text-to-speech synthesis (VibeVoice vs. Edge TTS) and serve the same use case, converting text to speech within ComfyUI workflows. Users would typically choose one based on voice quality, language support, or feature preferences rather than use them together.

VibeVoice-ComfyUI (Established)

Maintenance 10/25 | Adoption 10/25 | Maturity 15/25 | Community 23/25

ComfyUI-EdgeTTS (Emerging)

Maintenance 10/25 | Adoption 8/25 | Maturity 16/25 | Community 13/25

VibeVoice-ComfyUI

Stars: 1,391 | Forks: 219 | Downloads: n/a | Commits (30d): 0 | Language: Python | License: MIT

ComfyUI-EdgeTTS

Stars: 66 | Forks: 8 | Downloads: n/a | Commits (30d): 0 | Language: Python | License: GPL-3.0

No Package No Dependents

About VibeVoice-ComfyUI

Enemyx-net/VibeVoice-ComfyUI

A comprehensive ComfyUI integration for Microsoft's VibeVoice text-to-speech model, enabling high-quality single and multi-speaker voice synthesis directly within your ComfyUI workflows.

Supports voice cloning from audio samples, LoRA fine-tuning adapters, and multi-speaker conversations with up to 4 distinct voices using speaker labels. The implementation features embedded VibeVoice code with adaptive transformer compatibility, configurable quantization (4-bit/8-bit) for VRAM optimization, and cross-platform GPU support including Apple Silicon via MPS. Operates as a self-contained ComfyUI custom node with automatic text chunking, pause tag insertion, and memory management controls for complex generative workflows.
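To make the speaker-label and text-chunking ideas above concrete, here is a minimal sketch in plain Python. It is not the project's code. The `[N]: text` label syntax, the function names `parse_script` and `chunk_text`, and the `max_chars` default are all assumptions made for illustration; only the limit of 4 speakers comes from the description above.

```python
import re

# Hypothetical label syntax for this sketch: "[N]: text", with N from 1 to 4.
SPEAKER_LINE = re.compile(r"^\[(\d+)\]\s*:\s*(.*)$")
MAX_SPEAKERS = 4  # the limit stated in the description above


def parse_script(script):
    """Return a list of (speaker_number, text) turns from a labeled script."""
    turns = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = SPEAKER_LINE.match(line)
        if match is None:
            # Unlabeled lines continue the previous turn, or start as speaker 1.
            if turns:
                speaker, text = turns[-1]
                turns[-1] = (speaker, text + " " + line)
            else:
                turns.append((1, line))
            continue
        speaker = int(match.group(1))
        if not 1 <= speaker <= MAX_SPEAKERS:
            raise ValueError(f"speaker {speaker} is outside 1..{MAX_SPEAKERS}")
        turns.append((speaker, match.group(2)))
    return turns


def chunk_text(text, max_chars=200):
    """Group sentences into chunks of about max_chars characters.

    A single sentence longer than max_chars is kept whole, not split.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

In a pipeline like the one described, each turn's text would be chunked and each chunk synthesized with the voice assigned to that speaker. Breaking at sentence boundaries is one plausible choice, since it avoids cutting a word or phrase in the middle of an utterance.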

About ComfyUI-EdgeTTS

1038lab/ComfyUI-EdgeTTS

ComfyUI-EdgeTTS is a text-to-speech node for ComfyUI built on Microsoft's Edge TTS. It converts text into natural-sounding speech and supports multiple languages and voices. The node is designed to be easy to integrate and customize.

Provides complementary speech-to-text capabilities via OpenAI's Whisper with multiple model sizes and automatic language detection, alongside audio export nodes supporting WAV/MP3/FLAC formats with quality presets. The implementation uses lazy loading and caching to optimize performance and memory usage within ComfyUI's node-based workflow system. Integrates FFmpeg for audio codec handling and supports GPU acceleration via CUDA for faster Whisper inference.
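The lazy loading and caching mentioned above can be sketched generically as follows. This is an illustration of the pattern only, not the project's implementation: `_load_model` is a hypothetical stand-in for an expensive model load, and the cache size of 2 is an arbitrary choice for the example.

```python
from functools import lru_cache

load_calls = []  # records each real load so the caching is observable


def _load_model(size):
    """Stand-in for an expensive model load (hypothetical; no real weights)."""
    load_calls.append(size)
    return {"size": size}


@lru_cache(maxsize=2)
def get_model(size):
    """Load a model on first request and reuse the same object afterwards."""
    return _load_model(size)


first = get_model("base")   # triggers the load
second = get_model("base")  # served from the cache, no second load
```

Nothing is loaded until a workflow first asks for a given model size, and repeated requests for that size return the cached object. The bounded cache means the least recently used model is dropped once a third size is requested, which caps memory use.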

Related comparisons

VibeVoice-ComfyUI and TTS-Audio-Suite

VibeVoice-ComfyUI and ComfyUI-VibeVoice

VibeVoice-ComfyUI and ComfyUI-VoxCPM

VibeVoice-ComfyUI and ComfyUI-Maya1_TTS

VibeVoice-ComfyUI and ComfyUI-XTTS

VibeVoice-ComfyUI and ComfyUI-SparkTTS

Scores updated daily from GitHub, PyPI, and npm data.