VibeVoice-ComfyUI and ComfyUI-EdgeTTS

These are **competitors**: both provide ComfyUI nodes for Microsoft text-to-speech (VibeVoice and Edge TTS, respectively) and serve the same use case, converting text to speech within ComfyUI workflows. Users would typically choose one based on voice quality, language support, or features rather than use them together.

| | VibeVoice-ComfyUI | ComfyUI-EdgeTTS |
|---|---|---|
| Overall score | 58 (Established) | 47 (Emerging) |
| Maintenance | 10/25 | 10/25 |
| Adoption | 10/25 | 8/25 |
| Maturity | 15/25 | 16/25 |
| Community | 23/25 | 13/25 |
| Stars | 1,391 | 66 |
| Forks | 219 | 8 |
| Downloads | n/a | n/a |
| Commits (30d) | 0 | 0 |
| Language | Python | Python |
| License | MIT | GPL-3.0 |
| Package / dependents | No package, no dependents | No package, no dependents |
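The four category scores listed above appear to range from 0 to 25 each and to sum to the headline score (10 + 10 + 15 + 23 = 58 for VibeVoice-ComfyUI; 10 + 8 + 16 + 13 = 47 for ComfyUI-EdgeTTS). The sketch below checks that arithmetic under the assumption that the overall score is a plain sum; how the site computes each category, and where it draws the Established/Emerging line, is not described here.

```python
# Category scores copied from the comparison above (each out of 25).
SCORES = {
    "VibeVoice-ComfyUI": {"Maintenance": 10, "Adoption": 10, "Maturity": 15, "Community": 23},
    "ComfyUI-EdgeTTS": {"Maintenance": 10, "Adoption": 8, "Maturity": 16, "Community": 13},
}


def overall_score(categories: dict[str, int]) -> int:
    """Sum four 0-25 category scores into a 0-100 total (assumed aggregation rule)."""
    for name, value in categories.items():
        if not 0 <= value <= 25:
            raise ValueError(f"{name} score {value} is outside 0-25")
    return sum(categories.values())


for project, categories in SCORES.items():
    print(project, overall_score(categories))
```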

About VibeVoice-ComfyUI

Enemyx-net/VibeVoice-ComfyUI

A comprehensive ComfyUI integration for Microsoft's VibeVoice text-to-speech model, enabling high-quality single and multi-speaker voice synthesis directly within your ComfyUI workflows.

Supports voice cloning from audio samples, LoRA fine-tuning adapters, and multi-speaker conversations with up to 4 distinct voices using speaker labels. The implementation features embedded VibeVoice code with adaptive transformer compatibility, configurable quantization (4-bit/8-bit) for VRAM optimization, and cross-platform GPU support including Apple Silicon via MPS. Operates as a self-contained ComfyUI custom node with automatic text chunking, pause tag insertion, and memory management controls for complex generative workflows.
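The multi-speaker input and automatic text chunking described above can be sketched as a small preprocessing step. This is not VibeVoice-ComfyUI's code: the `[N]: text` label syntax, the function names `parse_script` and `chunk_text`, and the 250-word chunk limit are hypothetical choices for illustration. The sketch only shows the general idea of validating up to 4 speaker labels and packing whole sentences into bounded chunks.

```python
import re

# Hypothetical label format: one "[N]: text" utterance per line, N in 1..4.
LABEL = re.compile(r"^\[(\d+)\]:\s*(.+)$")
MAX_SPEAKERS = 4


def parse_script(script: str) -> list[tuple[int, str]]:
    """Split a multi-speaker script into (speaker_id, text) turns."""
    turns = []
    for line_no, line in enumerate(script.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines between turns
        match = LABEL.match(line)
        if match is None:
            raise ValueError(f"line {line_no} has no speaker label: {line!r}")
        speaker = int(match.group(1))
        if not 1 <= speaker <= MAX_SPEAKERS:
            raise ValueError(f"speaker {speaker} is outside 1..{MAX_SPEAKERS}")
        turns.append((speaker, match.group(2)))
    return turns


def chunk_text(text: str, max_words: int = 250) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words becomes its own chunk.
    """
    if not text.strip():
        return []
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        words = len(sentence.split())
        if current and count + words > max_words:
            chunks.append(" ".join(current))  # close the chunk before it overflows
            current, count = [], 0
        current.append(sentence)
        count += words
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Splitting on sentence boundaries rather than at a fixed character count tends to keep each chunk prosodically complete, at the cost of uneven chunk sizes.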

About ComfyUI-EdgeTTS

1038lab/ComfyUI-EdgeTTS

ComfyUI-EdgeTTS is a text-to-speech node for ComfyUI built on Microsoft's Edge TTS. It converts text into natural-sounding speech in multiple languages and voices, and is designed to be easy to add to and customize within ComfyUI workflows.

Provides complementary speech-to-text capabilities via OpenAI's Whisper with multiple model sizes and automatic language detection, alongside audio export nodes supporting WAV/MP3/FLAC formats with quality presets. The implementation uses lazy loading and caching to optimize performance and memory usage within ComfyUI's node-based workflow system. Integrates FFmpeg for audio codec handling and supports GPU acceleration via CUDA for faster Whisper inference.
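The lazy loading and caching mentioned above can be illustrated with a minimal cache that loads a Whisper model only when a given size is first requested and reuses it afterwards. `LazyModelCache` is a hypothetical helper, not ComfyUI-EdgeTTS's implementation. The loader is injected so that, assuming the `openai-whisper` package is installed, it could simply be `whisper.load_model`.

```python
from typing import Callable, Dict

# A subset of Whisper checkpoint names (assumed list; newer ones such as large-v3 also exist).
WHISPER_SIZES = ("tiny", "base", "small", "medium", "large")


class LazyModelCache:
    """Load each model on first use and reuse it afterwards (hypothetical helper)."""

    def __init__(self, loader: Callable[[str], object]):
        self._loader = loader  # e.g. whisper.load_model
        self._models: Dict[str, object] = {}

    def get(self, size: str) -> object:
        if size not in WHISPER_SIZES:
            raise ValueError(f"unknown model size: {size!r}")
        if size not in self._models:  # nothing is loaded until first requested
            self._models[size] = self._loader(size)
        return self._models[size]

    def unload(self, size: str) -> None:
        """Drop the cached reference so the model can be garbage-collected
        once no other references remain."""
        self._models.pop(size, None)
```

For example, `cache = LazyModelCache(whisper.load_model)` followed by repeated `cache.get("small")` calls would load the small model once and return the same object until `unload("small")` is called.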

Scores updated daily from GitHub, PyPI, and npm data.