zhao-kun/VibeVoiceFusion

VibeVoiceFusion is a full-stack web system for multi-speaker voice generation, featuring LoRA fine-tuning, batch generation, and VRAM optimization. It is based on Microsoft's VibeVoice, which uses an autoregressive (AR) plus diffusion architecture.

Score: 45 / 100 (Emerging)

Implements FP8 quantization and layer offloading so the model can run on consumer GPUs with 10 GB or more of VRAM, saving up to 7 GB of memory at the cost of 3.5x slower inference. Built with a TypeScript/React frontend and a Python FastAPI backend, and containerized for single-command Docker deployment with automatic model downloads from Hugging Face. Supports LoRA fine-tuning with TensorBoard integration, a bilingual UI (English/Chinese), and persistent project management, including speaker profiles, dialog editing, and generation history with configurable batch operations.
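The memory-versus-speed trade-off of layer offloading can be illustrated with a small planning sketch. This is not the project's code: `plan_offload`, `Placement`, the layer sizes, and the budget are all hypothetical, and a real implementation would also move weights between devices at inference time, which is where the slowdown comes from.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    """Hypothetical result type: which layer indices live on which device."""
    on_gpu: list[int]
    on_cpu: list[int]


def plan_offload(layer_sizes_gb: list[float], vram_budget_gb: float) -> Placement:
    """Greedy sketch: a layer stays on the GPU if it still fits in the
    remaining budget; otherwise it is marked for CPU offloading."""
    on_gpu: list[int] = []
    on_cpu: list[int] = []
    used_gb = 0.0
    for index, size_gb in enumerate(layer_sizes_gb):
        if used_gb + size_gb <= vram_budget_gb:
            on_gpu.append(index)
            used_gb += size_gb
        else:
            on_cpu.append(index)
    return Placement(on_gpu=on_gpu, on_cpu=on_cpu)


# Illustrative numbers only: eight equally sized layers and a 2 GB budget.
example = plan_offload([0.5] * 8, vram_budget_gb=2.0)
```

Under these made-up numbers, half of the layers stay resident and the other half are offloaded. Lowering the budget frees more VRAM but increases the number of layers that must be fetched from host memory on each pass.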

No license, no package, no dependents.
Maintenance 10 / 25
Adoption 10 / 25
Maturity 7 / 25
Community 18 / 25

Stars: 453
Forks: 56
Language: Python
License: none
Last pushed: Feb 23, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/zhao-kun/VibeVoiceFusion"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
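The same request can be made from Python with only the standard library. This is a minimal sketch under two assumptions that the page does not confirm: that the URL path follows a category/owner/repo pattern (inferred from the single curl example above), and that the endpoint returns JSON. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL; the category/owner/repo layout is an assumption."""
    parts = (quote(part, safe="") for part in (category, owner, repo))
    return f"{BASE_URL}/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the response, assuming the body is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_quality("voice-ai", "zhao-kun", "VibeVoiceFusion")` would issue the same GET request as the curl command and counts against the daily request limit.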