Henry-23/VideoChat
Real-time voice-interactive digital human with customizable appearance and voice, supporting voice cloning, with first-packet latency as low as 3s.
Supports both cascaded (ASR-LLM-TTS-THG) and end-to-end (MLLM-THG) pipelines, where THG is talking-head generation. The cascaded approach requires ~8GB VRAM and achieves 3s first-token latency on a single A100. Integrates FunASR for speech recognition, Qwen as the LLM and GLM-4-Voice as the end-to-end speech model, GPT-SoVITS or CosyVoice for TTS, and MuseTalk for facial animation. Deployment is flexible: local inference, vLLM acceleration, or cloud APIs (Alibaba DashScope). Custom avatars are registered from a video, and voices are cloned from reference audio samples.
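The cascaded ASR-LLM-TTS-THG flow described above can be sketched roughly as follows. This is an illustrative outline, not the project's actual code: `cascaded_pipeline`, `VideoChunk`, and the `fake_*` stages are hypothetical stand-ins for FunASR, the LLM, the TTS engine, and MuseTalk. It assumes the LLM streams its reply sentence by sentence, so the first chunk is ready before the full reply has been generated, which is one common way to keep first-packet latency low.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List


@dataclass
class VideoChunk:
    """One unit of output: a sentence, its speech, and its video frames."""
    text: str
    audio: bytes
    frames: List[str]


def cascaded_pipeline(
    user_audio: bytes,
    asr: Callable[[bytes], str],
    llm: Callable[[str], Iterator[str]],
    tts: Callable[[str], bytes],
    thg: Callable[[bytes], List[str]],
) -> Iterator[VideoChunk]:
    """Run ASR once, then push each LLM sentence through TTS and THG."""
    query = asr(user_audio)            # speech -> text
    for sentence in llm(query):        # text -> streamed reply sentences
        audio = tts(sentence)          # sentence -> speech
        frames = thg(audio)            # speech -> talking-head frames
        yield VideoChunk(sentence, audio, frames)


# Hypothetical stub stages so the sketch runs without any models.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(query: str) -> Iterator[str]:
    yield f"You said: {query}."
    yield "How can I help?"


def fake_tts(sentence: str) -> bytes:
    return sentence.encode("utf-8")


def fake_thg(audio: bytes) -> List[str]:
    # Pretend every 10 bytes of audio produce one frame.
    return [f"frame-{i}" for i in range(len(audio) // 10 + 1)]


if __name__ == "__main__":
    for chunk in cascaded_pipeline(b"hello", fake_asr, fake_llm, fake_tts, fake_thg):
        print(chunk.text, len(chunk.frames))
```

Because the pipeline is a generator, a caller can start playing the first chunk while later sentences are still being synthesized. The end-to-end (MLLM-THG) variant would replace the first three stages with a single speech-to-speech model.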
Stars: 1,223
Forks: 158
Language: Python
License: MIT
Category
Last pushed: Dec 18, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/Henry-23/VideoChat"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
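For scripted access, the same request can be made from Python. The sketch below rests on two assumptions that this page does not confirm: that the path follows a `category/owner/repo` pattern (inferred from the single URL shown above) and that the endpoint returns JSON. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, assuming a category/owner/repo path layout."""
    parts = [quote(p, safe="") for p in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the response, assuming the body is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(quality_url("voice-ai", "Henry-23", "VideoChat"))
```

Calling `fetch_quality("voice-ai", "Henry-23", "VideoChat")` performs the network request and counts against the daily request limit.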
Higher-rated alternatives
livekit/livekit
End-to-end realtime stack for connecting humans and AI
met4citizen/TalkingHead
Talking Head (3D): A JavaScript class for real-time lip-sync using full-body 3D avatars.
dmisol/flexatar-virtual-webcam
Personalized Virtual Webcam for WebRTC
zslrmhb/Omniverse-Virtual-Assisstant
Audio2Face Avatar with Riva SDK functionality
Sgvkamalakar/Azure-Talking-Avatar
Explore the power of Azure Text-to-Speech with interactive talking avatar, Lisa 👩🏻🦱. Choose...