Henry-23/VideoChat

A real-time voice-interactive digital human with customizable appearance and voice. Supports voice cloning, with first-packet latency as low as 3s.

46 / 100 (Emerging)

Supports both cascaded (ASR-LLM-TTS-THG) and end-to-end (MLLM-THG) pipelines, with the cascaded approach requiring ~8GB VRAM and achieving 3s first-token latency on a single A100. Integrates FunASR, Qwen/GLM-4-Voice, GPT-SoVITS/CosyVoice for TTS, and MuseTalk for facial animation, with flexible deployment options using local inference, vLLM acceleration, or cloud APIs (Alibaba DashScope). Enables custom avatars through video registration and voice cloning via reference audio samples.
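The cascaded flow described above can be pictured as four stages chained in order. The following is a minimal sketch, not the project's actual code: the class name `CascadedPipeline`, the stage signatures, and the stub stages are all hypothetical, and THG is assumed here to stand for the talking-head generation step (MuseTalk in this project).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CascadedPipeline:
    """Hypothetical ASR -> LLM -> TTS -> THG chain (illustration only)."""

    asr: Callable[[bytes], str]    # user speech audio -> transcript
    llm: Callable[[str], str]      # transcript -> reply text
    tts: Callable[[str], bytes]    # reply text -> synthesized speech
    thg: Callable[[bytes], bytes]  # synthesized speech -> avatar video

    def respond(self, user_audio: bytes) -> bytes:
        # Each stage consumes the previous stage's output.
        transcript = self.asr(user_audio)
        reply_text = self.llm(transcript)
        reply_audio = self.tts(reply_text)
        return self.thg(reply_audio)


# Stub stages stand in for real models such as FunASR, Qwen,
# GPT-SoVITS/CosyVoice, and MuseTalk; they only transform bytes and strings.
demo = CascadedPipeline(
    asr=lambda audio: audio.decode("utf-8"),
    llm=lambda text: text.upper(),
    tts=lambda text: text.encode("utf-8"),
    thg=lambda audio: b"video:" + audio,
)
```

Under this reading, the end-to-end (MLLM-THG) variant would replace the first three stages with a single multimodal model that maps user speech directly to reply speech.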

1,223 stars.

No package. No dependents.
Maintenance 6 / 25
Adoption 10 / 25
Maturity 9 / 25
Community 21 / 25


Stars: 1,223
Forks: 158
Language: Python
License: MIT
Last pushed: Dec 18, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/Henry-23/VideoChat"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
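The same request can be made from Python using only the standard library. This is a rough sketch under stated assumptions: the helper names `quality_url` and `fetch_quality` are hypothetical, the URL layout is inferred from the single curl example above, and the response is assumed (not confirmed by this page) to be JSON.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL for a repo given as 'owner/name'."""
    # safe="/" keeps the slash between owner and repository name unescaped.
    return f"{BASE_URL}/{quote(category, safe='')}/{quote(repo, safe='/')}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record; assumes the body is JSON."""
    with urllib.request.urlopen(quality_url(category, repo), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_quality("voice-ai", "Henry-23/VideoChat")` would issue the same GET request as the curl command. At 100 keyless requests per day, caching the result locally is likely worthwhile for anything that polls.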