Henry-23/VideoChat

Real-time voice-interactive digital human with customizable appearance and voice. Supports voice cloning, with first-packet latency as low as 3s.

/ 100

Emerging

Supports both cascaded (ASR-LLM-TTS-THG) and end-to-end (MLLM-THG) pipelines, with the cascaded approach requiring ~8GB VRAM and achieving 3s first-token latency on a single A100. Integrates FunASR, Qwen/GLM-4-Voice, GPT-SoVITS/CosyVoice for TTS, and MuseTalk for facial animation, with flexible deployment options using local inference, vLLM acceleration, or cloud APIs (Alibaba DashScope). Enables custom avatars through video registration and voice cloning via reference audio samples.
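The cascaded flow described above (ASR, then LLM, then TTS, then talking-head generation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not VideoChat's actual code: `cascaded_pipeline`, `Chunk`, and the `fake_*` stages are hypothetical names, and the stubs stand in for real models such as FunASR, Qwen, GPT-SoVITS/CosyVoice, and MuseTalk. It assumes the LLM output is consumed sentence by sentence, which is one common way to keep first-packet latency well below the time needed for the full reply.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass
class Chunk:
    """One streamed unit of the reply: text, synthesized audio, video frames."""
    text: str
    audio: bytes
    frames: List[str]


def cascaded_pipeline(
    user_audio: bytes,
    asr: Callable[[bytes], str],
    llm: Callable[[str], Iterable[str]],
    tts: Callable[[str], bytes],
    thg: Callable[[bytes], List[str]],
) -> Iterator[Chunk]:
    """Run ASR once, then stream each LLM sentence through TTS and THG.

    Yielding per sentence lets a caller start playback as soon as the
    first chunk is ready instead of waiting for the whole reply.
    """
    query = asr(user_audio)
    for sentence in llm(query):
        audio = tts(sentence)
        frames = thg(audio)
        yield Chunk(text=sentence, audio=audio, frames=frames)


# Hypothetical stand-ins for the real models (assumptions for illustration).
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(query: str) -> Iterator[str]:
    yield f"You said: {query}."
    yield "Here is a second sentence."


def fake_tts(sentence: str) -> bytes:
    return sentence.encode("utf-8")


def fake_thg(audio: bytes) -> List[str]:
    # Pretend one video frame covers 10 bytes of audio.
    return [f"frame-{i}" for i in range(len(audio) // 10 + 1)]


if __name__ == "__main__":
    for chunk in cascaded_pipeline(b"hello", fake_asr, fake_llm, fake_tts, fake_thg):
        print(chunk.text, len(chunk.audio), len(chunk.frames))
```

In an end-to-end (MLLM-THG) setup, the `asr`, `llm`, and `tts` stages in this sketch would collapse into a single speech-to-speech model call, with only the talking-head stage remaining separate.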

1,223 stars.

No Package · No Dependents

Maintenance 6 / 25

Adoption 10 / 25

Maturity 9 / 25

Community 21 / 25

Stars: 1,223
Forks: 158
Language: Python
License: MIT

Higher-rated alternatives

livekit/livekit

End-to-end realtime stack for connecting humans and AI

met4citizen/TalkingHead

Talking Head (3D): A JavaScript class for real-time lip-sync using full-body 3D avatars.

dmisol/flexatar-virtual-webcam

Personalized Virtual Webcam for WebRTC

zslrmhb/Omniverse-Virtual-Assisstant

Audio2Face Avatar with Riva SDK functionality

Sgvkamalakar/Azure-Talking-Avatar

Explore the power of Azure Text-to-Speech with interactive talking avatar, Lisa 👩🏻‍🦱. Choose...
