jdh-algo/JoyHallo

JoyHallo: Digital human model for Mandarin

/ 100

Emerging

Implements audio-driven video synthesis with a semi-decoupled architecture for lip, expression, and pose features, which improves inference efficiency and cross-lingual capability. It uses a Chinese wav2vec2 model to embed Mandarin audio and integrates Stable Diffusion with motion modules for frame generation, achieving 14.3% faster inference than the base Hallo model. It supports both Mandarin and English video generation and maintains strong cross-language performance on jdh-Hallo, a proprietary 29-hour dataset.
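The listing does not describe how the lip, expression, and pose features are separated. The NumPy sketch below only illustrates the general idea of mask-based feature separation, and every detail in it is an assumption, not JoyHallo's code. Visual tokens cross-attend to audio embeddings once. Hypothetical spatial masks then split that single shared result into lip, expression, and pose regions, and hypothetical per-region weights recombine them. Computing the attention once instead of once per region is one way such a design could save compute. The names `cross_attention`, `masked_fusion`, the toy dimensions, masks, and weights are all illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(visual, audio, w_q, w_k, w_v):
    """Single-head cross-attention: visual tokens (queries) attend to audio tokens."""
    q = visual @ w_q                      # (N_vis, d)
    k = audio @ w_k                       # (N_aud, d)
    v = audio @ w_v                       # (N_aud, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v   # (N_vis, d)

def masked_fusion(visual, audio, masks, weights, params):
    """Hypothetical fusion: attention is computed once, then split by region masks."""
    shared = cross_attention(visual, audio, *params)  # shared by all regions
    out = visual.copy()
    for name, mask in masks.items():
        # mask: (N_vis,) with 1.0 inside the region and 0.0 elsewhere
        out += weights[name] * mask[:, None] * shared
    return out

# Toy usage with random features (illustrative sizes only).
rng = np.random.default_rng(0)
n_vis, n_aud, dim = 16, 8, 32
visual = rng.standard_normal((n_vis, dim))
audio = rng.standard_normal((n_aud, dim))
params = tuple(rng.standard_normal((dim, dim)) * 0.1 for _ in range(3))
idx = np.arange(n_vis)
masks = {
    "lip": (idx >= 12).astype(float),
    "expression": ((idx >= 4) & (idx < 12)).astype(float),
    "pose": np.ones(n_vis),
}
weights = {"lip": 1.0, "expression": 0.5, "pose": 0.25}
fused = masked_fusion(visual, audio, masks, weights, params)
print(fused.shape)  # (16, 32)
```

Setting every weight to zero returns the visual features unchanged. That makes it easy to check that the masks only add region-specific audio contributions on top of the input.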

522 stars. No commits in the last 6 months.

Stale 6m · No Package · No Dependents

Maintenance 2 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 17 / 25


Stars: 522
Language: Python
License: MIT

Higher-rated alternatives

OpenVGLab/OmniLottie

[CVPR 2026🔥] 🧑‍🎨 OmniLottie, an open-sourced multi-modal instructed vector animation generator...

Mrkomiljon/awesome-generative-ai

Multimodal generative AI resources: talking heads, STT, TTS, image & video generation, and more.

NVIDIA/Maya-ACE

Maya-ACE: A Reference Client Implementation for NVIDIA ACE Audio2Face Service

michaelzhang-ai/Speech2Video

ACCV 2020 "Speech2Video Synthesis with 3D Skeleton Regularization and Expressive Body Poses"

Boese0601/Dyadic-Interaction-Modeling

[ECCV 2024] Dyadic Interaction Modeling for Social Behavior Generation
