jdh-algo/JoyVASA
Diffusion-based Portrait and Animal Animation
Decouples static 3D facial representations from dynamic motion sequences using a diffusion transformer trained on audio features (wav2vec2 or HuBERT), enabling identity-independent motion generation that extends to animal face animation. The two-stage pipeline extracts 3D facial appearance via LivePortrait, generates motion sequences from speech in a sliding-window fashion, and renders final video through keypoint-based warping and a learned generator. Supports multilingual audio input and is compatible with both portrait and animal image animation through optional MultiScaleDeformableAttention components.
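The description mentions generating motion from speech "in a sliding-window fashion". A minimal sketch of that windowing step over a frame-level audio-feature sequence (the function name, window size, and hop size here are illustrative assumptions, not values taken from the JoyVASA repo):

```python
import numpy as np

def sliding_windows(features: np.ndarray, win: int, hop: int) -> list[np.ndarray]:
    """Split a (T, D) audio-feature sequence into overlapping windows.

    A final window anchored at the sequence end ensures every frame is
    covered even when T - win is not a multiple of hop.
    """
    T = len(features)
    if T <= win:
        return [features]
    starts = list(range(0, T - win + 1, hop))
    if starts[-1] + win < T:
        starts.append(T - win)  # tail window so the last frames are not dropped
    return [features[s:s + win] for s in starts]

# Example: 100 feature frames, window of 25, hop of 20 (illustrative numbers)
feats = np.arange(100 * 2, dtype=np.float32).reshape(100, 2)
windows = sliding_windows(feats, win=25, hop=20)
```

Each window would then be fed to the diffusion transformer independently; overlapping regions let consecutive motion chunks be blended for temporal consistency.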
Stars: 856
Forks: 86
Language: Python
License: MIT
Category:
Last pushed: Dec 09, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/generative-ai/jdh-algo/JoyVASA"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
Related tools
open-mmlab/mmagic
OpenMMLab Multimodal Advanced, Generative, and Intelligent Creation Toolbox. Unlock the magic 🪄:...
haidog-yaqub/EzAudio
High-quality Text-to-Audio Generation with Efficient Diffusion Transformer
CMLab-Korea/Awesome-Video-Frame-Interpolation
[IEEE TCSVT'26] 🂡 AceVFI: A Comprehensive Survey of Advances in Video Frame Interpolation
linzhiqiu/t2v_metrics
Evaluating text-to-image/video/3D models with VQAScore
TIGER-AI-Lab/AnyV2V
Code and data for "AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks" [TMLR 2024]