jdh-algo/JoyVASA
Diffusion-based Portrait and Animal Animation
Decouples static 3D facial representations from dynamic motion sequences using a diffusion transformer trained on audio features (wav2vec2 or HuBERT), enabling identity-independent motion generation that extends to animal face animation. The two-stage pipeline extracts 3D facial appearance via LivePortrait, generates motion sequences from speech in a sliding-window fashion, and renders final video through keypoint-based warping and a learned generator. Supports multilingual audio input and is compatible with both portrait and animal image animation through optional MultiScaleDeformableAttention components.
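The description mentions generating motion from speech "in a sliding-window fashion". A minimal sketch of that windowing step over a frame-level audio-feature sequence (the function name, window size, and hop size here are illustrative assumptions, not values taken from the JoyVASA repo):

```python
import numpy as np

def sliding_windows(features: np.ndarray, win: int, hop: int) -> list[np.ndarray]:
    """Split a (T, D) audio-feature sequence into overlapping windows.

    A final window anchored at the sequence end ensures every frame is
    covered even when T - win is not a multiple of hop.
    """
    T = len(features)
    if T <= win:
        return [features]
    starts = list(range(0, T - win + 1, hop))
    if starts[-1] + win < T:
        starts.append(T - win)  # tail window so the last frames are not dropped
    return [features[s:s + win] for s in starts]

# Example: 100 feature frames, window of 25, hop of 20 (illustrative numbers)
feats = np.arange(100 * 2, dtype=np.float32).reshape(100, 2)
windows = sliding_windows(feats, win=25, hop=20)
```

Each window would then be fed to the diffusion transformer independently; overlapping regions let consecutive motion chunks be blended for temporal consistency.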
Stars: 856
Forks: 86
Language: Python
License: MIT
Category:
Last pushed: Dec 09, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/generative-ai/jdh-algo/JoyVASA"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
Related tools
open-mmlab/mmagic
OpenMMLab Multimodal Advanced, Generative, and Intelligent Creation Toolbox. Unlock the magic 🪄:...
haidog-yaqub/EzAudio
High-quality Text-to-Audio Generation with Efficient Diffusion Transformer
CMLab-Korea/Awesome-Video-Frame-Interpolation
[IEEE TCSVT'26] 🂡 AceVFI: A Comprehensive Survey of Advances in Video Frame Interpolation
linzhiqiu/t2v_metrics
Evaluating text-to-image/video/3D models with VQAScore
TIGER-AI-Lab/AnyV2V
Code and data for "AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks" [TMLR 2024]