Tencent-Hunyuan/HunyuanCustom

HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation

49 / 100 (Emerging)

HunyuanCustom supports subject-consistent video generation from multimodal inputs (text, images, audio, and video) through specialized injection modules: a text-image fusion layer based on LLaVA, an AudioNet for hierarchical audio alignment, and a video-driven, patchify-based feature encoder. Built on HunyuanVideo, it enables downstream applications such as virtual avatars, singing synthesis, and video object replacement while maintaining identity consistency across frames. It integrates with ComfyUI and HuggingFace, and optimized inference is available for single-GPU setups with 8GB of memory.

No package, no dependents
Maintenance 6 / 25
Adoption 10 / 25
Maturity 15 / 25
Community 18 / 25

Stars: 1,211
Forks: 108
Language: Python
License: (not listed)
Last pushed: Oct 15, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/Tencent-Hunyuan/HunyuanCustom"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
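
The same request can be made from Python using only the standard library. This is a minimal sketch, not a documented client: the helper names `build_quality_url` and `fetch_quality` are hypothetical, the `category/owner/repo` path layout is inferred from the single curl example above, and the assumption that the endpoint returns JSON is not confirmed by this page.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; everything after it is inferred.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build an endpoint URL shaped like the curl example (hypothetical helper)."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record for one repository.

    Assumes the response body is JSON; adjust the parsing if it is not.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Prints the URL only, so running this does not use up the daily quota.
    print(build_quality_url("diffusion", "Tencent-Hunyuan", "HunyuanCustom"))
```

Calling `fetch_quality("diffusion", "Tencent-Hunyuan", "HunyuanCustom")` issues one request, so with the keyless limit of 100 requests/day it is worth caching the result rather than polling.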