Tencent-Hunyuan/HunyuanCustom

HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation

/ 100

Emerging

Supports subject-consistent video generation from multimodal inputs—text, images, audio, and video—through specialized injection modules including a text-image fusion layer based on LLaVA, an AudioNet for hierarchical audio alignment, and a video-driven patchify-based feature encoder. Built on HunyuanVideo, it enables downstream applications like virtual avatars, singing synthesis, and video object replacement while maintaining identity consistency across frames. Integrates with ComfyUI and HuggingFace, with optimized inference available for 8GB single-GPU setups.

1,211 stars.

No Package No Dependents

Maintenance 6 / 25

Adoption 10 / 25

Maturity 15 / 25

Community 18 / 25

How are scores calculated?

Stars

1,211

Forks

108

Language

Python

License

—

Compare

HunyuanCustom and HunyuanVideo

Higher-rated alternatives

hao-ai-lab/FastVideo

A unified inference and post-training framework for accelerated video generation.

thu-ml/TurboDiffusion

TurboDiffusion: 100–200× Acceleration for Video Diffusion Models

ModelTC/LightX2V

Light Image Video Generation Inference Framework

PKU-YuanGroup/Helios

Helios: Real Real-Time Long Video Generation Model

PKU-YuanGroup/MagicTime

[TPAMI 2025🔥] MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators

Explore Diffusion Models

All categories Trending Diffusion directory Insights