Zheng-Chong/CatVTON

[ICLR 2025] CatVTON is a simple and efficient virtual try-on diffusion model with 1) a lightweight network (899.06M parameters in total), 2) parameter-efficient training (49.57M trainable parameters), and 3) simplified inference (< 8 GB VRAM at 1024×768 resolution).

53 / 100 (Established)

CatVTON employs a concatenation-based architecture that feeds garment and person features directly into the diffusion model's latent space, eliminating complex alignment modules and enabling mask-free inference variants. It is built on Stable Diffusion v1.5 and uses localized DensePose and SCHP extractors for pose and human parsing. It supports deployment across multiple frameworks, including ComfyUI, Gradio, and HuggingFace Spaces, and emerging DiT-based variants (CatV2TON) extend the approach to video try-on.
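To make the concatenation idea concrete, here is a minimal shape-only sketch. It assumes the standard Stable Diffusion v1.5 VAE (4 latent channels, 8× spatial downsampling), that "1024×768" means height 1024 and width 768, and that the person and garment latents are joined along a spatial axis. The axis choice and the helper names `latent_shape` and `concat_shape` are illustrative assumptions, not taken from the CatVTON code.

```python
from typing import Tuple

# Stable Diffusion v1.5 VAE: 4 latent channels, 8x spatial downsampling.
LATENT_CHANNELS = 4
VAE_SCALE = 8

Shape = Tuple[int, int, int]  # (channels, height, width)


def latent_shape(height: int, width: int) -> Shape:
    """Shape of the VAE latent for an image of the given pixel size."""
    if height % VAE_SCALE or width % VAE_SCALE:
        raise ValueError("image size must be divisible by the VAE scale factor")
    return (LATENT_CHANNELS, height // VAE_SCALE, width // VAE_SCALE)


def concat_shape(person: Shape, garment: Shape, axis: int) -> Shape:
    """Shape after concatenating two latents along `axis`.

    All other dimensions must match, as with any tensor concatenation.
    """
    for i, (p, g) in enumerate(zip(person, garment)):
        if i != axis and p != g:
            raise ValueError(f"dimension {i} differs: {p} vs {g}")
    out = list(person)
    out[axis] += garment[axis]
    return (out[0], out[1], out[2])


# Assumption: height=1024, width=768, concatenation along the height axis.
person = latent_shape(1024, 768)
garment = latent_shape(1024, 768)
combined = concat_shape(person, garment, axis=1)
print(person, "+", garment, "->", combined)
```

Under these assumptions the denoiser sees a single latent twice as tall as either input, so no separate garment encoder or alignment module is needed to bring the two inputs together.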

1,615 stars.

No Package, No Dependents
Maintenance 6 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 21 / 25
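The four sub-scores above add up to the overall 53 / 100. The page does not state the formula, so the plain sum below is an assumption that happens to match the listed numbers.

```python
# Sub-scores as listed on the page, each out of 25.
sub_scores = {
    "Maintenance": 6,
    "Adoption": 10,
    "Maturity": 16,
    "Community": 21,
}
MAX_PER_CATEGORY = 25

# Assumption: the overall score is the unweighted sum of the sub-scores.
total = sum(sub_scores.values())
max_total = MAX_PER_CATEGORY * len(sub_scores)
print(f"{total} / {max_total}")
```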


Stars: 1,615
Forks: 207
Language: Python
License: (none listed)
Category: virtual-try-on
Last pushed: Dec 16, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/Zheng-Chong/CatVTON"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
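The same request can be made from Python with only the standard library. This is a sketch under two assumptions that the page does not confirm: that the endpoint returns JSON, and that the path follows the pattern category/owner/repository seen in the curl example. The function names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL for an "owner/name" repository.

    Assumption: the path pattern is <category>/<owner>/<name>.
    """
    owner, name = repo.split("/", 1)
    parts = (category, owner, name)
    return BASE_URL + "/" + "/".join(quote(p, safe="") for p in parts)


def fetch_quality(category: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record. Assumes the response body is JSON."""
    with urllib.request.urlopen(quality_url(category, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the URL needs no network access.
url = quality_url("diffusion", "Zheng-Chong/CatVTON")
print(url)
```

Calling `fetch_quality("diffusion", "Zheng-Chong/CatVTON")` performs the actual request and counts against the daily request limit.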