sihyun-yu/REPA

[ICLR'25 Oral] Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think

Score: 42 / 100 (Emerging)

Aligns noisy diffusion-transformer hidden states with representations from frozen pretrained visual encoders (DINOv2, CLIP, MAE, etc.), accelerating Diffusion Transformer training by a reported 17.5× while reaching FID = 1.42 on ImageNet 256×256. Supports multiple encoder architectures and scales to 512×512 resolution and text-to-image generation via a configurable projection depth and alignment coefficient. Built on the SiT/DiT codebases with accelerate-based distributed training and automatic checkpoint management.
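The core idea described above can be sketched as an auxiliary loss: project the transformer's intermediate hidden states into the frozen encoder's feature space and maximize patch-wise cosine similarity. This is a minimal NumPy sketch of such a REPA-style regularizer; all shapes, the linear stand-in for the projection MLP, and the coefficient name `lam` are illustrative assumptions, not the repo's actual API.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: batch, patches, transformer width, encoder width.
B, N, D_T, D_E = 2, 16, 32, 24

h = rng.standard_normal((B, N, D_T))        # noisy DiT/SiT hidden states at some depth
y = rng.standard_normal((B, N, D_E))        # frozen encoder features (e.g. DINOv2)
W = rng.standard_normal((D_T, D_E)) * 0.02  # stand-in for the learned projection MLP

def repa_alignment_loss(h, y, W):
    """Mean negative patch-wise cosine similarity between projected
    hidden states and the frozen encoder features."""
    z = h @ W
    z = z / np.linalg.norm(z, axis=-1, keepdims=True)
    t = y / np.linalg.norm(y, axis=-1, keepdims=True)
    return -np.mean(np.sum(z * t, axis=-1))

# The alignment term is added to the usual denoising objective,
# weighted by a configurable alignment coefficient.
denoising_loss = 1.0  # placeholder for the diffusion loss
lam = 0.5             # hypothetical value for the alignment coefficient
total = denoising_loss + lam * repa_alignment_loss(h, y, W)
```

In the actual repo the projection is an MLP, the depth at which hidden states are tapped is configurable, and the encoder stays frozen so gradients flow only through the projection and the diffusion transformer.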

1,582 stars. No commits in the last 6 months.

Flags: Stale (6 months) · No Package · No Dependents

Maintenance: 0 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 16 / 25

How are scores calculated?

Stars: 1,582
Forks: 81
Language: Python
License: MIT
Last pushed: Mar 16, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/sihyun-yu/REPA"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
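From the curl example above, the endpoint path appears to follow a `/api/v1/quality/{category}/{owner}/{repo}` pattern. A small helper for building such URLs, assuming that pattern holds (the `category` segment, here `diffusion`, is an inference from the example, not documented behavior):

```python
def quality_url(owner: str, repo: str, category: str = "diffusion") -> str:
    """Build a pt-edge quality-API URL for a GitHub repo.

    Assumes the /api/v1/quality/{category}/{owner}/{repo} path pattern
    seen in the documented curl example.
    """
    base = "https://pt-edge.onrender.com/api/v1/quality"
    return f"{base}/{category}/{owner}/{repo}"

url = quality_url("sihyun-yu", "REPA")
```

Fetching `url` with any HTTP client (curl, `requests`, etc.) then reproduces the documented request; remember the 100 requests/day unauthenticated limit.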