FoundationVision/VAR

[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

Score: 50 / 100 (Established)

Implements next-scale prediction, a coarse-to-fine autoregressive approach where token generation proceeds by resolution levels rather than raster-scan order, enabling transformers to match or exceed diffusion model quality. Leverages a discrete VAE bottleneck and PyTorch 2.0+ with optional Flash-Attention and xformers backends for accelerated transformer inference on ImageNet-scale datasets. Provides pre-trained checkpoints (310M–2.3B parameters) on Hugging Face alongside a minimal training pipeline for custom image datasets.
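The next-scale idea can be sketched in a few lines: instead of predicting one token at a time in raster order, the model emits an entire token map per resolution level, so the number of autoregressive steps equals the number of scales rather than the number of tokens. The scale schedule below is the one used in the VAR paper's 256x256 configuration; the helper names are illustrative, not the repository's actual API.

```python
# Illustrative sketch of next-scale prediction step counts (not VAR's real API).
# In raster-scan AR, a 16x16 token map needs 16*16 = 256 sequential steps;
# next-scale prediction emits one whole scale per forward pass instead.

# Scale schedule from the VAR paper's 256x256 configuration.
PATCH_NUMS = (1, 2, 3, 4, 5, 6, 8, 10, 13, 16)

def tokens_per_scale(patch_nums=PATCH_NUMS):
    """Tokens emitted at each coarse-to-fine step (h*w per square scale)."""
    return [p * p for p in patch_nums]

def step_comparison(patch_nums=PATCH_NUMS):
    """Sequential steps: one per scale vs. one per token (raster scan)."""
    counts = tokens_per_scale(patch_nums)
    return {
        "next_scale_steps": len(patch_nums),       # 10 forward passes
        "raster_scan_steps": patch_nums[-1] ** 2,  # 256 forward passes
        "total_tokens": sum(counts),               # 680 tokens overall
    }
```

With this schedule the model makes 10 sequential predictions covering 680 tokens, versus 256 sequential predictions for a raster-scan transformer at the final 16x16 resolution, which is the source of the inference speedup.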


No package · No dependents

Maintenance: 6 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 18 / 25


Stars: 8,641
Forks: 563
Language: Jupyter Notebook
License: MIT
Last pushed: Nov 10, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/FoundationVision/VAR"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
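For script use, the same endpoint can be called from Python with only the standard library. The URL layout is taken from the curl example above; the response format is assumed to be JSON.

```python
# Sketch of calling the quality API from Python (stdlib only).
# URL pattern taken from the curl example; JSON response is an assumption.
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(topic: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for a repository."""
    return f"{API_BASE}/{topic}/{owner}/{repo}"

def fetch_quality(topic: str, owner: str, repo: str) -> dict:
    """Fetch and decode the quality report (subject to the daily rate limit)."""
    with urllib.request.urlopen(quality_url(topic, owner, repo)) as resp:
        return json.load(resp)
```

Calling `fetch_quality("diffusion", "FoundationVision", "VAR")` mirrors the curl command above and counts against the same 100-requests/day keyless quota.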