FoundationVision/VAR
[NeurIPS 2024 Best Paper Award] [GPT beats diffusion 🔥] [Scaling laws in visual generation 📈] Official implementation of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
Implements next-scale prediction, a coarse-to-fine autoregressive approach where token generation proceeds by resolution levels rather than raster-scan order, enabling transformers to match or exceed diffusion model quality. Leverages a discrete VAE bottleneck and PyTorch 2.0+ with optional Flash-Attention and xformers backends for accelerated transformer inference on ImageNet-scale datasets. Provides pre-trained checkpoints (310M–2.3B parameters) on Hugging Face alongside a minimal training pipeline for custom image datasets.
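The coarse-to-fine generation order described above can be sketched in a few lines. This is an illustrative mock, not the official VAR API: `predict_scale` stands in for a transformer forward pass, and `VOCAB_SIZE` and the scale schedule are assumed values.

```python
import random

# Illustrative sketch of next-scale prediction (not the official VAR code):
# tokens are generated level by level, from a coarse 1x1 token map up to
# finer resolutions, each level conditioned on all previously generated levels.

VOCAB_SIZE = 4096        # size of the discrete VAE codebook (assumed)
SCALES = [1, 2, 3, 4]    # token-map side lengths, coarse to fine (assumed)

def predict_scale(context, side, rng):
    """Stand-in for a transformer forward pass: emits side*side tokens.
    A real model would condition on `context`; here tokens are random."""
    return [rng.randrange(VOCAB_SIZE) for _ in range(side * side)]

def generate(rng=None):
    rng = rng or random.Random(0)
    context = []         # all tokens generated so far, flattened
    token_maps = []      # one token map per scale
    for side in SCALES:
        tokens = predict_scale(context, side, rng)
        token_maps.append(tokens)
        context.extend(tokens)  # finer scales see all coarser tokens
    return token_maps, context

maps, ctx = generate()
print(len(ctx))  # total tokens = 1 + 4 + 9 + 16 = 30
```

Note that all `side * side` tokens of a scale are produced in one step, which is what lets next-scale prediction avoid the long raster-scan sequences of token-by-token autoregression.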
Stars: 8,641
Forks: 563
Language: Jupyter Notebook
License: MIT
Category: —
Last pushed: Nov 10, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/FoundationVision/VAR"
Open to everyone: 100 requests/day with no key needed. Get a free API key for 1,000 requests/day.
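The same request can be made from Python with the standard library. A minimal sketch, assuming the URL layout shown in the curl example above; the response schema is not documented here, so the result is returned as parsed JSON without further interpretation.

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the API URL for a repo, mirroring the curl example above."""
    return f"{BASE}/{category}/{owner}/{repo}"

def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch quality data for a repo. Requires network access; no API key
    is needed for up to 100 requests/day."""
    with urllib.request.urlopen(quality_url(category, owner, repo),
                                timeout=timeout) as resp:
        return json.load(resp)

# Example (performs a real HTTP request):
# data = fetch_quality("diffusion", "FoundationVision", "VAR")
```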
Related models
NVlabs/Sana
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
nerdyrodent/VQGAN-CLIP
Just playing with getting VQGAN+CLIP running locally, rather than having to use colab.
huggingface/finetrainers
Scalable and memory-optimized training of diffusion models
eps696/aphantasia
CLIP + FFT/DWT/RGB = text to image/video
AssemblyAI-Community/MinImagen
MinImagen: A minimal implementation of the Imagen text-to-image model