nerdyrodent/VQGAN-CLIP
Just playing with getting VQGAN+CLIP running locally, rather than having to use colab.
Combines OpenAI's CLIP vision-language model with CompVis's VQGAN vector-quantized autoencoder, steering image generation by iteratively optimizing the VQGAN latent against CLIP's similarity scores; this enables text-to-image synthesis with weighted multi-prompt support. Supports both CUDA and ROCm backends with configurable resolution (380x380 to 900x900), and offers advanced capabilities such as story-mode prompt sequencing, style-transfer effects, and video generation through iterative feedback loops. Integrates PyTorch Lightning for training infrastructure and includes specialized tooling for batch processing, frame-by-frame video styling, and dynamic zoom effects.
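The weighted multi-prompt objective can be sketched as follows. This is a minimal NumPy stand-in, not the repository's actual PyTorch code: `multi_prompt_loss` and the toy 512-d embeddings are hypothetical, illustrating only how per-prompt cosine distances are combined by weight (a negative weight pushes the image away from a prompt).

```python
import numpy as np

def multi_prompt_loss(image_emb, prompt_embs, weights):
    """Weighted multi-prompt objective in the spirit of VQGAN+CLIP:
    sum of per-prompt cosine distances, each scaled by its weight.
    (Illustrative sketch; the real repo computes this on CLIP
    embeddings with torch and backpropagates into the VQGAN latent.)"""
    img = image_emb / np.linalg.norm(image_emb)
    total = 0.0
    for emb, w in zip(prompt_embs, weights):
        p = emb / np.linalg.norm(emb)
        total += w * (1.0 - float(img @ p))  # cosine distance term
    return total

# Toy vectors standing in for CLIP's 512-d image/text features.
rng = np.random.default_rng(0)
image = rng.normal(size=512)
prompts = [rng.normal(size=512) for _ in range(2)]
loss = multi_prompt_loss(image, prompts, weights=[1.0, 0.5])
```

In the actual optimization loop, this scalar would be minimized by gradient descent on the VQGAN latent, so the decoded image drifts toward high-weight prompts and away from negatively weighted ones.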
2,653 stars. No commits in the last 6 months.
Stars: 2,653
Forks: 426
Language: Python
License: —
Category: —
Last pushed: Oct 02, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/nerdyrodent/VQGAN-CLIP"
Open to everyone at 100 requests/day with no key needed; a free key raises the limit to 1,000 requests/day.
Higher-rated alternatives
NVlabs/Sana
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
FoundationVision/VAR
[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈]...
huggingface/finetrainers
Scalable and memory-optimized training of diffusion models
eps696/aphantasia
CLIP + FFT/DWT/RGB = text to image/video
AssemblyAI-Community/MinImagen
MinImagen: A minimal implementation of the Imagen text-to-image model