nerdyrodent/VQGAN-CLIP
Just playing with getting VQGAN+CLIP running locally, rather than having to use colab.
Combines OpenAI's CLIP vision-language model with CompVis's VQGAN vector-quantized autoencoder, steering image generation by iteratively optimizing the VQGAN latent against CLIP's similarity scores; this enables text-to-image synthesis with weighted multi-prompt support. Supports both CUDA and ROCm backends with configurable resolution (380x380 to 900x900), and offers advanced capabilities such as story-mode prompt sequencing, style-transfer effects, and video generation through iterative feedback loops. Integrates PyTorch Lightning for training infrastructure and includes specialized tooling for batch processing, frame-by-frame video styling, and dynamic zoom effects.
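The weighted multi-prompt objective can be sketched as follows. This is a minimal NumPy stand-in, not the repository's actual PyTorch code: `multi_prompt_loss` and the toy 512-d embeddings are hypothetical, illustrating only how per-prompt cosine distances are combined by weight (a negative weight pushes the image away from a prompt).

```python
import numpy as np

def multi_prompt_loss(image_emb, prompt_embs, weights):
    """Weighted multi-prompt objective in the spirit of VQGAN+CLIP:
    sum of per-prompt cosine distances, each scaled by its weight.
    (Illustrative sketch; the real repo computes this on CLIP
    embeddings with torch and backpropagates into the VQGAN latent.)"""
    img = image_emb / np.linalg.norm(image_emb)
    total = 0.0
    for emb, w in zip(prompt_embs, weights):
        p = emb / np.linalg.norm(emb)
        total += w * (1.0 - float(img @ p))  # cosine distance term
    return total

# Toy vectors standing in for CLIP's 512-d image/text features.
rng = np.random.default_rng(0)
image = rng.normal(size=512)
prompts = [rng.normal(size=512) for _ in range(2)]
loss = multi_prompt_loss(image, prompts, weights=[1.0, 0.5])
```

In the actual optimization loop, this scalar would be minimized by gradient descent on the VQGAN latent, so the decoded image drifts toward high-weight prompts and away from negatively weighted ones.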
2,653 stars. No commits in the last 6 months.
Stars: 2,653
Forks: 426
Language: Python
License: —
Category: —
Last pushed: Oct 02, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/nerdyrodent/VQGAN-CLIP"
Open to everyone at 100 requests/day with no key needed; a free key raises the limit to 1,000 requests/day.
Higher-rated alternatives
NVlabs/Sana
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
FoundationVision/VAR
[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈]...
huggingface/finetrainers
Scalable and memory-optimized training of diffusion models
eps696/aphantasia
CLIP + FFT/DWT/RGB = text to image/video
AssemblyAI-Community/MinImagen
MinImagen: A minimal implementation of the Imagen text-to-image model