VQGAN-CLIP and CLIP-Guided-Diffusion
These repositories are ecosystem siblings: local implementations of two different generative approaches (VQGAN and diffusion) that share the same CLIP guidance mechanism for steering text-to-image generation.
About VQGAN-CLIP
nerdyrodent/VQGAN-CLIP
Just playing with getting VQGAN+CLIP running locally, rather than having to use colab.
Combines OpenAI's CLIP vision-language model with CompVis's VQGAN vector-quantized autoencoder, steering image generation through iterative gradient-based optimization of VQGAN latent codes and enabling text-to-image synthesis with weighted multi-prompt support. Supports both CUDA and ROCm backends with configurable resolution (380x380 to 900x900) and offers advanced capabilities such as story-mode sequencing, style-transfer effects, and video generation through iterative feedback loops. Integrates PyTorch Lightning for training infrastructure and includes specialized tooling for batch processing, frame-by-frame video styling, and dynamic zoom effects.
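The core loop described above can be sketched with a toy stand-in: here the "latent" is a plain float vector and the loss is a weighted squared distance to each prompt embedding, rather than a real VQGAN decode scored by CLIP. All names are illustrative, not the repo's API.

```python
# Toy sketch of CLIP-guided latent optimization (hypothetical names;
# the real repo decodes VQGAN latents and scores them with CLIP).

def optimize_latent(prompts, steps=500, lr=0.05):
    """prompts: list of (embedding, weight) pairs, embeddings as equal-length
    float lists. Returns the latent after gradient descent on the weighted
    loss sum_i w_i * ||z - p_i||^2 (a stand-in for CLIP dissimilarity)."""
    dim = len(prompts[0][0])
    z = [0.0] * dim  # latent "image" code, initialized to zeros
    for _ in range(steps):
        # Analytic gradient of the weighted squared-distance loss.
        grad = [0.0] * dim
        for p, w in prompts:
            for j in range(dim):
                grad[j] += 2.0 * w * (z[j] - p[j])
        # Gradient-descent update on the latent, as in iterative CLIP guidance.
        z = [z[j] - lr * grad[j] for j in range(dim)]
    return z
```

With this loss, weighted prompts pull the latent toward their weighted mean, which mirrors how the weighted multi-prompt syntax blends several text targets into one optimization objective.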
About CLIP-Guided-Diffusion
nerdyrodent/CLIP-Guided-Diffusion
Just playing with getting CLIP Guided Diffusion running locally, rather than having to use colab.
Combines OpenAI's CLIP vision-language model with guided diffusion to generate images from text prompts at 256x256 or 512x512 resolution. Uses an unconditional diffusion model whose iterative denoising steps are steered by gradients from CLIP embeddings, supporting weighted multi-prompt inputs, image-to-image generation, and fine-grained control via guidance scales for prompt adherence, smoothness, and color range. Includes optional Real-ESRGAN upscaling and video output of the diffusion process.
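The guidance mechanism can be illustrated with a toy one-dimensional sketch: start from noise, and at each denoising step add a guidance term (standing in for the CLIP gradient) scaled by a guidance-scale knob. Everything here is a minimal assumption-laden stand-in, not the repo's actual sampler.

```python
import random

def guided_denoise(target, steps=50, guidance_scale=0.5, seed=0):
    """Toy 1-D sketch of guided diffusion sampling. `target` stands in for
    the direction a CLIP gradient would supply; `guidance_scale` plays the
    role of the prompt-adherence scale. Names are illustrative only."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)  # start from pure noise
    for t in range(steps, 0, -1):
        noise_level = t / steps
        guidance = -(x - target)           # stand-in for the CLIP gradient
        x = x + guidance_scale * guidance  # steer the sample toward the prompt
        x += rng.gauss(0.0, 0.1) * noise_level  # shrinking stochastic noise
    return x
```

A larger `guidance_scale` pulls samples harder toward the prompt at the cost of diversity, which is the trade-off the real guidance-scale flags expose.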