nerdyrodent/CLIP-Guided-Diffusion
Just playing with getting CLIP Guided Diffusion running locally, rather than having to use colab.
Combines OpenAI's CLIP vision-language model with guided diffusion to generate images from text prompts at 256x256 or 512x512 resolution. An unconditional diffusion model does the denoising, while CLIP steers each step toward the prompt's embedding. Supports weighted multi-prompt inputs, image-to-image generation, and fine-grained control via guidance scales for prompt adherence, smoothness, and color range. Includes optional Real-ESRGAN upscaling and video generation of the diffusion process.
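
For context, a minimal sketch of the guidance step behind this approach: at each denoising step, the gradient of the CLIP similarity between the current image estimate and the prompt embedding is added to the update. This is illustrative only, not this repository's code; the function name clip_grad and the guidance_scale value are assumptions.

import torch
import torch.nn.functional as F
import clip  # OpenAI's CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/16", device=device)
clip_model = clip_model.float()  # keep everything fp32 for this sketch

prompt = "a watercolor painting of a fox"  # example prompt
with torch.no_grad():
    text_emb = clip_model.encode_text(clip.tokenize([prompt]).to(device))
    text_emb = F.normalize(text_emb, dim=-1)

# CLIP's input normalization constants
CLIP_MEAN = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device).view(1, 3, 1, 1)
CLIP_STD = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device).view(1, 3, 1, 1)

def clip_grad(x, guidance_scale=1000.0):
    # x: (N, 3, H, W) image estimate in [-1, 1], e.g. the diffusion model's
    # predicted clean image at the current timestep.
    x = x.detach().requires_grad_(True)
    img = F.interpolate((x + 1) / 2, size=224, mode="bilinear", align_corners=False)
    img = (img - CLIP_MEAN) / CLIP_STD
    img_emb = F.normalize(clip_model.encode_image(img), dim=-1)
    sim = (img_emb * text_emb).sum()  # cosine similarity with the prompt
    # Gradient ascent direction on similarity; a denoising loop would add
    # guidance_scale * grad to its mean prediction at each step.
    return guidance_scale * torch.autograd.grad(sim, x)[0]

In practice, pipelines of this kind typically score multiple random crops rather than a single resize, and add total-variation and range losses, which correspond to the smoothness and color-range guidance scales mentioned above.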
385 stars. No commits in the last 6 months.
Stars: 385
Forks: 48
Language: Python
License: —
Category: —
Last pushed: Aug 29, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/nerdyrodent/CLIP-Guided-Diffusion"
Open to everyone: 100 requests/day with no key needed; a free key raises the limit to 1,000/day.
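
A minimal sketch of the same request from Python, assuming the endpoint returns JSON; the response schema is not documented here, so the example just prints the payload. How an API key is attached (header or query parameter) is also not shown above.

import requests

URL = ("https://pt-edge.onrender.com/api/v1/quality/"
       "diffusion/nerdyrodent/CLIP-Guided-Diffusion")

resp = requests.get(URL, timeout=10)  # no key needed up to 100 requests/day
resp.raise_for_status()
print(resp.json())  # inspect the returned fields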
Higher-rated alternatives
NVlabs/Sana
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
FoundationVision/VAR
[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈]...
nerdyrodent/VQGAN-CLIP
Just playing with getting VQGAN+CLIP running locally, rather than having to use colab.
huggingface/finetrainers
Scalable and memory-optimized training of diffusion models
eps696/aphantasia
CLIP + FFT/DWT/RGB = text to image/video