TonyLianLong/LLM-groundedDiffusion
LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models (LMD, TMLR 2024)
Uses LLMs to parse natural language prompts into spatial layouts (bounding boxes with captions), which then guide diffusion models via cross-attention control and optional GLIGEN adapters for improved compositional generation. Integrated into HuggingFace Diffusers (v0.24.0+) and supports both proprietary APIs (GPT-3.5/4) and self-hosted open-source LLMs (Mixtral, LLaMA 2), with built-in caching, SDXL refinement, and multiple layout-to-image methods (GLIGEN, MultiDiffusion, BoxDiff).
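The core intermediate representation is a set of bounding boxes with per-box captions plus a background prompt, produced by the LLM as text and parsed before being handed to the layout-to-image stage. A minimal sketch of such a parser; the exact response format and the [x, y, w, h] coordinate convention shown here are assumptions modeled on the repo's prompt examples, not a verbatim spec:

```python
import ast
import re

def parse_llm_layout(response: str):
    """Parse an LLM layout response into (caption, box) pairs plus a
    background prompt. The line labels 'Objects:' and 'Background prompt:'
    are assumed, not taken from the official prompt template."""
    objects_match = re.search(r"Objects:\s*(\[.*\])", response)
    bg_match = re.search(r"Background prompt:\s*(.+)", response)
    # ast.literal_eval safely evaluates the Python-literal list of tuples.
    boxes = ast.literal_eval(objects_match.group(1)) if objects_match else []
    background = bg_match.group(1).strip() if bg_match else ""
    return boxes, background

# Example response in the assumed format; boxes are [x, y, w, h].
demo = (
    "Objects: [('a green car', [21, 181, 211, 159]), "
    "('a blue truck', [269, 183, 209, 160])]\n"
    "Background prompt: A realistic photo of a street scene"
)
boxes, background = parse_llm_layout(demo)
```

Each parsed box would then be passed to the chosen layout-to-image backend (GLIGEN, MultiDiffusion, or BoxDiff) as grounding input.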
481 stars. No commits in the last 6 months.
Stars: 481
Forks: 34
Language: Python
License: —
Category:
Last pushed: Sep 09, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/TonyLianLong/LLM-groundedDiffusion"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000/day.
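The same endpoint can be called from Python with only the standard library. A minimal sketch; the response's JSON schema is not documented on this page, so the payload is returned as a plain dict for the caller to inspect:

```python
import json
from urllib.request import urlopen

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/diffusion"

def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for a repo's quality data."""
    return f"{API_BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and decode the JSON payload; the schema is unspecified here,
    so keys must be inspected by the caller."""
    with urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

url = quality_url("TonyLianLong", "LLM-groundedDiffusion")
```

No API key header is needed within the 100 requests/day free tier described above.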
Higher-rated alternatives
ljleb/sd-mecha
Executable State Dict Recipes
SJTU-DENG-Lab/Discrete-Diffusion-Forcing
Discrete Diffusion Forcing (D2F): dLLMs Can Do Faster-Than-AR Inference
declare-lab/tango
A family of diffusion models for text-to-audio generation.
Li-Jinsong/DAEDAL
[ICLR 2026] Official repository of "Beyond Fixed: Training-Free Variable-Length Denoising for...
SalesforceAIResearch/CoDA
Salesforce AI Research's open diffusion language model