songweige/rich-text-to-image

Rich-Text-to-Image Generation

Emerging

Leverages cross-attention maps from diffusion models to spatially ground text tokens, then applies region-based diffusion with rich-text formatting attributes (font size, color, style, footnotes) to enable fine-grained control over token emphasis, precise color rendering, and localized artistic style application. Integrates with Stable Diffusion v1-5, SD-XL, and fine-tuned variants through HuggingFace, with native support for Automatic1111 WebUI and LoRA checkpoints.
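
As a rough illustration of the grounding-then-blending idea described above, the Python sketch below assigns each pixel to the text span with the highest mean cross-attention, then composes per-region predictions using the resulting masks. It is a simplified sketch, not the repository's code: the function names `token_region_masks` and `blend_region_predictions`, the argmax assignment rule, and the toy arrays are assumptions made for illustration, and no diffusion model is run.

```python
import numpy as np


def token_region_masks(cross_attn, spans):
    """Assign each pixel to the text span with the highest mean attention.

    cross_attn: array of shape (num_tokens, H, W), one attention map per token.
    spans: dict mapping a span name to the list of token indices it covers.
    Returns a dict mapping each span name to a boolean (H, W) mask.
    Hypothetical helper: the argmax rule is an assumption, not the project's method.
    """
    names = list(spans)
    # Average the attention maps of the tokens inside each span.
    scores = np.stack([cross_attn[spans[name]].mean(axis=0) for name in names])
    winner = scores.argmax(axis=0)  # index of the winning span per pixel
    return {name: winner == i for i, name in enumerate(names)}


def blend_region_predictions(preds, masks):
    """Compose per-region predictions into one array using the region masks.

    preds: dict mapping span name to an array of shape (H, W) or (C, H, W).
    masks: dict mapping span name to a boolean (H, W) mask (non-overlapping).
    """
    out = np.zeros_like(next(iter(preds.values())), dtype=float)
    for name, mask in masks.items():
        out += preds[name] * mask  # the mask broadcasts over a leading channel axis
    return out


# Toy example: three tokens on a 2x2 grid, grouped into two spans.
attn = np.zeros((3, 2, 2))
attn[0, :, 0] = 1.0  # token 0 attends to the left column
attn[1, :, 1] = 0.6  # tokens 1 and 2 attend to the right column
attn[2, :, 1] = 1.0
spans = {"cat": [0], "red hat": [1, 2]}

masks = token_region_masks(attn, spans)
preds = {"cat": np.full((2, 2), 1.0), "red hat": np.full((2, 2), 2.0)}
blended = blend_region_predictions(preds, masks)
print(blended)
```

In a real pipeline the per-region predictions would come from separate denoising passes conditioned on each span's rich-text attributes; here they are constant arrays so the masking step is easy to follow.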

801 stars. No commits in the last 6 months.

Stale (6 months) · No package · No dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 17 / 25

Stars: 801

Language: Python

License: MIT

Higher-rated alternatives

NVlabs/Sana

SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer

FoundationVision/VAR

[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈]...

nerdyrodent/VQGAN-CLIP

Just playing with getting VQGAN+CLIP running locally, rather than having to use colab.

huggingface/finetrainers

Scalable and memory-optimized training of diffusion models

eps696/aphantasia

CLIP + FFT/DWT/RGB = text to image/video
