UCSC-VLAA/story-iter
[ICLR 2026] A Training-free Iterative Framework for Long Story Visualization
Implements a plug-and-play Global Reference Cross-Attention (GRCA) module that iteratively refines generated frames by attending to all previous reference images during diffusion denoising, maintaining semantic consistency across long sequences of up to 100 frames. Built on SDXL with IP-Adapter integration, the framework is training-free and supports style control (comic, film, realistic) as well as ControlNet skeleton guidance for precise character pose management.
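The core idea behind GRCA can be sketched as cross-attention whose key/value bank is the concatenation of features from all previous reference images, while the queries come from the frame currently being denoised. Below is a minimal NumPy illustration of that mechanism only; the projection matrices are random stand-ins, not the actual SDXL/IP-Adapter weights, and function and argument names are hypothetical:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def global_reference_cross_attention(frame_tokens, reference_feats, d_head=64, seed=0):
    """Sketch of GRCA-style attention.

    frame_tokens:    (n_tokens, d) latent tokens of the frame being denoised.
    reference_feats: list of (m_i, d) feature maps, one per previous reference image.
    """
    rng = np.random.default_rng(seed)
    d = frame_tokens.shape[-1]
    # Random projections stand in for the module's frozen attention weights
    # (assumption: the real module reuses pretrained projections).
    W_q = rng.standard_normal((d, d_head)) / np.sqrt(d)
    W_k = rng.standard_normal((d, d_head)) / np.sqrt(d)
    W_v = rng.standard_normal((d, d_head)) / np.sqrt(d)

    # Global reference bank: every previous reference contributes keys/values,
    # which is what lets consistency propagate across a long sequence.
    refs = np.concatenate(reference_feats, axis=0)      # (sum_i m_i, d)

    Q = frame_tokens @ W_q                              # (n_tokens, d_head)
    K = refs @ W_k                                      # (sum_i m_i, d_head)
    V = refs @ W_v
    attn = softmax(Q @ K.T / np.sqrt(d_head), axis=-1)  # rows sum to 1
    return attn @ V                                     # (n_tokens, d_head)
```

In the iterative loop described above, the output of each denoising pass would be appended to `reference_feats` before generating the next frame.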
949 stars. Actively maintained with 6 commits in the last 30 days.
Stars: 949
Forks: 129
Language: Python
License: MIT
Category:
Last pushed: Feb 18, 2026
Commits (30d): 6
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/UCSC-VLAA/story-iter"
Open to everyone: 100 requests/day with no key. Get a free key for 1,000 requests/day.
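The same endpoint can be called from Python with only the standard library. A small sketch, assuming the response is JSON; the `Authorization: Bearer` header used for the optional API key is an assumption, so check the service docs for the actual scheme:

```python
import json
import urllib.request
from typing import Optional

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a repository."""
    return f"{BASE}/{category}/{owner}/{repo}"

def fetch_quality(category: str, owner: str, repo: str,
                  api_key: Optional[str] = None) -> dict:
    """Fetch the quality record for a repo and parse it as JSON."""
    req = urllib.request.Request(quality_url(category, owner, repo))
    if api_key:
        # Assumed header scheme for the higher rate limit; not confirmed by the docs above.
        req.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example (performs a network request):
# data = fetch_quality("diffusion", "UCSC-VLAA", "story-iter")
```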
Related models
PaddlePaddle/PaddleMIX: Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks,...
keivalya/mini-vla: a minimal, beginner-friendly VLA to show how robot policies can fuse images, text, and states to...
adobe-research/custom-diffusion: Custom Diffusion: Multi-Concept Customization of Text-to-Image Diffusion (CVPR 2023)
byliutao/1Prompt1Story: 🔥ICLR 2025 (Spotlight) One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation...
HorizonWind2004/reconstruction-alignment: [ICLR 2026] Official repo of paper "Reconstruction Alignment Improves Unified Multimodal...