UCSC-VLAA/story-iter
[ICLR 2026] A Training-free Iterative Framework for Long Story Visualization
Implements a plug-and-play Global Reference Cross-Attention (GRCA) module that iteratively refines generated frames by attending to all previous reference images during diffusion denoising, maintaining semantic consistency across long sequences of up to 100 frames. Built on SDXL with IP-Adapter integration, the framework is training-free and supports style control (comic, film, realistic) as well as ControlNet skeleton guidance for precise character pose management.
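The core idea behind GRCA can be sketched as cross-attention whose key/value bank is the concatenation of features from all previous reference images, while the queries come from the frame currently being denoised. Below is a minimal NumPy illustration of that mechanism only; the projection matrices are random stand-ins, not the actual SDXL/IP-Adapter weights, and function and argument names are hypothetical:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def global_reference_cross_attention(frame_tokens, reference_feats, d_head=64, seed=0):
    """Sketch of GRCA-style attention.

    frame_tokens:    (n_tokens, d) latent tokens of the frame being denoised.
    reference_feats: list of (m_i, d) feature maps, one per previous reference image.
    """
    rng = np.random.default_rng(seed)
    d = frame_tokens.shape[-1]
    # Random projections stand in for the module's frozen attention weights
    # (assumption: the real module reuses pretrained projections).
    W_q = rng.standard_normal((d, d_head)) / np.sqrt(d)
    W_k = rng.standard_normal((d, d_head)) / np.sqrt(d)
    W_v = rng.standard_normal((d, d_head)) / np.sqrt(d)

    # Global reference bank: every previous reference contributes keys/values,
    # which is what lets consistency propagate across a long sequence.
    refs = np.concatenate(reference_feats, axis=0)      # (sum_i m_i, d)

    Q = frame_tokens @ W_q                              # (n_tokens, d_head)
    K = refs @ W_k                                      # (sum_i m_i, d_head)
    V = refs @ W_v
    attn = softmax(Q @ K.T / np.sqrt(d_head), axis=-1)  # rows sum to 1
    return attn @ V                                     # (n_tokens, d_head)
```

In the iterative loop described above, the output of each denoising pass would be appended to `reference_feats` before generating the next frame.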
949 stars. Actively maintained with 6 commits in the last 30 days.
Stars: 949
Forks: 129
Language: Python
License: MIT
Category:
Last pushed: Feb 18, 2026
Commits (30d): 6
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/UCSC-VLAA/story-iter"
Open to everyone: 100 requests/day with no key. Get a free key for 1,000 requests/day.
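The same endpoint can be called from Python with only the standard library. A small sketch, assuming the response is JSON; the `Authorization: Bearer` header used for the optional API key is an assumption, so check the service docs for the actual scheme:

```python
import json
import urllib.request
from typing import Optional

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a repository."""
    return f"{BASE}/{category}/{owner}/{repo}"

def fetch_quality(category: str, owner: str, repo: str,
                  api_key: Optional[str] = None) -> dict:
    """Fetch the quality record for a repo and parse it as JSON."""
    req = urllib.request.Request(quality_url(category, owner, repo))
    if api_key:
        # Assumed header scheme for the higher rate limit; not confirmed by the docs above.
        req.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example (performs a network request):
# data = fetch_quality("diffusion", "UCSC-VLAA", "story-iter")
```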
Related models
PaddlePaddle/PaddleMIX: Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks,...
keivalya/mini-vla: a minimal, beginner-friendly VLA to show how robot policies can fuse images, text, and states to...
adobe-research/custom-diffusion: Custom Diffusion: Multi-Concept Customization of Text-to-Image Diffusion (CVPR 2023)
byliutao/1Prompt1Story: 🔥ICLR 2025 (Spotlight) One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation...
HorizonWind2004/reconstruction-alignment: [ICLR 2026] Official repo of paper "Reconstruction Alignment Improves Unified Multimodal...