Multimodal Vision Language Diffusion Models

There are 16 multimodal vision language models tracked. Only 1 scores above 50 (Established tier). The highest-rated is zai-org/CogVideo at 52/100 with 12,515 stars.

Get all 16 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=diffusion&subcategory=multimodal-vision-language&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
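The same request can be made from a script. The following is a minimal Python sketch using only the standard library. The helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the JSON response is not documented on this page, so the sketch returns the decoded payload without assuming any field names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint copied from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="diffusion",
              subcategory="multimodal-vision-language",
              limit=20):
    """Build the query URL (hypothetical helper; parameters mirror the curl example)."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Perform the GET request and decode the JSON body.

    The response schema is an unknown here, so the decoded object
    is returned as-is for the caller to inspect.
    """
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Building the URL needs no network access.
print(build_url())
```

Calling `fetch_projects(build_url())` performs the actual request and counts against the daily quota. It is worth inspecting the returned object before relying on any particular key.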

| # | Model | Description | Score | Tier |
|---|-------|-------------|-------|------|
| 1 | zai-org/CogVideo | text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023) | 52 | Established |
| 2 | zhaorw02/DeepMesh | [ICCV 2025] Official code of DeepMesh: Auto-Regressive Artist-mesh Creation... | 38 | Emerging |
| 3 | YangLing0818/RPG-DiffusionMaster | [ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and... | 36 | Emerging |
| 4 | thu-nics/FrameFusion | [ICCV'25] The official code of paper "Combining Similarity and Importance... | 31 | Emerging |
| 5 | Yushi-Hu/tifa | TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with... | 30 | Emerging |
| 6 | OpenMeshLab/MeshXL | [NeurIPS 2024] MeshXL: Neural Coordinate Field for Generative 3D Foundation... | 29 | Experimental |
| 7 | ByteVisionLab/TokenFlow | [CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for... | 29 | Experimental |
| 8 | j-min/DSG | Davidsonian Scene Graph (DSG) for Text-to-Image Evaluation (ICLR 2024) | 27 | Experimental |
| 9 | YangLing0818/VideoTetris | [NeurIPS 2024] VideoTetris: Towards Compositional Text-To-Video Generation | 26 | Experimental |
| 10 | jqin4749/MindVideo | Official code base for MinD-Video | 24 | Experimental |
| 11 | showlab/VisorGPT | [NeurIPS 2023] Customize spatial layouts for conditional image synthesis... | 24 | Experimental |
| 12 | InternRobotics/UniHSI | [ICLR 2024 Spotlight] Unified Human-Scene Interaction via Prompted Chain-of-Contacts | 24 | Experimental |
| 13 | GradientSpaces/respace | Code for "ReSpace: Text-Driven 3D Indoor Scene Synthesis and Editing with... | 23 | Experimental |
| 14 | YangLing0818/EditWorld | [ACM Multimedia 2025 Datasets Track] EditWorld: Simulating World Dynamics... | 21 | Experimental |
| 15 | DAMO-NLP-SG/DiGIT | [NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling:... | 21 | Experimental |
| 16 | LayoutLLM-T2I/LayoutLLM-T2I | Code for ACM MM'23 paper: LayoutLLM-T2I: Eliciting Layout Guidance from LLM... | 12 | Experimental |
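
The scores and tiers listed above can be tallied to cross-check the summary at the top of the page. The following Python sketch hard-codes the 16 rows exactly as listed here. The helper names `tally_tiers` and `above` are hypothetical, and no tier cut-off values are assumed, since the page does not state them.

```python
from collections import Counter

# (model, score, tier) copied from the listing above.
ROWS = [
    ("zai-org/CogVideo", 52, "Established"),
    ("zhaorw02/DeepMesh", 38, "Emerging"),
    ("YangLing0818/RPG-DiffusionMaster", 36, "Emerging"),
    ("thu-nics/FrameFusion", 31, "Emerging"),
    ("Yushi-Hu/tifa", 30, "Emerging"),
    ("OpenMeshLab/MeshXL", 29, "Experimental"),
    ("ByteVisionLab/TokenFlow", 29, "Experimental"),
    ("j-min/DSG", 27, "Experimental"),
    ("YangLing0818/VideoTetris", 26, "Experimental"),
    ("jqin4749/MindVideo", 24, "Experimental"),
    ("showlab/VisorGPT", 24, "Experimental"),
    ("InternRobotics/UniHSI", 24, "Experimental"),
    ("GradientSpaces/respace", 23, "Experimental"),
    ("YangLing0818/EditWorld", 21, "Experimental"),
    ("DAMO-NLP-SG/DiGIT", 21, "Experimental"),
    ("LayoutLLM-T2I/LayoutLLM-T2I", 12, "Experimental"),
]


def tally_tiers(rows):
    """Count how many models fall in each tier label."""
    return Counter(tier for _, _, tier in rows)


def above(rows, threshold):
    """Return the names of models whose score is strictly above the threshold."""
    return [name for name, score, _ in rows if score > threshold]


tiers = tally_tiers(ROWS)
print(dict(tiers))
print(above(ROWS, 50))
```

On this data the tally gives 1 Established, 4 Emerging, and 11 Experimental models, and zai-org/CogVideo is the only entry above 50, which matches the summary.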