Vision Language Models
There are 2 vision language models tracked. The highest-rated is snap-research/MMVID at 35/100 with 192 stars.
Get all 2 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=diffusion&subcategory=vision-language-models&limit=20"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
| # | Model | Score | Tier |
|---|---|---|---|
| 1 |
snap-research/MMVID
[CVPR 2022] Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning |
|
Emerging |
| 2 |
chuhaojin/Text2Poster-ICASSP-22
Official implementation of the ICASSP-2022 paper "Text2Poster: Laying Out... |
|
Emerging |