MichiganNLP/Scalable-VLM-Probing
Probe Vision-Language Models
This project helps AI researchers and practitioners evaluate how well vision-language models (VLMs) like CLIP understand the relationship between images and text. It takes an existing dataset of image-sentence pairs and VLM output scores, then correlates these scores with various linguistic features to identify what the model is actually 'seeing' or 'understanding'. You would use this if you are developing or applying VLMs and need to understand their semantic strengths and weaknesses without extensive manual annotation.
No commits in the last 6 months.
Use this if you want to gain deeper insights into why a vision-language model performs well or poorly on specific image-text combinations by analyzing linguistic patterns.
Not ideal if you are looking for a tool to train new vision-language models or for a general-purpose image classification or captioning application.
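Below is a minimal sketch of the correlation step described above, using hypothetical CLIP scores and a toy linguistic feature (caption length in tokens); the actual project extracts many richer features (e.g., part-of-speech patterns, word concreteness) before correlating.

# Toy illustration of the probing idea: correlate VLM image-text
# scores with a linguistic feature of each caption. All data here
# is hypothetical.
from scipy.stats import pearsonr

# Hypothetical CLIP similarity scores for five image-sentence pairs.
clip_scores = [0.31, 0.27, 0.22, 0.35, 0.18]
captions = [
    "a dog running on the beach",
    "two children playing chess indoors",
    "an abstract notion of justice",
    "a red car parked by a tree",
    "the feeling of nostalgia in autumn",
]

# A toy feature: caption length in tokens. The repository supports
# many more linguistic features than this.
feature = [len(c.split()) for c in captions]

r, p = pearsonr(clip_scores, feature)
print(f"Pearson r = {r:.3f} (p = {p:.3f})")

A strong correlation (positive or negative) between the scores and a feature suggests that feature influences what the model rewards, which is the kind of pattern this tool surfaces automatically.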
Stars: 5
Forks: 1
Language: Python
License: Apache-2.0
Last pushed: Jul 27, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/MichiganNLP/Scalable-VLM-Probing"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
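For reference, the same request from Python, a minimal sketch using the requests library; this assumes the endpoint returns JSON, and the response schema is not documented here, so it is simply printed raw.

# Fetch the same repo-quality data shown in the curl example above.
import requests

url = (
    "https://pt-edge.onrender.com/api/v1/quality/"
    "transformers/MichiganNLP/Scalable-VLM-Probing"
)
resp = requests.get(url, timeout=10)
resp.raise_for_status()  # fail loudly on rate limiting or errors
print(resp.json())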
Higher-rated alternatives
gabeur/mmt
Multi-Modal Transformer for Video Retrieval
JerryYLi/valhalla-nmt
Code repository for CVPR 2022 paper "VALHALLA: Visual Hallucination for Machine Translation"
SkalskiP/awesome-foundation-and-multimodal-models
👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples +...
benywon/LALM
code and resource for ACL2021 paper 'Multi-Lingual Question Generation with Language Agnostic...
thunlp/cost-optimal-gqa
The code for the paper "Cost-Optimal Grouped-Query Attention for Long-Context Modeling"