MichiganNLP/Scalable-VLM-Probing
Probe Vision-Language Models
This project helps AI researchers and practitioners evaluate how well vision-language models (VLMs) like CLIP understand the relationship between images and text. It takes an existing dataset of image-sentence pairs and VLM output scores, then correlates these scores with various linguistic features to identify what the model is actually 'seeing' or 'understanding'. You would use this if you are developing or applying VLMs and need to understand their semantic strengths and weaknesses without extensive manual annotation.
No commits in the last 6 months.
Use this if you want to gain deeper insights into why a vision-language model performs well or poorly on specific image-text combinations by analyzing linguistic patterns.
Not ideal if you are looking for a tool to train new vision-language models or for a general-purpose image classification or captioning application.
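Below is a minimal sketch of the correlation step described above, using hypothetical CLIP scores and a toy linguistic feature (caption length in tokens); the actual project extracts many richer features (e.g., part-of-speech patterns, word concreteness) before correlating.

# Toy illustration of the probing idea: correlate VLM image-text
# scores with a linguistic feature of each caption. All data here
# is hypothetical.
from scipy.stats import pearsonr

# Hypothetical CLIP similarity scores for five image-sentence pairs.
clip_scores = [0.31, 0.27, 0.22, 0.35, 0.18]
captions = [
    "a dog running on the beach",
    "two children playing chess indoors",
    "an abstract notion of justice",
    "a red car parked by a tree",
    "the feeling of nostalgia in autumn",
]

# A toy feature: caption length in tokens. The repository supports
# many more linguistic features than this.
feature = [len(c.split()) for c in captions]

r, p = pearsonr(clip_scores, feature)
print(f"Pearson r = {r:.3f} (p = {p:.3f})")

A strong correlation (positive or negative) between the scores and a feature suggests that feature influences what the model rewards, which is the kind of pattern this tool surfaces automatically.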
Stars: 5
Forks: 1
Language: Python
License: Apache-2.0
Last pushed: Jul 27, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/MichiganNLP/Scalable-VLM-Probing"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
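For reference, the same request from Python, a minimal sketch using the requests library; this assumes the endpoint returns JSON, and the response schema is not documented here, so it is simply printed raw.

# Fetch the same repo-quality data shown in the curl example above.
import requests

url = (
    "https://pt-edge.onrender.com/api/v1/quality/"
    "transformers/MichiganNLP/Scalable-VLM-Probing"
)
resp = requests.get(url, timeout=10)
resp.raise_for_status()  # fail loudly on rate limiting or errors
print(resp.json())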
Higher-rated alternatives
gabeur/mmt
Multi-Modal Transformer for Video Retrieval
JerryYLi/valhalla-nmt
Code repository for CVPR 2022 paper "VALHALLA: Visual Hallucination for Machine Translation"
SkalskiP/awesome-foundation-and-multimodal-models
👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples +...
benywon/LALM
code and resource for ACL2021 paper 'Multi-Lingual Question Generation with Language Agnostic...
thunlp/cost-optimal-gqa
The code for the paper "Cost-Optimal Grouped-Query Attention for Long-Context Modeling"