TIGER-AI-Lab/VLM2Vec

This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR 2025].

Quality score: 48 / 100 (Emerging)

Extends unified multimodal embeddings to videos and visual documents via instruction-guided contrastive training on Qwen2-VL backbones, enabling cross-modal retrieval and classification across diverse visual formats. The framework introduces MMEB-V2, a 78-task benchmark spanning image, video, and document modalities for systematic evaluation. Model checkpoints and datasets are distributed via Hugging Face, and the models are supported in vLLM for production inference.


No package · No dependents

Maintenance: 13 / 25
Adoption: 10 / 25
Maturity: 9 / 25
Community: 16 / 25


Stars: 592
Forks: 51
Language: Python
License: Apache-2.0
Last pushed: Mar 09, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/TIGER-AI-Lab/VLM2Vec"

Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.