TIGER-AI-Lab/VLM2Vec
This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR 2025]
Extends unified multimodal embeddings to videos and visual documents via instruction-guided contrastive training on Qwen2-VL backbones, enabling cross-modal retrieval and classification across diverse visual formats. The framework introduces MMEB-V2, a 78-task benchmark spanning image, video, and document modalities for systematic evaluation. Model checkpoints and datasets are hosted on Hugging Face, and the models are supported in vLLM for production inference.
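As a rough illustration of how instruction-guided embeddings are used for cross-modal retrieval, the sketch below ranks candidates by cosine similarity against a query embedding. The vectors here are random placeholders standing in for VLM2Vec outputs; the actual encoding API and model names live in the repo.

```python
import numpy as np

def cosine_sim(query, candidates):
    # Cosine similarity between one query vector and a matrix of candidates.
    query = query / np.linalg.norm(query)
    candidates = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    return candidates @ query

# Placeholder embeddings: in practice these would come from encoding an
# instruction + query text and a set of candidate images/videos/documents
# with a VLM2Vec checkpoint.
rng = np.random.default_rng(0)
candidates = rng.standard_normal((4, 8))                # 4 candidates, dim 8
query = candidates[2] + 0.01 * rng.standard_normal(8)   # near candidate 2

scores = cosine_sim(query, candidates)
best = int(np.argmax(scores))
print(best)  # index of the nearest candidate
```

Retrieval then reduces to returning the top-k candidates by score; classification uses the same similarity against class-label embeddings.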
Stars: 592
Forks: 51
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 09, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/TIGER-AI-Lab/VLM2Vec"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000/day.