MMMU-Benchmark/MMMU
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
The benchmark comprises 11.5K college-level multimodal questions spanning 6 disciplines and 183 subfields, with 30 heterogeneous image types (charts, diagrams, chemical structures, etc.), designed to assess expert-level perception and reasoning. The evaluation framework includes both the standard MMMU and MMMU-Pro variants, the latter employing a stricter methodology: filtering out questions answerable from text alone, augmenting the distractor options, and embedding questions inside images to test simultaneous vision-and-reading capability. It supports systematic evaluation across development, validation, and test sets with local answer verification via JSON-based pipelines, enabling rigorous benchmarking of vision-language models without external submission infrastructure.
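The JSON-based answer verification mentioned above is simple to reproduce locally. A minimal sketch, assuming hypothetical predictions.json and answers.json files that each map question IDs to a single option letter (the repo's actual file names and schema may differ):

```python
import json

# Load model predictions and gold answers; both files are assumed to map
# question IDs to option letters, e.g. {"validation_Art_1": "B"}.
# (File names and schema are illustrative, not the repo's exact layout.)
with open("predictions.json") as f:
    preds = json.load(f)
with open("answers.json") as f:
    gold = json.load(f)

# Count exact matches between predicted and gold option letters.
correct = sum(1 for qid, ans in gold.items() if preds.get(qid) == ans)
print(f"Accuracy: {correct / len(gold):.2%} ({correct}/{len(gold)})")
```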
Stars: 548
Forks: 49
Language: Python
License: Apache-2.0
Category:
Last pushed: Feb 12, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/MMMU-Benchmark/MMMU"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
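For programmatic access, the same endpoint can be queried from Python. A minimal sketch using the requests library, assuming the endpoint returns JSON (the response field names are not guaranteed here):

```python
import requests

# Query the public endpoint shown above; no API key is required
# for up to 100 requests/day.
url = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/MMMU-Benchmark/MMMU"
resp = requests.get(url, timeout=10)
resp.raise_for_status()  # surface HTTP errors (e.g. rate limiting)
print(resp.json())
```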
Related tools
pat-jj/DeepRetrieval
[COLM'25] DeepRetrieval: 🔥 Training Search Agent by RLVR with Retrieval Outcome
lupantech/MathVista
MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts
ise-uiuc/magicoder
[ICML'24] Magicoder: Empowering Code Generation with OSS-Instruct
x66ccff/liveideabench
[Nature Communications] 🤖💡 LiveIdeaBench: Evaluating LLMs' Scientific Creativity and Idea...
IAAR-Shanghai/xVerify
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations