MMMU-Benchmark/MMMU

This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

Score: 52 / 100 (Established)

Comprises 11.5K college-level multimodal questions spanning 6 disciplines and 183 subfields with 32 heterogeneous image types (charts, diagrams, chemical structures, etc.), designed to assess expert-level perception and reasoning. The evaluation framework includes both standard MMMU and MMMU-Pro variants, the latter employing stricter methodology: filtering text-only questions, augmenting distractors, and embedding questions as images to test simultaneous vision-and-reading capabilities. Supports systematic evaluation across development, validation, and test sets with local answer verification via JSON-based pipelines, enabling rigorous benchmarking of vision-language models without external submission infrastructure.
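The local verification flow is easy to reproduce in outline. Below is a minimal sketch, assuming a hypothetical predictions file and answer key, each a JSON object mapping question IDs to answer letters; the repo's actual scripts, file names, and schema may differ.

import json

def score_predictions(pred_path: str, answer_path: str) -> float:
    """Compare model predictions against a local JSON answer key.

    Both files are assumed to map question IDs to answer strings,
    e.g. {"validation_Art_1": "B", ...}. This is an illustrative
    sketch, not the repo's own evaluation script.
    """
    with open(pred_path) as f:
        predictions = json.load(f)
    with open(answer_path) as f:
        answers = json.load(f)

    # Score only the questions present in the answer key.
    correct = sum(
        1 for qid, gold in answers.items()
        if predictions.get(qid, "").strip().upper() == gold.strip().upper()
    )
    return correct / len(answers)

# Hypothetical paths for illustration.
accuracy = score_predictions("predictions.json", "answer_dict_val.json")
print(f"Validation accuracy: {accuracy:.2%}")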


No package published, no dependents.

Maintenance: 10 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 16 / 25

The four components, each scored out of 25, sum to the overall score: 10 + 10 + 16 + 16 = 52.


Stars: 548
Forks: 49
Language: Python
License: Apache-2.0
Last pushed: Feb 12, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/MMMU-Benchmark/MMMU"

Open to everyone: 100 requests/day with no key needed; a free key raises the limit to 1,000 requests/day.
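The same endpoint can be queried programmatically. A minimal Python sketch follows; the X-API-Key header name for the optional key is an assumption made for illustration, so check the API documentation for the real authentication scheme.

import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/MMMU-Benchmark/MMMU"

# No key is needed for up to 100 requests/day. The header name below is
# a hypothetical example, not a documented parameter.
headers = {}  # e.g. {"X-API-Key": "your-key"} for the 1,000/day tier

resp = requests.get(URL, headers=headers, timeout=10)
resp.raise_for_status()
print(resp.json())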