MMMU-Benchmark/MMMU
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
The benchmark comprises 11.5K college-level multimodal questions spanning 6 disciplines and 183 subfields, with 30 heterogeneous image types (charts, diagrams, chemical structures, etc.), designed to assess expert-level perception and reasoning. The evaluation framework includes both the standard MMMU and MMMU-Pro variants, the latter employing a stricter methodology: filtering out questions answerable from text alone, augmenting the distractor options, and embedding questions inside images to test simultaneous vision-and-reading capability. It supports systematic evaluation across development, validation, and test sets with local answer verification via JSON-based pipelines, enabling rigorous benchmarking of vision-language models without external submission infrastructure.
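The JSON-based answer verification mentioned above is simple to reproduce locally. A minimal sketch, assuming hypothetical predictions.json and answers.json files that each map question IDs to a single option letter (the repo's actual file names and schema may differ):

```python
import json

# Load model predictions and gold answers; both files are assumed to map
# question IDs to option letters, e.g. {"validation_Art_1": "B"}.
# (File names and schema are illustrative, not the repo's exact layout.)
with open("predictions.json") as f:
    preds = json.load(f)
with open("answers.json") as f:
    gold = json.load(f)

# Count exact matches between predicted and gold option letters.
correct = sum(1 for qid, ans in gold.items() if preds.get(qid) == ans)
print(f"Accuracy: {correct / len(gold):.2%} ({correct}/{len(gold)})")
```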
Stars: 548
Forks: 49
Language: Python
License: Apache-2.0
Category:
Last pushed: Feb 12, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/MMMU-Benchmark/MMMU"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
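For programmatic access, the same endpoint can be queried from Python. A minimal sketch using the requests library, assuming the endpoint returns JSON (the response field names are not guaranteed here):

```python
import requests

# Query the public endpoint shown above; no API key is required
# for up to 100 requests/day.
url = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/MMMU-Benchmark/MMMU"
resp = requests.get(url, timeout=10)
resp.raise_for_status()  # surface HTTP errors (e.g. rate limiting)
print(resp.json())
```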
Related tools
pat-jj/DeepRetrieval
[COLM'25] DeepRetrieval: 🔥 Training Search Agent by RLVR with Retrieval Outcome
lupantech/MathVista
MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts
ise-uiuc/magicoder
[ICML'24] Magicoder: Empowering Code Generation with OSS-Instruct
x66ccff/liveideabench
[Nature Communications] 🤖💡 LiveIdeaBench: Evaluating LLMs' Scientific Creativity and Idea...
IAAR-Shanghai/xVerify
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations