MMMU-Benchmark/MMMU
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI".
This project provides a rigorous way to test and compare how well advanced AI models understand and reason across many academic subjects. It takes college-level questions with varied image types (charts, diagrams, and more) as input and reports a model's accuracy in answering them. Researchers, AI developers, and academics building or evaluating multimodal AI will find it useful.
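As a rough illustration of the input/output contract, here is a minimal scoring sketch. It assumes the dataset is published on the Hugging Face Hub as "MMMU/MMMU" with per-subject configs (e.g. "Accounting") and "id"/"answer" fields holding a question id and the gold option letter; the repo's own eval scripts are the authoritative pipeline, and the field names here are an assumption.

# Minimal sketch, not the repo's official eval code.
# Assumes the dataset lives on the Hugging Face Hub as "MMMU/MMMU"
# with per-subject configs and "id"/"answer" fields (assumed names).
from datasets import load_dataset

def score_subject(predictions: dict[str, str], subject: str = "Accounting") -> float:
    """Compare predicted option letters against gold answers for one subject."""
    ds = load_dataset("MMMU/MMMU", subject, split="validation")
    correct = 0
    for example in ds:
        gold = example["answer"]                   # gold option letter, e.g. "B"
        pred = predictions.get(example["id"], "")  # model's predicted letter
        correct += int(pred == gold)
    return correct / len(ds)

# predictions maps question ids to the model's chosen option letters,
# e.g. {"validation_Accounting_1": "B", ...}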
Use this if you are developing advanced AI models and need a comprehensive, challenging benchmark to assess their ability to integrate visual and textual information and reason like a human expert.
Not ideal if you are looking for a simple, task-specific dataset for basic image recognition or natural language processing, as this benchmark focuses on complex, multi-disciplinary understanding.
Stars: 548
Forks: 49
Language: Python
License: Apache-2.0
Category:
Last pushed: Feb 12, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/MMMU-Benchmark/MMMU"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
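If you'd rather call the endpoint from Python than curl, a minimal sketch of the same unauthenticated request (it assumes the endpoint returns JSON; consult the API docs for the response schema and how a key is passed):

# Minimal sketch: fetch the repo's quality data from the API above.
import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/MMMU-Benchmark/MMMU"

resp = requests.get(URL, timeout=10)  # free tier, no key required
resp.raise_for_status()
print(resp.json())  # assumes the endpoint returns the metrics as JSON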
Related models
ExtensityAI/symbolicai
A neurosymbolic perspective on LLMs
TIGER-AI-Lab/MMLU-Pro
The code and data for "MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding...
deep-symbolic-mathematics/LLM-SR
[ICLR 2025 Oral] This is the official repo for the paper "LLM-SR" on Scientific Equation...
ise-uiuc/magicoder
[ICML'24] Magicoder: Empowering Code Generation with OSS-Instruct
microsoft/interwhen
A framework for verifiable reasoning with language models.