MMMU-Benchmark/MMMU
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI".
This project provides a rigorous way to test and compare how well advanced AI models understand and reason across many academic subjects. It takes college-level questions with varied image types (charts, diagrams, and more) as input and reports a model's accuracy in answering them. Researchers, AI developers, and academics building or evaluating multimodal AI will find it useful.
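As a rough illustration of the input/output contract, here is a minimal scoring sketch. It assumes the dataset is published on the Hugging Face Hub as "MMMU/MMMU" with per-subject configs (e.g. "Accounting") and "id"/"answer" fields holding a question id and the gold option letter; the repo's own eval scripts are the authoritative pipeline, and the field names here are an assumption.

# Minimal sketch, not the repo's official eval code.
# Assumes the dataset lives on the Hugging Face Hub as "MMMU/MMMU"
# with per-subject configs and "id"/"answer" fields (assumed names).
from datasets import load_dataset

def score_subject(predictions: dict[str, str], subject: str = "Accounting") -> float:
    """Compare predicted option letters against gold answers for one subject."""
    ds = load_dataset("MMMU/MMMU", subject, split="validation")
    correct = 0
    for example in ds:
        gold = example["answer"]                   # gold option letter, e.g. "B"
        pred = predictions.get(example["id"], "")  # model's predicted letter
        correct += int(pred == gold)
    return correct / len(ds)

# predictions maps question ids to the model's chosen option letters,
# e.g. {"validation_Accounting_1": "B", ...}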
Use this if you are developing advanced AI models and need a comprehensive, challenging benchmark to assess their ability to integrate visual and textual information and reason like a human expert.
Not ideal if you are looking for a simple, task-specific dataset for basic image recognition or natural language processing, as this benchmark focuses on complex, multi-disciplinary understanding.
Stars: 548
Forks: 49
Language: Python
License: Apache-2.0
Category:
Last pushed: Feb 12, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/MMMU-Benchmark/MMMU"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
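If you'd rather call the endpoint from Python than curl, a minimal sketch of the same unauthenticated request (it assumes the endpoint returns JSON; consult the API docs for the response schema and how a key is passed):

# Minimal sketch: fetch the repo's quality data from the API above.
import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/MMMU-Benchmark/MMMU"

resp = requests.get(URL, timeout=10)  # free tier, no key required
resp.raise_for_status()
print(resp.json())  # assumes the endpoint returns the metrics as JSON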
Related models
ExtensityAI/symbolicai
A neurosymbolic perspective on LLMs
TIGER-AI-Lab/MMLU-Pro
The code and data for "MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding...
deep-symbolic-mathematics/LLM-SR
[ICLR 2025 Oral] This is the official repo for the paper "LLM-SR" on Scientific Equation...
ise-uiuc/magicoder
[ICML'24] Magicoder: Empowering Code Generation with OSS-Instruct
microsoft/interwhen
A framework for verifiable reasoning with language models.