open-compass/opencompass
OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMA2, Qwen, GLM, Claude, etc.) across 100+ datasets.
It provides flexible evaluation pipelines through composable evaluators (including LLM-as-judge and mathematical reasoning assessments) and supports specialized benchmarks for long-context, reasoning, and scientific tasks. It features configurable model backends (HuggingFace, vLLM, LMDeploy) with answer post-processing via models like XFinder for more accurate capability assessment, integrates with ModelScope for on-demand dataset loading, and includes CompassHub and CompassRank for centralized benchmark results and model ranking.
6,752 stars. Actively maintained with 11 commits in the last 30 days. Available on PyPI.
Stars: 6,752
Forks: 743
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 13, 2026
Commits (30d): 11
Dependencies: 49
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/open-compass/opencompass"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
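The curl command above can also be scripted. A minimal Python sketch, assuming the endpoint returns JSON (the response field names are not documented here, so the fetch helper just returns the parsed body):

```python
# Minimal sketch of querying the quality API for a repo.
# Assumption: the endpoint returns a JSON object; its schema is not
# specified on this page, so fetch_quality() returns the raw dict.
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"

def quality_url(owner: str, repo: str) -> str:
    """Build the API endpoint URL for a given GitHub owner/repo pair."""
    return f"{BASE}/{quote(owner)}/{quote(repo)}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch the quality record; raises urllib.error.HTTPError on
    failures such as exceeding the 100 requests/day anonymous limit."""
    with urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

print(quality_url("open-compass", "opencompass"))
# → https://pt-edge.onrender.com/api/v1/quality/llm-tools/open-compass/opencompass
```
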
Related tools
IBM/unitxt
🦄 Unitxt is a Python library for enterprise-grade evaluation of AI performance, offering the...
lean-dojo/LeanDojo
Tool for data extraction and interacting with Lean programmatically.
salesforce/CodeT5
Home of CodeT5: Open Code LLMs for Code Understanding and Generation
GoodStartLabs/AI_Diplomacy
Frontier Models playing the board game Diplomacy.
MigoXLab/LMeterX
A general-purpose API load testing platform that supports LLM services and business HTTP...