open-compass/opencompass

OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama 3, Mistral, InternLM2, GPT-4, Llama 2, Qwen, GLM, Claude, etc.) across 100+ datasets.

76 / 100 (Verified)

Provides flexible evaluation pipelines through composable evaluators (including LLM-as-judge and mathematical reasoning assessments) and supports specialized benchmarks for long-context, reasoning, and scientific tasks. Features configurable model backends (HuggingFace, vLLM, LMDeploy) with answer post-processing via models like XFinder for more accurate capability assessment. Integrates with ModelScope for on-demand dataset loading and includes CompassHub and CompassRank for centralized benchmark results and model ranking.
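As an illustration of the composable pipeline described above, OpenCompass runs are typically driven by a Python config that imports dataset and model definitions and composes them. A minimal sketch follows; the imported module paths are examples of the repo's `configs/` layout and should be checked against the release you use.

```python
# eval_demo.py -- minimal OpenCompass config sketch.
# The module paths below are illustrative; verify them against configs/ in the repo.
from mmengine.config import read_base

with read_base():
    # A generation-style GSM8K dataset definition...
    from .datasets.gsm8k.gsm8k_gen import gsm8k_datasets
    # ...and a HuggingFace-backend model definition.
    from .models.hf_internlm.hf_internlm2_7b import models

# The runner evaluates every model in `models` on every dataset in `datasets`.
datasets = gsm8k_datasets
```

Launched with `opencompass eval_demo.py`, the same mechanism swaps in vLLM or LMDeploy backends by importing a different model definition.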

6,752 stars. Actively maintained with 11 commits in the last 30 days. Available on PyPI.

Maintenance 20 / 25
Adoption 10 / 25
Maturity 25 / 25
Community 21 / 25


Stars: 6,752
Forks: 743
Language: Python
License: Apache-2.0
Last pushed: Mar 13, 2026
Commits (30d): 11
Dependencies: 49

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/open-compass/opencompass"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
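The curl command above can also be consumed programmatically. A minimal sketch follows; the JSON field names (`score`, `stars`, `forks`, `license`) are assumptions for illustration, not a documented schema.

```python
import json
from urllib.request import urlopen

API = ("https://pt-edge.onrender.com/api/v1/quality/"
       "llm-tools/open-compass/opencompass")

def summarize(payload: str) -> dict:
    """Pick out a few headline fields; the key names are assumed, not documented."""
    data = json.loads(payload)
    return {key: data.get(key) for key in ("score", "stars", "forks", "license")}

# Live call (counts against the 100/day unauthenticated quota):
# payload = urlopen(API).read().decode()

# Sample response used here so the sketch runs offline:
payload = '{"score": 76, "stars": 6752, "forks": 743, "license": "Apache-2.0"}'
print(summarize(payload))
```

Parsing is kept separate from the network call so the quota-limited request happens at most once per lookup.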