open-compass/opencompass
OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMA2, Qwen, GLM, Claude, etc.) across 100+ datasets.
It provides flexible evaluation pipelines through composable evaluators (including LLM-as-judge and mathematical reasoning assessments) and supports specialized benchmarks for long-context, reasoning, and scientific tasks. It features configurable model backends (HuggingFace, vLLM, LMDeploy) with answer post-processing via models like XFinder for more accurate capability assessment, integrates with ModelScope for on-demand dataset loading, and includes CompassHub and CompassRank for centralized benchmark results and model ranking.
6,752 stars. Actively maintained with 11 commits in the last 30 days. Available on PyPI.
Stars: 6,752
Forks: 743
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 13, 2026
Commits (30d): 11
Dependencies: 49
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/open-compass/opencompass"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
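The curl command above can also be scripted. A minimal Python sketch, assuming the endpoint returns JSON (the response field names are not documented here, so the fetch helper just returns the parsed body):

```python
# Minimal sketch of querying the quality API for a repo.
# Assumption: the endpoint returns a JSON object; its schema is not
# specified on this page, so fetch_quality() returns the raw dict.
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"

def quality_url(owner: str, repo: str) -> str:
    """Build the API endpoint URL for a given GitHub owner/repo pair."""
    return f"{BASE}/{quote(owner)}/{quote(repo)}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch the quality record; raises urllib.error.HTTPError on
    failures such as exceeding the 100 requests/day anonymous limit."""
    with urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

print(quality_url("open-compass", "opencompass"))
# → https://pt-edge.onrender.com/api/v1/quality/llm-tools/open-compass/opencompass
```
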
Related tools
IBM/unitxt
🦄 Unitxt is a Python library for enterprise-grade evaluation of AI performance, offering the...
lean-dojo/LeanDojo
Tool for data extraction and interacting with Lean programmatically.
salesforce/CodeT5
Home of CodeT5: Open Code LLMs for Code Understanding and Generation
GoodStartLabs/AI_Diplomacy
Frontier Models playing the board game Diplomacy.
MigoXLab/LMeterX
A general-purpose API load testing platform that supports LLM services and business HTTP...