VLMEvalKit and evaluation-guidebook
About VLMEvalKit
open-compass/VLMEvalKit
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
Provides generation-based evaluation across all supported models with two assessment modes, exact matching and LLM-based answer extraction, eliminating the manual data preparation otherwise needed across fragmented benchmark repositories. Supports distributed inference via LMDeploy and vLLM to accelerate evaluation of large-scale deployments, with specialized handling for models with reasoning/thinking modes and for long-form outputs that exceed spreadsheet cell limits. Integrates with the Hugging Face ecosystem (model hosting, datasets, Spaces for leaderboards) and supports video benchmarks via ModelScope for comprehensive vision-language assessment.
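The dual assessment modes follow a common match-then-judge pattern: try cheap exact matching first, and fall back to an LLM judge only for free-form outputs. The sketch below illustrates that pattern in Python; it is not VLMEvalKit's actual API. The names `exact_match`, `llm_extract`, `call_judge`, and `score` are hypothetical, and the judge hook is a stub to replace with a real LLM client.

```python
import re

def exact_match(prediction: str, choices: list[str]) -> str | None:
    """Return the option letter if the prediction unambiguously names one choice."""
    letters = [chr(ord("A") + i) for i in range(len(choices))]
    pred = prediction.strip().upper()
    if pred in letters:  # bare letter, e.g. "B"
        return pred
    # Leading letter followed by punctuation, e.g. "(B) Paris" or "B. Paris"
    m = re.match(r"^\(?([A-Z])[).:]", pred)
    if m and m.group(1) in letters:
        return m.group(1)
    return None

def call_judge(prompt: str) -> str:
    """Hypothetical hook for a judge LLM; swap in a real client call."""
    raise NotImplementedError("wire up your judge LLM client here")

def llm_extract(prediction: str, question: str, choices: list[str]) -> str:
    """Fallback: ask a judge LLM to map free-form text to an option letter."""
    prompt = (
        f"Question: {question}\nOptions: {choices}\n"
        f"Model answer: {prediction}\n"
        "Reply with only the letter of the matching option."
    )
    return call_judge(prompt).strip().upper()

def score(prediction: str, question: str, choices: list[str], gold: str) -> bool:
    """Exact matching first; LLM extraction only when matching fails."""
    letter = exact_match(prediction, choices) or llm_extract(prediction, question, choices)
    return letter == gold

if __name__ == "__main__":
    choices = ["Paris", "Berlin", "Madrid"]
    print(score("(A) Paris, of course.", "Capital of France?", choices, gold="A"))
```

Ordering the modes this way confines judge-LLM calls, and their cost and latency, to the ambiguous free-form outputs that exact matching cannot resolve.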
About evaluation-guidebook
huggingface/evaluation-guidebook
Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval!