lmms-eval and VLMEvalKit
lmms-eval and VLMEvalKit are complementary evaluation frameworks that can be used together: lmms-eval offers broader modality coverage (text, image, video, and audio), while VLMEvalKit provides more extensive model and benchmark support (220+ LMMs, 80+ benchmarks). Practitioners can adopt either toolkit, or combine the two, depending on their evaluation priorities.
About lmms-eval
EvolvingLMMs-Lab/lmms-eval
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
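For orientation, a minimal sketch of launching an lmms-eval run from Python is shown below; the model wrapper name, checkpoint id, task name, and CLI flags are assumptions based on typical lmms-eval usage and should be verified against the repository's README.

```python
# Hypothetical helper that launches an lmms-eval run as a subprocess.
# The model wrapper ("llava"), checkpoint id, task name ("mme"), and flag
# names below are assumptions; check `python -m lmms_eval --help` locally.
import subprocess

def run_lmms_eval(model: str, model_args: str, tasks: list[str], output_path: str) -> None:
    cmd = [
        "python", "-m", "lmms_eval",
        "--model", model,              # model wrapper registered in lmms-eval
        "--model_args", model_args,    # e.g. "pretrained=<hf-checkpoint-id>"
        "--tasks", ",".join(tasks),    # comma-separated benchmark names
        "--batch_size", "1",
        "--output_path", output_path,  # directory for logs and metrics
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # Illustrative values only; substitute a model and tasks your install supports.
    run_lmms_eval(
        model="llava",
        model_args="pretrained=liuhaotian/llava-v1.5-7b",
        tasks=["mme"],
        output_path="./logs/",
    )
```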
About VLMEvalKit
open-compass/VLMEvalKit
Open-source evaluation toolkit of large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
Provides generation-based evaluation across all supported models, with two assessment modes (exact matching and LLM-based answer extraction), and handles benchmark data preparation itself, eliminating manual setup across fragmented benchmark repositories. Supports distributed inference via LMDeploy and vLLM for accelerated evaluation of large-scale deployments, with specialized handling for models with reasoning/thinking modes and for long-form outputs that exceed standard spreadsheet cell limits in saved result files. Integrates with the Hugging Face ecosystem (model hosting, datasets, Spaces for leaderboards) and supports video benchmarks via ModelScope for comprehensive vision-language assessment.
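As a complementary sketch, the snippet below drives VLMEvalKit's run.py entry point from Python, including a torchrun launch for the distributed-inference path; the model alias, benchmark name, and flags are assumptions based on common VLMEvalKit usage and may differ in the current release.

```python
# Hypothetical wrapper around VLMEvalKit's run.py entry point.
# Assumes the working directory is a VLMEvalKit checkout; the model alias,
# benchmark names, and CLI flags are assumptions; confirm with `python run.py --help`.
import subprocess

def run_vlmevalkit(model: str, benchmarks: list[str], gpus: int = 1) -> None:
    # Plain Python for a single GPU, torchrun data-parallel launch otherwise.
    launcher = ["torchrun", f"--nproc-per-node={gpus}"] if gpus > 1 else ["python"]
    cmd = launcher + [
        "run.py",
        "--data", *benchmarks,  # one or more benchmark names
        "--model", model,       # a model alias registered in vlmeval
        "--verbose",
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # Illustrative values only; pick a model/benchmark pair your setup supports.
    run_vlmevalkit(model="qwen_chat", benchmarks=["MMBench_DEV_EN"], gpus=2)
```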