VLMEvalKit and SciEvalKit
The two toolkits compete in the model-evaluation space: VLMEvalKit offers broader coverage of large multi-modality models (LMMs) and benchmarks, while SciEvalKit provides a specialized evaluation toolkit and leaderboard focused on scientific intelligence across the full research workflow.
About VLMEvalKit
open-compass/VLMEvalKit
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
Provides generation-based evaluation across all supported models, with two assessment modes (exact matching and LLM-based answer extraction), removing the need for manual data preparation across fragmented benchmark repositories. Supports distributed inference via LMDeploy and vLLM to accelerate evaluation of large-scale deployments, with specialized handling for models with reasoning/thinking modes and for long-form outputs that exceed standard spreadsheet cell limits. Integrates with the Hugging Face ecosystem (model hosting, datasets, Spaces for leaderboards) and supports video benchmarks via ModelScope for comprehensive vision-language assessment.
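For illustration, here is a minimal sketch of a generation-based evaluation call, following the shape of VLMEvalKit's documented Python quickstart; the registry key, image path, and CLI flags shown are examples and may differ across versions, so treat the specifics as assumptions rather than a definitive API reference.

```python
# Hedged sketch of VLMEvalKit's quickstart-style API; the model key
# 'idefics_9b_instruct' and the image path are illustrative only.
from vlmeval.config import supported_VLM

# Instantiate a registered LMM by its config key.
model = supported_VLM['idefics_9b_instruct']()

# Generation-based evaluation: the model produces free-form text for an
# (image, question) pair; downstream scoring then applies exact matching
# or LLM-based answer extraction to the response.
response = model.generate(['assets/apple.jpg', 'What is in this image?'])
print(response)

# Full benchmark runs typically go through the CLI entry point, e.g.:
#   python run.py --data MMBench_DEV_EN --model qwen_chat --verbose
```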
About SciEvalKit
InternScience/SciEvalKit
A unified evaluation toolkit and leaderboard for rigorously assessing the scientific intelligence of large language and vision–language models across the full research workflow.