VLMEvalKit and SciEvalKit

The two toolkits serve the same space: VLMEvalKit offers broader LMM support and more benchmarks, while SciEvalKit provides a specialized evaluation toolkit and leaderboard focused on scientific intelligence across the full research workflow.

| Metric        | VLMEvalKit    | SciEvalKit    |
|---------------|---------------|---------------|
| Overall score | 72 (Verified) | 46 (Emerging) |
| Maintenance   | 23/25         | 10/25         |
| Adoption      | 10/25         | 9/25          |
| Maturity      | 16/25         | 13/25         |
| Community     | 23/25         | 14/25         |
| Stars         | 3,894         | 74            |
| Forks         | 650           | 10            |
| Downloads     |               |               |
| Commits (30d) | 21            | 0             |
| Language      | Python        | Python        |
| License       | Apache-2.0    | Apache-2.0    |
| Package       | None          | None          |
| Dependents    | None          | None          |

About VLMEvalKit

open-compass/VLMEvalKit

Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks.

Provides generation-based evaluation across all supported models with dual assessment modes, exact matching and LLM-based answer extraction, eliminating manual data preparation across fragmented benchmark repositories. Supports distributed inference via LMDeploy and vLLM for accelerated evaluation of large-scale deployments, with specialized handling for models with reasoning/thinking modes and for long-form outputs that exceed standard cell limits. Integrates with the Hugging Face ecosystem (model hosting, datasets, Spaces for leaderboards) and supports video benchmarks via ModelScope for comprehensive vision-language assessment.
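To make the two assessment modes concrete, here is a minimal conceptual sketch, not VLMEvalKit's actual API: exact matching compares a normalized prediction against the reference directly, while answer extraction (which VLMEvalKit delegates to a judge LLM) first recovers the intended choice from a free-form response. The function names and the regex-based extractor below are illustrative stand-ins.

```python
import re
from typing import Optional

def exact_match(prediction: str, answer: str) -> bool:
    """Exact-matching mode: normalize whitespace/case, then compare directly."""
    norm = lambda s: re.sub(r"\s+", " ", s.strip().lower())
    return norm(prediction) == norm(answer)

def extract_choice(prediction: str, choices=("A", "B", "C", "D")) -> Optional[str]:
    """Stand-in for LLM-based answer extraction: pull a standalone choice
    letter out of a verbose response. A real toolkit would ask a judge LLM
    instead of using a regex."""
    m = re.search(r"\b([A-D])\b", prediction.upper())
    return m.group(1) if m and m.group(1) in choices else None

# Exact matching fails on verbose model output; extraction recovers the choice.
print(exact_match("B", "B"))                 # True
print(exact_match("The answer is B.", "B"))  # False
print(extract_choice("The answer is B."))    # B
```

The point of the dual-mode design is robustness: strict matching scores terse models fairly, while extraction rescues correct answers buried in long-form or reasoning-style output.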

About SciEvalKit

InternScience/SciEvalKit

A unified evaluation toolkit and leaderboard for rigorously assessing the scientific intelligence of large language and vision–language models across the full research workflow.

Scores updated daily from GitHub, PyPI, and npm data.