lmms-eval and VLMEvalKit
lmms-eval and VLMEvalKit are complementary evaluation frameworks that can be used together: lmms-eval offers broader modality coverage (text, image, video, and audio), while VLMEvalKit provides more extensive model and benchmark support (220+ LMMs, 80+ benchmarks). Practitioners can adopt either toolkit, or combine the two, depending on their evaluation priorities.
About lmms-eval
EvolvingLMMs-Lab/lmms-eval
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
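For orientation, a minimal sketch of launching an lmms-eval run from Python is shown below; the model wrapper name, checkpoint id, task name, and CLI flags are assumptions based on typical lmms-eval usage and should be verified against the repository's README.

```python
# Hypothetical helper that launches an lmms-eval run as a subprocess.
# The model wrapper ("llava"), checkpoint id, task name ("mme"), and flag
# names below are assumptions; check `python -m lmms_eval --help` locally.
import subprocess

def run_lmms_eval(model: str, model_args: str, tasks: list[str], output_path: str) -> None:
    cmd = [
        "python", "-m", "lmms_eval",
        "--model", model,              # model wrapper registered in lmms-eval
        "--model_args", model_args,    # e.g. "pretrained=<hf-checkpoint-id>"
        "--tasks", ",".join(tasks),    # comma-separated benchmark names
        "--batch_size", "1",
        "--output_path", output_path,  # directory for logs and metrics
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # Illustrative values only; substitute a model and tasks your install supports.
    run_lmms_eval(
        model="llava",
        model_args="pretrained=liuhaotian/llava-v1.5-7b",
        tasks=["mme"],
        output_path="./logs/",
    )
```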
About VLMEvalKit
open-compass/VLMEvalKit
Open-source evaluation toolkit of large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
Provides generation-based evaluation across all supported models, with two assessment modes (exact matching and LLM-based answer extraction), and handles benchmark data preparation itself, eliminating manual setup across fragmented benchmark repositories. Supports distributed inference via LMDeploy and vLLM for accelerated evaluation of large-scale deployments, with specialized handling for models with reasoning/thinking modes and for long-form outputs that exceed standard spreadsheet cell limits in saved result files. Integrates with the Hugging Face ecosystem (model hosting, datasets, Spaces for leaderboards) and supports video benchmarks via ModelScope for comprehensive vision-language assessment.
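As a complementary sketch, the snippet below drives VLMEvalKit's run.py entry point from Python, including a torchrun launch for the distributed-inference path; the model alias, benchmark name, and flags are assumptions based on common VLMEvalKit usage and may differ in the current release.

```python
# Hypothetical wrapper around VLMEvalKit's run.py entry point.
# Assumes the working directory is a VLMEvalKit checkout; the model alias,
# benchmark names, and CLI flags are assumptions; confirm with `python run.py --help`.
import subprocess

def run_vlmevalkit(model: str, benchmarks: list[str], gpus: int = 1) -> None:
    # Plain Python for a single GPU, torchrun data-parallel launch otherwise.
    launcher = ["torchrun", f"--nproc-per-node={gpus}"] if gpus > 1 else ["python"]
    cmd = launcher + [
        "run.py",
        "--data", *benchmarks,  # one or more benchmark names
        "--model", model,       # a model alias registered in vlmeval
        "--verbose",
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # Illustrative values only; pick a model/benchmark pair your setup supports.
    run_vlmevalkit(model="qwen_chat", benchmarks=["MMBench_DEV_EN"], gpus=2)
```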