lmms-eval and VLMEvalKit

lmms-eval and VLMEvalKit are complementary evaluation frameworks that can be used together. lmms-eval offers broader modality coverage (text, image, video, and audio), while VLMEvalKit provides more extensive model and benchmark support (220+ LMMs, 80+ benchmarks). Practitioners can choose either toolkit, or combine both, depending on their evaluation priorities.

| Metric | lmms-eval | VLMEvalKit |
| --- | --- | --- |
| Overall score | 90 (Verified) | 72 (Verified) |
| Maintenance | 23/25 | 23/25 |
| Adoption | 20/25 | 10/25 |
| Maturity | 25/25 | 16/25 |
| Community | 22/25 | 23/25 |
| Stars | 3,883 | 3,894 |
| Forks | 539 | 650 |
| Downloads | 9,061 | not listed |
| Commits (30d) | 30 | 21 |
| Language | Python | Python |
| License | not listed | Apache-2.0 |
| Risk flags | None | No Package, No Dependents |

About lmms-eval

EvolvingLMMs-Lab/lmms-eval

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
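For a concrete sense of how lmms-eval is used, the sketch below shows a typical launch under accelerate. This follows the pattern of the project's documented CLI, but the model, checkpoint, and task names are illustrative placeholders, and the exact flags depend on the installed version.

```bash
# Illustrative lmms-eval run; model/checkpoint/task identifiers are placeholders.
accelerate launch --num_processes=8 -m lmms_eval \
    --model llava \
    --model_args pretrained="liuhaotian/llava-v1.5-7b" \
    --tasks mme,mmbench_en \
    --batch_size 1 \
    --log_samples \
    --output_path ./logs/
```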

About VLMEvalKit

open-compass/VLMEvalKit

Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks.
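A comparable VLMEvalKit run goes through the repository's run.py entry point, naming one or more benchmarks and a supported model. The benchmark and model identifiers below are illustrative; available names depend on the installed version.

```bash
# Illustrative VLMEvalKit run; benchmark/model names are placeholders.
python run.py --data MMBench_DEV_EN --model qwen_chat --verbose

# Multi-GPU inference (one process per GPU) via torchrun.
torchrun --nproc-per-node=8 run.py --data MMBench_DEV_EN --model qwen_chat --verbose
```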

VLMEvalKit provides generation-based evaluation across all supported models, with two assessment modes: exact matching and LLM-based answer extraction. This eliminates manual data preparation across fragmented benchmark repositories. Distributed inference via LMDeploy and vLLM accelerates evaluation of large-scale deployments, and the toolkit includes specialized handling for models with reasoning/thinking modes and for long-form outputs that exceed standard cell limits. It integrates with the Hugging Face ecosystem (model hosting, datasets, and Spaces for leaderboards) and supports video benchmarks via ModelScope for comprehensive vision-language assessment.
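To make the two assessment modes concrete, here is a minimal, hypothetical Python sketch of the idea (not VLMEvalKit's actual API): exact matching scores outputs that already contain a clean choice, and an LLM judge is consulted only to extract an answer from free-form text. All helper names and the prompt are invented for illustration.

```python
import re
from typing import Callable, Optional

def exact_match(prediction: str, answer: str) -> Optional[bool]:
    """Score directly when the output is already a bare choice like 'B', '(B)', or 'B.'."""
    m = re.fullmatch(r"\(?([A-D])\)?\.?", prediction.strip().upper())
    if m is None:
        return None                      # free-form output: defer to the LLM judge
    return m.group(1) == answer.upper()

def llm_extract(prediction: str, choices: dict[str, str],
                ask_llm: Callable[[str], str]) -> str:
    """Ask an LLM judge to map a free-form answer onto one of the choices."""
    options = "\n".join(f"{k}. {v}" for k, v in choices.items())
    prompt = ("Which option does the answer below refer to? Reply with one letter.\n"
              f"{options}\nAnswer: {prediction}")
    return ask_llm(prompt).strip().upper()[:1]

def score(prediction: str, answer: str, choices: dict[str, str],
          ask_llm: Callable[[str], str]) -> bool:
    verdict = exact_match(prediction, answer)
    if verdict is None:                  # fall back to LLM-based answer extraction
        verdict = llm_extract(prediction, choices, ask_llm) == answer.upper()
    return verdict

if __name__ == "__main__":
    choices = {"A": "a cat", "B": "a dog", "C": "a horse", "D": "a bird"}
    stub_judge = lambda _prompt: "B"     # stand-in for a real LLM judge call
    print(score("B.", "B", choices, stub_judge))                      # exact-match path -> True
    print(score("The image shows a dog.", "B", choices, stub_judge))  # judge path -> True
```

The design point this illustrates is that the cheap, deterministic path is tried first, so the LLM judge only covers the residual free-form cases.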

Scores are updated daily from GitHub, PyPI, and npm data.