VLMEvalKit and evaluation-guidebook

                      VLMEvalKit        evaluation-guidebook
Overall score         72 (Verified)     49 (Emerging)
Maintenance           23/25             6/25
Adoption              10/25             10/25
Maturity              16/25             16/25
Community             23/25             17/25
Stars                 3,894             2,075
Forks                 650               121
Downloads             n/a               n/a
Commits (30d)         21                0
Language              Python            Jupyter Notebook
License               Apache-2.0        n/a
Package               none              none
Dependents            none              none

About VLMEvalKit

open-compass/VLMEvalKit

An open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks.

Provides generation-based evaluation across all supported models with two assessment modes, exact matching and LLM-based answer extraction, eliminating manual data preparation across fragmented benchmark repositories. Supports distributed inference via LMDeploy and vLLM to accelerate evaluation of large-scale deployments, with specialized handling for models with reasoning/thinking modes and for long-form outputs that exceed standard result-cell limits. Integrates with the Hugging Face ecosystem (model hosting, datasets, Spaces for leaderboards) and supports video benchmarks via ModelScope.
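As a concrete illustration of the generation-based workflow, the project README documents a small programmatic API alongside the run.py CLI. The sketch below follows that quickstart; the model key (idefics_9b_instruct) and image path are the README's illustrative examples, and exact registry names may differ across versions:

```python
# Minimal sketch of VLMEvalKit's programmatic inference API, following the
# README quickstart. The model key and image path are illustrative.
from vlmeval.config import supported_VLM

# Look up a supported VLM by its registry key and instantiate it.
model = supported_VLM['idefics_9b_instruct']()

# generate() takes an interleaved list of image paths and text prompts
# and returns the model's free-form answer as a string.
response = model.generate(['assets/apple.jpg', 'What is in this image?'])
print(response)
```

Full benchmark runs go through the CLI instead (for example, `python run.py --data MMBench_DEV_EN --model qwen_chat`), and the README describes launching distributed inference with `torchrun`, which instantiates one model instance per GPU.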

About evaluation-guidebook

huggingface/evaluation-guidebook

Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval!

Scores updated daily from GitHub, PyPI, and npm data.