lmms-eval and evaluation-guidebook

The two projects are complementary: lmms-eval (A) is a comprehensive multimodal evaluation toolkit that provides the practical implementation for a broad range of multimodal tasks, while evaluation-guidebook (B) offers practical insights and theoretical background specific to large language model evaluation, which can inform how (A) is used and how its results are interpreted for text-based tasks.

                 lmms-eval          evaluation-guidebook
Score            90 (Verified)      49 (Emerging)
Maintenance      23/25              6/25
Adoption         20/25              10/25
Maturity         25/25              16/25
Community        22/25              17/25
Stars            3,883              2,075
Forks            539                121
Downloads        9,061              —
Commits (30d)    30                 0
Language         Python             Jupyter Notebook
License          —                  —
Flags            No risk flags      No package, no dependents
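The four subscores above sum exactly to each project's overall score (23 + 20 + 25 + 22 = 90 for lmms-eval; 6 + 10 + 16 + 17 = 49 for evaluation-guidebook), which suggests the composite is a simple sum of four 25-point dimensions. A minimal sketch of that arithmetic — the dictionary layout is illustrative, not the comparison site's actual data model:

```python
# Hedged sketch: recompute the composite scores shown in the comparison
# table as the sum of the four 25-point subscores. The dict structure is
# illustrative; the scoring pipeline behind the page is not documented here.
SUBSCORES = {
    "lmms-eval":            {"maintenance": 23, "adoption": 20, "maturity": 25, "community": 22},
    "evaluation-guidebook": {"maintenance": 6,  "adoption": 10, "maturity": 16, "community": 17},
}

def composite(subscores: dict) -> int:
    """Sum the four 25-point dimensions into a 100-point score."""
    return sum(subscores.values())

for name, parts in SUBSCORES.items():
    print(f"{name}: {composite(parts)}/100")
# → lmms-eval: 90/100
# → evaluation-guidebook: 49/100
```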

About lmms-eval

EvolvingLMMs-Lab/lmms-eval

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
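The toolkit is driven from the command line in the style of lm-evaluation-harness, on which it is modeled. A hedged invocation sketch — the model name, checkpoint, and task below are placeholders, and flag names should be checked against the project README for the installed version:

```shell
# Install the toolkit (the project also supports installing from source).
pip install lmms-eval

# Evaluate a vision-language checkpoint on a benchmark task.
# "llava", the pretrained path, and "mme" are illustrative placeholders;
# consult `python -m lmms_eval --help` for the flags your version supports.
python -m lmms_eval \
    --model llava \
    --model_args pretrained="liuhaotian/llava-v1.5-7b" \
    --tasks mme \
    --batch_size 1 \
    --output_path ./logs/
```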

About evaluation-guidebook

huggingface/evaluation-guidebook

Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval!

Scores updated daily from GitHub, PyPI, and npm data.