lmms-eval and LLMEvaluation
About lmms-eval
EvolvingLMMs-Lab/lmms-eval
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
This toolkit helps researchers and AI practitioners reliably compare how well different multimodal models understand and respond to text, image, video, and audio inputs. You provide a model and a set of evaluation tasks spanning those modalities, and it produces consistent, comparable performance metrics. Anyone who builds, deploys, or studies large multimodal models will find it useful for understanding model capabilities.
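To give a concrete feel for the workflow, here is a minimal sketch of an evaluation run using the project's command-line interface; the LLaVA-1.5 checkpoint and the MME task are illustrative choices taken from the project's examples, and exact flag names may differ between releases:

```bash
# Sketch: evaluate a LLaVA-1.5 checkpoint on the MME benchmark with lmms-eval.
# Model, checkpoint, and task names here are illustrative; consult the repository
# documentation for the supported model types and the full task list.
python3 -m lmms_eval \
    --model llava \
    --model_args pretrained="liuhaotian/llava-v1.5-7b" \
    --tasks mme \
    --batch_size 1 \
    --log_samples \
    --output_path ./logs/
```

A run like this should write per-sample logs and aggregate metrics under the given output path, which is what makes results comparable across models and benchmarks.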
About LLMEvaluation
alopatenko/LLMEvaluation
A comprehensive guide to LLM evaluation methods, designed to help identify the most suitable evaluation techniques for a given use case, promote best practices in LLM assessment, and critically examine how effective those evaluation methods are.
This compendium helps academics and industry practitioners evaluate Large Language Models (LLMs) and the applications built on them. It covers methods for assessing LLM-based models and systems, yielding a comprehensive picture of their performance, limitations, and suitability for specific tasks. Anyone responsible for deploying or assessing AI models in their organization, such as AI product managers, research scientists, or data scientists, will find it useful.