chembench and macbench
About chembench
lamalab-org/chembench
How good are LLMs at chemistry?
ChemBench helps chemists and materials scientists evaluate how well large language models (LLMs) and multimodal models perform on chemistry-related tasks. You provide a language model (or a vision-language model), and ChemBench outputs detailed reports on the model's accuracy across various chemistry topics. It is aimed at researchers and developers working with AI in chemistry who need to assess model capabilities.
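The workflow above (provide a model, get per-topic accuracy reports) can be sketched as a minimal evaluation loop. This is an illustrative sketch only: the `Task` record, `evaluate` function, and `toy_model` are hypothetical names, not the chembench API.

```python
# Hypothetical sketch of a ChemBench-style evaluation loop.
# `Task`, `evaluate`, and `toy_model` are illustrative, not chembench's API.
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class Task:
    topic: str     # e.g. "inorganic chemistry"
    question: str
    answer: str    # expected answer string

def evaluate(model, tasks):
    """Score a callable model (prompt -> answer) and report accuracy per topic."""
    correct, total = defaultdict(int), defaultdict(int)
    for task in tasks:
        total[task.topic] += 1
        if model(task.question).strip().lower() == task.answer.lower():
            correct[task.topic] += 1
    return {topic: correct[topic] / total[topic] for topic in total}

# Toy stand-in for an LLM that only knows one fact, for illustration.
def toy_model(prompt):
    return "NaCl" if "table salt" in prompt else "unknown"

tasks = [
    Task("inorganic chemistry", "Formula of table salt?", "NaCl"),
    Task("inorganic chemistry", "Formula of water?", "H2O"),
]
print(evaluate(toy_model, tasks))  # {'inorganic chemistry': 0.5}
```

A real run would swap `toy_model` for a call into an LLM client and load the benchmark's task corpus instead of the two toy questions.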
About macbench
lamalab-org/macbench
Probing the limitations of multimodal language models for chemistry and materials research
This tool helps chemistry and materials science researchers evaluate how well multimodal language models understand and answer questions that combine text and images from these fields. You provide a multimodal language model and a set of chemistry/materials research tasks, and it reports the model's performance across the various stages of scientific work. It is designed for scientists, engineers, and researchers who want to assess or compare AI models for scientific discovery workflows.
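The text-plus-image tasks and per-stage reporting described above can be sketched with a simple record layout and scoring pass. Everything here is hypothetical: the field names, the stage labels, and `stub_mllm` are illustrative, not the macbench schema or API.

```python
# Hypothetical sketch of MaCBench-style multimodal tasks scored per
# workflow stage; field and stage names are illustrative, not macbench's.
from collections import defaultdict

tasks = [
    {"stage": "data extraction", "text": "Which peak is the carbonyl stretch?",
     "image": "ir_spectrum.png", "answer": "1715 cm-1"},
    {"stage": "lab observation", "text": "What colour is the precipitate?",
     "image": "beaker.png", "answer": "white"},
]

def stub_mllm(text, image):
    # A real multimodal model would consume the image bytes alongside the
    # text prompt; this stub returns canned answers for illustration.
    return "white" if image == "beaker.png" else "3000 cm-1"

def per_stage_accuracy(model, tasks):
    """Report the fraction of correct answers for each workflow stage."""
    hits, totals = defaultdict(int), defaultdict(int)
    for task in tasks:
        totals[task["stage"]] += 1
        hits[task["stage"]] += model(task["text"], task["image"]) == task["answer"]
    return {stage: hits[stage] / totals[stage] for stage in totals}

print(per_stage_accuracy(stub_mllm, tasks))
# {'data extraction': 0.0, 'lab observation': 1.0}
```

Grouping scores by stage rather than reporting one global number is what lets such a report show where in the scientific workflow a model breaks down.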