ChemBench and ChemLLMBench
About ChemBench
lamalab-org/chembench
How good are LLMs at chemistry?
ChemBench helps chemists and materials scientists evaluate how well large language models (LLMs) and multimodal models perform on chemistry-related tasks. Given a language model (or a vision-language model), it produces detailed reports on the model's accuracy across various chemistry topics. It is aimed at researchers and developers working with AI in chemistry who need to assess model capabilities.
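The evaluation idea described above can be illustrated with a minimal sketch: run a model over a set of topic-tagged multiple-choice questions and report per-topic accuracy. Everything here is an illustrative assumption; the question set, `toy_model`, and `evaluate` are not ChemBench's actual corpus or API.

```python
from collections import defaultdict

# Illustrative question set; topics, prompts, and answers are made up
# and do NOT come from ChemBench's actual corpus.
QUESTIONS = [
    {"topic": "organic chemistry",
     "prompt": "Which reagent reduces a ketone to an alcohol? (A) NaCl (B) NaBH4",
     "answer": "B"},
    {"topic": "organic chemistry",
     "prompt": "Which of these is an alkene? (A) ethene (B) ethane",
     "answer": "A"},
    {"topic": "analytical chemistry",
     "prompt": "Which technique measures mass-to-charge ratio? (A) NMR (B) MS",
     "answer": "B"},
]

def toy_model(prompt: str) -> str:
    """Stand-in for a real LLM call; always answers 'B'."""
    return "B"

def evaluate(model, questions):
    """Score a model per topic and return {topic: accuracy}."""
    correct, total = defaultdict(int), defaultdict(int)
    for q in questions:
        total[q["topic"]] += 1
        if model(q["prompt"]).strip().upper() == q["answer"]:
            correct[q["topic"]] += 1
    return {topic: correct[topic] / total[topic] for topic in total}

report = evaluate(toy_model, QUESTIONS)
print(report)  # per-topic accuracy for the toy model
```

A real run would replace `toy_model` with an API call to the model under test and parse the chosen option out of its free-text reply, which is where most of the engineering effort in such benchmarks goes.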
About ChemLLMBench
ChemFoundationModels/ChemLLMBench
Official code for "What can Large Language Models do in chemistry? A comprehensive benchmark on eight tasks" (NeurIPS 2023)
This project helps chemists and materials scientists evaluate how well large language models (LLMs) perform on chemistry-related tasks. It takes chemical data such as reaction descriptions or molecular representations as input and uses different LLMs to predict outcomes such as reaction products, retrosynthesis pathways, or molecular properties. The output helps researchers understand the strengths and weaknesses of LLMs for specific chemical challenges.
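For prediction tasks like reaction products or retrosynthesis, benchmarks of this kind commonly score ranked candidate outputs with a top-k exact-match metric. The sketch below is a hedged illustration of that metric, not ChemLLMBench's actual scoring code; the SMILES strings and function name are made up for the example.

```python
def top_k_accuracy(predictions, references, k=1):
    """Fraction of examples whose reference appears among the top-k candidates.

    predictions: list of ranked candidate SMILES lists, one list per example.
    references: list of ground-truth SMILES strings, one per example.
    Real pipelines canonicalize SMILES (e.g. with RDKit) before comparing;
    here raw strings are compared for simplicity.
    """
    hits = sum(1 for cands, ref in zip(predictions, references) if ref in cands[:k])
    return hits / len(references)

# Toy example: two reaction-product predictions, each with two ranked guesses.
preds = [
    ["CCO", "CCC"],           # example 1: correct product ranked first
    ["C1=CC=CC=C1", "CCO"],   # example 2: correct product ranked second
]
refs = ["CCO", "CCO"]

print(top_k_accuracy(preds, refs, k=1))  # top-1 accuracy
print(top_k_accuracy(preds, refs, k=2))  # top-2 accuracy
```

Because two different SMILES strings can denote the same molecule, a production version of this metric should canonicalize both candidates and references before the membership test; string equality alone undercounts correct answers.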