Chemistry LLM Benchmarks LLM Tools
Tools, datasets, and benchmarks for evaluating and fine-tuning large language models on chemistry and molecular property prediction tasks. Does NOT include general scientific LLM frameworks, materials science benchmarks, or chemistry software without LLM components.
21 chemistry LLM benchmark tools are tracked. The highest-rated is maxischuh/TwinBooster at 47/100, with 6 stars and 103 monthly downloads.
Get the projects as JSON (the request below sets limit=20, while 21 projects are tracked):
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=chemistry-llm-benchmarks&limit=20"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
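The same request can be made from a script. The following is a minimal sketch using only the Python standard library; the helper names `build_url` and `fetch_projects` are hypothetical, and because the response schema is not documented here, the parsed JSON is returned as-is rather than unpacked into assumed fields.

```python
import json
import urllib.parse
import urllib.request

# Endpoint and query parameters are taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="llm-tools", subcategory="chemistry-llm-benchmarks", limit=20):
    """Build the request URL (hypothetical helper, not part of any official client)."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=30):
    """Fetch the URL and return the parsed JSON body without assuming its structure."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_projects(build_url())` sends one request, which counts toward the 100 requests/day keyless allowance. Whether a larger `limit` value is accepted by the API is an assumption that would need to be checked against its documentation.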
| # | Tool | Score | Tier |
|---|---|---|---|
| 1 | maxischuh/TwinBooster: Package for TwinBooster. Enables fast and powerful zero-shot molecular... | 47/100 | Emerging |
| 2 | theochem/ModelHamiltonian: Generate 1- and 2-electron integrals so that molecular quantum chemistry... | | Emerging |
| 3 | lamalab-org/chembench: How good are LLMs at chemistry? | | Emerging |
| 4 | pnnl/cactus: LLM Agent that leverages cheminformatics tools to provide informed responses. | | Emerging |
| 5 | jan-janssen/LangSim: Application of Large Language Models (LLM) for computational materials... | | Emerging |
| 6 | MasterAI-EAM/Darwin: An open-source project dedicated to build foundational large language model... | | Emerging |
| 7 | andresilvapimentel/AI4Chem: AI4Chem is a code to test the ability of large language models (ChatGPT) to... | | Emerging |
| 8 | lamalab-org/chemlift: Language-interfaced fine-tuning for chemistry | | Emerging |
| 9 | lamalab-org/macbench: Probing the limitations of multimodal language models for chemistry and... | | Experimental |
| 10 | jschrier/SynthGPT: Code and Data for "Large Language Models for Inorganic Synthesis Prediction" | | Experimental |
| 11 | lamalab-org/chem-bench-app: Frontend for evaluating humans on chemistry questions | | Experimental |
| 12 | google/task-oriented-queries: Task-oriented queries (e.g., one-shot queries to play videos, order food, or... | | Experimental |
| 13 | chemkg/c3p: LLM-generated CHEBI classifiers | | Experimental |
| 14 | ai4cat/AI4C-LitMiner: Developed for AI-driven catalyst discovery, integrating LLM-based knowledge... | | Experimental |
| 15 | Eljefaso2949/QuantumChem-200K: 🧬 Discover and utilize QuantumChem-200K, a dataset of 200,000 organic... | | Experimental |
| 16 | renjieli08/QuantumChem-200K: QuantumChem-200K: A Large-Scale Open Organic Molecular Dataset for... | | Experimental |
| 17 | ChemFoundationModels/ChemLLMBench: Official Code for What can Large Language Models do in chemistry? A... | | Experimental |
| 18 | jschrier/KRICT_hackathon_phosphors: KRICT ChemDX Hackathon project: Inorganic Phosphors | | Experimental |
| 19 | ehrenhofer-group/LLM_Material_Property_Benchmark: A Python toolkit for evaluating Large Language Models (LLMs) in materials... | | Experimental |
| 20 | drakedu/formalize: FORMALIZE is a lightweight framework that improves LLM-based program... | | Experimental |
| 21 | apekshyasharma/AAII_Intelligence_Idex_Analysis: A data-driven benchmarking analysis of leading Artificial Intelligence... | | Experimental |