Chemistry LLM Benchmarks (LLM Tools)

Tools, datasets, and benchmarks for evaluating and fine-tuning large language models on chemistry and molecular property prediction tasks. Does NOT include general scientific LLM frameworks, materials science benchmarks, or chemistry software without LLM components.

There are 21 chemistry LLM benchmark tools tracked. The highest-rated is maxischuh/TwinBooster at 47/100, with 6 stars and 103 monthly downloads; theochem/ModelHamiltonian is tied with it at 47/100.

Get the projects as JSON. The example request sets limit=20, one fewer than the 21 tracked projects.

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=chemistry-llm-benchmarks&limit=20"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
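For scripted access, the curl request can be reproduced with a minimal Python sketch that uses only the standard library. It assumes only what the page states (the endpoint returns JSON and accepts the `domain`, `subcategory`, and `limit` query parameters) and makes no assumption about the response's field names, which this page does not document. It also sends no API key, since the page does not say how a key is passed. `build_quality_url` and `fetch_quality` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

# Base endpoint copied from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_quality_url(domain: str, subcategory: str, limit: int) -> str:
    """Build the query URL from the three parameters shown in the curl example."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_quality(domain: str, subcategory: str, limit: int, timeout: float = 30.0):
    """Fetch the endpoint and decode the JSON body, without assuming its schema.

    No API key is sent, so this relies on the anonymous tier (100 requests/day).
    """
    url = build_quality_url(domain, subcategory, limit)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("llm-tools", "chemistry-llm-benchmarks", 20)` issues the same request as the curl command and returns the decoded JSON as plain Python lists and dicts, which can then be inspected to learn the actual field names.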

#	Tool	Description	Score	Tier	Stars	Language
1	maxischuh/TwinBooster	Package for TwinBooster. Enables fast and powerful zero-shot molecular...	47	Emerging	6	Python
2	theochem/ModelHamiltonian	Generate 1- and 2-electron integrals so that molecular quantum chemistry...	47	Emerging	55	Python
3	lamalab-org/chembench	How good are LLMs at chemistry?	43	Emerging	134	Python
4	pnnl/cactus	LLM Agent that leverages cheminformatics tools to provide informed responses.	37	Emerging	48	Jupyter Notebook
5	jan-janssen/LangSim	Application of Large Language Models (LLM) for computational materials...	36	Emerging	84	Jupyter Notebook
6	MasterAI-EAM/Darwin	An open-source project dedicated to build foundational large language model...	34	Emerging	247	Jupyter Notebook
7	andresilvapimentel/AI4Chem	AI4Chem is a code to test the ability of large language models (ChatGPT) to...	33	Emerging	23	Jupyter Notebook
8	lamalab-org/chemlift	Language-interfaced fine-tuning for chemistry	31	Emerging	45	Jupyter Notebook
9	lamalab-org/macbench	Probing the limitations of multimodal language models for chemistry and...	29	Experimental	23	Python
10	jschrier/SynthGPT	Code and Data for "Large Language Models for Inorganic Synthesis Prediction"	27	Experimental	33	Python
11	lamalab-org/chem-bench-app	Frontend for evaluating humans on chemistry questions	26	Experimental	11	TypeScript
12	google/task-oriented-queries	Task-oriented queries (e.g., one-shot queries to play videos, order food, or...	25	Experimental	5	—
13	chemkg/c3p	LLM-generated CHEBI classifiers	23	Experimental	13	Python
14	ai4cat/AI4C-LitMiner	Developed for AI-driven catalyst discovery, integrating LLM-based knowledge...	22	Experimental	—	Python
15	Eljefaso2949/QuantumChem-200K	🧬 Discover and utilize QuantumChem-200K, a dataset of 200,000 organic...	22	Experimental	—	Jupyter Notebook
16	renjieli08/QuantumChem-200K	QuantumChem-200K: A Large-Scale Open Organic Molecular Dataset for...	21	Experimental	2	Jupyter Notebook
17	ChemFoundationModels/ChemLLMBench	Official Code for What can Large Language Models do in chemistry? A...	19	Experimental	170	Jupyter Notebook
18	jschrier/KRICT_hackathon_phosphors	KRICT ChemDX Hackathon project: Inorganic Phosphors	12	Experimental	4	Mathematica
19	ehrenhofer-group/LLM_Material_Property_Benchmark	A Python toolkit for evaluating Large Language Models (LLMs) in materials...	11	Experimental	—	Python
20	drakedu/formalize	FORMALIZE is a lightweight framework that improves LLM-based program...	11	Experimental	—	Python
21	apekshyasharma/AAII_Intelligence_Idex_Analysis	A data-driven benchmarking analysis of leading Artificial Intelligence...	11	Experimental	—	Jupyter Notebook
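The score ranking and the star ranking in this table do not agree, which is easy to check programmatically. The sketch below hand-copies the table rows into Python tuples (it does not call the API) and uses three hypothetical helpers, `top_scored`, `most_starred`, and `tier_counts`, to list score ties, find the most-starred project, and count projects per tier.

```python
from collections import Counter

# (repository, score, tier, stars), hand-copied from the table above.
# None marks a row with no star count listed.
ROWS = [
    ("maxischuh/TwinBooster", 47, "Emerging", 6),
    ("theochem/ModelHamiltonian", 47, "Emerging", 55),
    ("lamalab-org/chembench", 43, "Emerging", 134),
    ("pnnl/cactus", 37, "Emerging", 48),
    ("jan-janssen/LangSim", 36, "Emerging", 84),
    ("MasterAI-EAM/Darwin", 34, "Emerging", 247),
    ("andresilvapimentel/AI4Chem", 33, "Emerging", 23),
    ("lamalab-org/chemlift", 31, "Emerging", 45),
    ("lamalab-org/macbench", 29, "Experimental", 23),
    ("jschrier/SynthGPT", 27, "Experimental", 33),
    ("lamalab-org/chem-bench-app", 26, "Experimental", 11),
    ("google/task-oriented-queries", 25, "Experimental", 5),
    ("chemkg/c3p", 23, "Experimental", 13),
    ("ai4cat/AI4C-LitMiner", 22, "Experimental", None),
    ("Eljefaso2949/QuantumChem-200K", 22, "Experimental", None),
    ("renjieli08/QuantumChem-200K", 21, "Experimental", 2),
    ("ChemFoundationModels/ChemLLMBench", 19, "Experimental", 170),
    ("jschrier/KRICT_hackathon_phosphors", 12, "Experimental", 4),
    ("ehrenhofer-group/LLM_Material_Property_Benchmark", 11, "Experimental", None),
    ("drakedu/formalize", 11, "Experimental", None),
    ("apekshyasharma/AAII_Intelligence_Idex_Analysis", 11, "Experimental", None),
]


def top_scored(rows):
    """Return every repository that shares the highest score (ties included)."""
    best = max(score for _, score, _, _ in rows)
    return [repo for repo, score, _, _ in rows if score == best]


def most_starred(rows):
    """Return the repository with the most stars, skipping rows without a count."""
    return max((r for r in rows if r[3] is not None), key=lambda r: r[3])[0]


def tier_counts(rows):
    """Count how many repositories fall into each tier."""
    return Counter(tier for _, _, tier, _ in rows)
```

Here `top_scored(ROWS)` returns both 47-point projects, while `most_starred(ROWS)` returns `MasterAI-EAM/Darwin`, which has 247 stars but scores only 34.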