kjappelbaum/awesome-chemistry-datasets
overview of datasets for ML in chemistry
An overview that organizes chemistry datasets into text corpora, molecular structures, and property prediction benchmarks. It spans named entity recognition tasks, crystal structure databases, mass spectrometry repositories, and curated ML benchmarks such as MPCD and MoleculeACE. The collection includes both raw data sources (PubMed, bioRxiv, ZINC libraries) and preprocessed benchmark datasets with standardized evaluation protocols, aimed at structure-activity and structure-property prediction and at chemical information extraction workflows. It covers experimental measurements (solubility, binding affinity, spectra) alongside synthetically enumerated datasets, supporting multi-modal chemistry ML research across representation types and data scales.
Stars: 394
Forks: 45
Language: —
License: CC0-1.0
Category:
Last pushed: Oct 22, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/kjappelbaum/awesome-chemistry-datasets"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
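The curl request above can also be made from a script. The following is a minimal Python sketch, not an official client. It assumes the three trailing path segments of the URL mean category, repository owner, and repository name, and that the response body is JSON; neither is stated on this page. The function names `build_quality_url` and `fetch_quality` are hypothetical. How an API key is passed is not documented here, so the sketch only covers keyless access.

```python
import json
import urllib.parse
import urllib.request

# Base path copied from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL in the same shape as the curl example.

    Assumption: the path segments are <category>/<owner>/<repo>.
    """
    parts = [urllib.parse.quote(p, safe="") for p in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository (hypothetical helper).

    Assumption: the body is JSON. If it does not parse, the raw text
    is returned instead so the caller can inspect it.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return body


if __name__ == "__main__":
    # Keyless access is limited to 100 requests/day, so avoid calling in a loop.
    print(fetch_quality("ml-frameworks", "kjappelbaum", "awesome-chemistry-datasets"))
```

Because keyless access is capped at 100 requests per day, it may be worth caching the result locally rather than re-fetching on every run.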
Higher-rated alternatives
josiehong/awesome-smallmol-massspec-ml
Awesome papers and codes list of small molecule mass spectrometry-related machine learning methods
inoue0426/awesome-computational-biology
Awesome list of computational biology.
GoekeLab/awesome-nanopore
A curated list of awesome nanopore analysis tools.
HongxinXiang/awesome-ai-bioinformatics
A curated list of awesome AI and Bioinformatics.
benb111/awesome-small-molecule-ml
A curated list of resources for machine learning for small-molecule drug discovery