kjappelbaum/awesome-chemistry-datasets

overview of datasets for ML in chemistry

49 / 100
Emerging

Organizes diverse chemistry datasets across text corpora, molecular structures, and property prediction benchmarks, spanning named entity recognition tasks, crystal structure databases, mass spectrometry repositories, and curated ML benchmarks like MPCD and MoleculeACE. The collection aggregates both raw data sources (PubMed, bioRxiv, ZINC libraries) and preprocessed benchmark datasets with standardized evaluation protocols, targeting structure-activity/property prediction and chemical information extraction workflows. Covers experimental measurements (solubility, binding affinity, spectra) alongside synthetic enumerated datasets, enabling multi-modal chemistry ML research across representation types and data scales.

394 stars.

No Package, No Dependents
Maintenance 6 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 17 / 25


Stars: 394
Forks: 45
Language: (not listed)
License: CC0-1.0
Last pushed: Oct 22, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/kjappelbaum/awesome-chemistry-datasets"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
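
The same request can be made from a script. The following is a minimal Python sketch that uses only the standard library. The URL layout (category, owner, repository) is inferred from the single curl example above, and the helper names quality_url and fetch_quality are hypothetical. The response format is not documented here, so the assumption that the endpoint returns JSON is unverified.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; the path layout below is inferred from it.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    # Percent-encode each path segment so unusual characters cannot alter the path.
    path = "/".join(quote(part, safe="") for part in (category, owner, repo))
    return f"{BASE_URL}/{path}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record, ASSUMING the endpoint responds with JSON."""
    request = urllib.request.Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Prints the URL only; call fetch_quality(...) to make a real request.
    print(quality_url("ml-frameworks", "kjappelbaum", "awesome-chemistry-datasets"))
```

Each call to fetch_quality counts against the daily request limit, so a script that polls many repositories may want to cache results locally.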