HUBioDataLab/SELFormer
SELFormer: Molecular Representation Learning via SELFIES Language Models
Implements a RoBERTa-based transformer encoder pre-trained via masked language modeling on SELFIES notation—a 100% chemically valid alternative to SMILES—enabling robust molecular representation learning. Provides pre-trained models, fine-tuning support for property prediction tasks (classification and regression), and downloadable molecular embeddings for the ChEMBL and MoleculeNet datasets. Outperforms graph-based and SMILES-based language-model approaches on solubility and toxicity prediction benchmarks.
107 stars. No commits in the last 6 months.
Stars: 107
Forks: 19
Language: Python
License: —
Category: —
Last pushed: Dec 01, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/HUBioDataLab/SELFormer"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
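The same endpoint can be called from Python with only the standard library. A minimal sketch, assuming the path format shown in the curl example above and that the endpoint returns JSON (the response schema is not documented here):

```python
import json
from urllib.request import urlopen

# Base path taken from the curl example above; the schema of the
# response body is an assumption (undocumented on this page).
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    # Build the per-repository endpoint URL.
    return f"{API_BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    # Unauthenticated callers are limited to 100 requests/day,
    # per the access notes above.
    with urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)
```

For example, `fetch_quality("HUBioDataLab", "SELFormer")` would request the same URL as the curl command shown above.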
Higher-rated alternatives
rxn4chemistry/rxn-onmt-models
Training of OpenNMT-based RXN models
CTCycle/ADSMOD-Adsorption-Modeling
Streamline adsorption modeling by automatically fitting theoretical adsorption models to...
lamalab-org/MatText
Text-based modeling of materials.
VectorInstitute/atomgen
Library for handling atomistic graph datasets focusing on transformer-based implementations,...
sanjaradylov/smiles-gpt
Generative Pre-Training from Molecules