allenai/scibert
A BERT model for scientific text.
Pretrained on 1.14M full-text papers (3.1B tokens) from Semantic Scholar, with a domain-specific vocabulary (`scivocab`) built for scientific language. Weights are available in TensorFlow and PyTorch formats and load through Hugging Face's `transformers` library; variants cover both the custom `scivocab` and the standard BERT vocabulary, each in cased and uncased form. The authors reported state-of-the-art results at release on several scientific NLP tasks, including named entity recognition, relation extraction, citation intent classification, and dependency parsing on biomedical and computer science benchmarks.
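As a rough illustration of loading the model through `transformers`, here is a minimal sketch. It assumes the `scivocab` checkpoints published under the `allenai` organization on the Hugging Face Hub (`allenai/scibert_scivocab_uncased` and `allenai/scibert_scivocab_cased`), and that `torch` and `transformers` are installed. The helper names `scibert_checkpoint` and `embed` are hypothetical and not part of the repository.

```python
def scibert_checkpoint(cased: bool = False) -> str:
    """Return the Hub identifier of a scivocab SciBERT checkpoint.

    Only the scivocab variants are covered here; identifiers for the
    standard-vocabulary variants are not assumed.
    """
    return f"allenai/scibert_scivocab_{'cased' if cased else 'uncased'}"


def embed(sentences, cased: bool = False):
    """Return one [CLS] vector per sentence (downloads weights on first call)."""
    # Imported lazily so scibert_checkpoint() works without torch/transformers.
    import torch
    from transformers import AutoModel, AutoTokenizer

    name = scibert_checkpoint(cased)
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name)
    model.eval()  # disable dropout for deterministic inference

    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        output = model(**batch)
    # last_hidden_state: (batch, sequence_length, hidden_size); position 0 is [CLS].
    return output.last_hidden_state[:, 0, :]


if __name__ == "__main__":
    vectors = embed(["The glucocorticoid receptor binds DNA as a dimer."])
    print(vectors.shape)
```

Taking the `[CLS]` vector is only one way to get a sentence representation; for the tasks listed above the model would normally be fine-tuned with a task-specific head instead.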
1,677 stars. No commits in the last 6 months.
Stars: 1,677
Forks: 233
Language: Python
License: Apache-2.0
Category:
Last pushed: Feb 22, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/allenai/scibert"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
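The same request can be made from Python with the standard library. This is a sketch only: the path layout (`category/owner/repo`) is inferred from the single URL shown above, the response format is not documented here so the body is returned as raw text, and `quality_url` and `fetch_quality` are hypothetical helper names.

```python
import urllib.request

# Base path taken from the curl example; the segment layout is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0) -> str:
    """Fetch the endpoint without an API key and return the raw response body."""
    with urllib.request.urlopen(quality_url(category, owner, repo), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    print(fetch_quality("nlp", "allenai", "scibert"))
```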
Higher-rated alternatives
fidelity/textwiser
[AAAI 2021] TextWiser: Text Featurization Library
RandolphVI/Multi-Label-Text-Classification
About Multi-Label Text Classification Based on Neural Network.
ThilinaRajapakse/pytorch-transformers-classification
Based on the Pytorch-Transformers library by HuggingFace. To be used as a starting point for...
xuyige/BERT4doc-Classification
Code and source for paper "How to Fine-Tune BERT for Text Classification?"
ncbi-nlp/bluebert
BlueBERT, pre-trained on PubMed abstracts and clinical notes (MIMIC-III).