allenai/scibert

A BERT model for scientific text.

/ 100

Emerging

Pretrained on 1.14M full-text papers (3.1B tokens) from Semantic Scholar, with a domain-specific vocabulary (`scivocab`) built from scientific text. Models are available in TensorFlow and PyTorch formats and load through Hugging Face's `transformers` library. Variants cover both the custom `scivocab` and the standard BERT vocabulary, each in cased and uncased form. The authors reported state-of-the-art results at the time of publication on scientific NLP tasks including named entity recognition, relation extraction, citation intent classification, and dependency parsing, on biomedical and computer science benchmarks.
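A minimal sketch of how the vocabulary and casing variants described above might be selected and loaded with `transformers`. The helper names `scibert_model_id` and `load_scibert` are hypothetical, and the `allenai/scibert_<vocab>_<casing>` naming pattern is an assumption about how the checkpoints are published on the Hugging Face Hub; verify the identifiers there before relying on them.

```python
def scibert_model_id(vocab: str = "scivocab", cased: bool = False) -> str:
    """Build a Hub model id for a SciBERT variant.

    Assumption: checkpoints follow the pattern
    allenai/scibert_<vocab>_<casing>. This helper is illustrative only.
    """
    if vocab not in ("scivocab", "basevocab"):
        raise ValueError(f"unknown vocabulary: {vocab!r}")
    casing = "cased" if cased else "uncased"
    return f"allenai/scibert_{vocab}_{casing}"


def load_scibert(vocab: str = "scivocab", cased: bool = False):
    """Load the tokenizer and encoder for the chosen variant.

    transformers is imported lazily so scibert_model_id() can be used
    without the library installed. Calling this downloads the weights.
    """
    from transformers import AutoModel, AutoTokenizer

    model_id = scibert_model_id(vocab, cased)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    return tokenizer, model
```

With PyTorch installed, `tokenizer, model = load_scibert()` followed by `model(**tokenizer("The enzyme catalyzes hydrolysis.", return_tensors="pt"))` should return contextual embeddings for the sentence. Fine-tuning for tasks such as NER would need a task-specific head on top of this encoder.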

1,677 stars. No commits in the last 6 months.

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 22 / 25

Stars: 1,677

Forks: 233

Language: Python

License: Apache-2.0

Higher-rated alternatives

fidelity/textwiser

[AAAI 2021] TextWiser: Text Featurization Library

RandolphVI/Multi-Label-Text-Classification

About Multi-Label Text Classification Based on Neural Network.

ThilinaRajapakse/pytorch-transformers-classification

Based on the Pytorch-Transformers library by HuggingFace. To be used as a starting point for...

xuyige/BERT4doc-Classification

Code and source for paper "How to Fine-Tune BERT for Text Classification?"

ncbi-nlp/bluebert

BlueBERT, pre-trained on PubMed abstracts and clinical notes (MIMIC-III).
