allenai/scispacy

A full spaCy pipeline and models for scientific/biomedical documents.

74 / 100 (Verified)

Extends spaCy with domain-specific components including a custom tokenizer optimized for scientific terminology, POS tagging and dependency parsing trained on biomedical corpora, and an abbreviation detector implementing the Schwartz-Hearst algorithm. Offers multiple model variants from lightweight (100k vocabulary) to transformer-based options using SciBERT, plus specialized NER models trained on biomedical datasets (CRAFT, JNLPBA, BC5CDR). Includes an entity linker component for mapping recognized entities to knowledge bases like UMLS, MeSH, RxNorm, and Gene Ontology.
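As a rough illustration of the abbreviation detector mentioned above, the sketch below wraps it in a small helper. The helper name `extract_abbreviations` is hypothetical, and the code assumes that `scispacy` and the `en_core_sci_sm` model have been installed separately. Imports are kept inside the function so the snippet still loads where those packages are absent.

```python
def extract_abbreviations(text, model_name="en_core_sci_sm"):
    """Return (short form, long form) pairs found in `text`.

    Assumption: scispacy and the named model are installed; `model_name`
    defaults to the small scientific model but any scispacy model should do.
    """
    import spacy
    # Importing the class registers the "abbreviation_detector" factory with spaCy.
    from scispacy.abbreviation import AbbreviationDetector  # noqa: F401

    nlp = spacy.load(model_name)
    nlp.add_pipe("abbreviation_detector")
    doc = nlp(text)

    # Each detected abbreviation is a Span whose `_.long_form` is the
    # Span holding its expanded definition.
    return [(abrv.text, abrv._.long_form.text) for abrv in doc._.abbreviations]
```

Calling `extract_abbreviations("Spinal and bulbar muscular atrophy (SBMA) is an inherited motor neuron disease.")` should pair the short form with its definition, though the exact result depends on the installed model version.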

1,934 stars and 55,780 monthly downloads. Used by 2 other packages. Available on PyPI.

Maintenance 6 / 25
Adoption 22 / 25
Maturity 25 / 25
Community 21 / 25
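
The four category scores above add up to the overall 74 / 100. That the overall score is a plain sum is an inference from these numbers, not something the page states. A quick check:

```python
# Category scores as listed on the page, each out of 25.
subscores = {"Maintenance": 6, "Adoption": 22, "Maturity": 25, "Community": 21}

total = sum(subscores.values())      # 6 + 22 + 25 + 21
max_total = 25 * len(subscores)      # four categories, 25 points each

print(f"{total} / {max_total}")      # prints "74 / 100"
```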


Stars: 1,934
Forks: 249
Language: Python
License: Apache-2.0
Last pushed: Dec 04, 2025
Monthly downloads: 55,780
Commits (30d): 0
Dependencies: 10
Reverse dependents: 2

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/allenai/scispacy"

Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
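
The same request can be made from Python with the standard library. This is a minimal sketch: the helper names are hypothetical, the `/<domain>/<owner>/<repo>` path layout is inferred from the single URL shown above, and the response is assumed (not confirmed by the page) to be JSON.

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(domain, owner, repo):
    """Build the endpoint URL; the path layout is inferred from the curl example."""
    return f"{API_BASE}/{domain}/{owner}/{repo}"


def fetch_quality(domain, owner, repo, timeout=10):
    """Fetch the quality record, assuming the endpoint returns a JSON body."""
    url = build_quality_url(domain, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For this project the call would be `fetch_quality("nlp", "allenai", "scispacy")`. Keyless use is limited to 100 requests/day, so caching the result locally is worth considering.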