allenai/scispacy
A full spaCy pipeline and models for scientific/biomedical documents.
Extends spaCy with domain-specific components including a custom tokenizer optimized for scientific terminology, POS tagging and dependency parsing trained on biomedical corpora, and an abbreviation detector implementing the Schwartz-Hearst algorithm. Offers multiple model variants from lightweight (100k vocabulary) to transformer-based options using SciBERT, plus specialized NER models trained on biomedical datasets (CRAFT, JNLPBA, BC5CDR). Includes an entity linker component for mapping recognized entities to knowledge bases like UMLS, MeSH, RxNorm, and Gene Ontology.
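The description mentions abbreviation detection based on the Schwartz-Hearst algorithm. As a rough illustration of the idea only, the core matching step can be sketched in plain Python. This is a simplified stand-in, not scispacy's own code: the function name `find_long_form` is hypothetical, and the sketch leaves out the candidate extraction and length checks that a full implementation would need.

```python
from typing import Optional


def find_long_form(short_form: str, candidate: str) -> Optional[str]:
    """Simplified Schwartz-Hearst matching step (illustrative sketch).

    Scans the short form and the candidate text from right to left.
    Each alphanumeric character of the short form must appear, in order,
    in the candidate; the first character must also start a word.
    Returns the matched long form, or None if no match exists.
    """
    s = len(short_form) - 1
    l = len(candidate) - 1
    while s >= 0:
        ch = short_form[s].lower()
        if not ch.isalnum():
            # Skip punctuation inside the short form (e.g. hyphens).
            s -= 1
            continue
        # Move left until the character matches; for the first
        # short-form character, also require a word boundary.
        while (l >= 0 and candidate[l].lower() != ch) or (
            s == 0 and l > 0 and candidate[l - 1].isalnum()
        ):
            l -= 1
        if l < 0:
            return None
        l -= 1
        s -= 1
    # The long form begins at the start of the word that was matched last.
    start = candidate.rfind(" ", 0, l + 1) + 1
    return candidate[start:]


if __name__ == "__main__":
    print(find_long_form("SMA", "patients with spinal muscular atrophy"))
    print(find_long_form("XYZ", "spinal muscular atrophy"))
```

In a text such as "spinal muscular atrophy (SMA)", the short form is the parenthesized token and the candidate is the run of words before it. The right-to-left scan is what lets the match drop leading words that do not belong to the definition.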
1,934 stars and 55,780 monthly downloads. Used by 2 other packages. Available on PyPI.
Stars: 1,934
Forks: 249
Language: Python
License: Apache-2.0
Category:
Last pushed: Dec 04, 2025
Monthly downloads: 55,780
Commits (30d): 0
Dependencies: 10
Reverse dependents: 2
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/allenai/scispacy"
Open to everyone: 100 requests/day with no API key. A free key raises the limit to 1,000 requests/day.
Related tools
chrismattmann/tika-python
Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called...
sloria/TextBlob
Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase...
cltk/cltk
The Classical Language Toolkit
delph-in/pydelphin
Python libraries for DELPH-IN
SamEdwardes/spacytextblob
A TextBlob sentiment analysis pipeline component for spaCy.