allenai/scispacy

A full spaCy pipeline and models for scientific/biomedical documents.

/ 100

Verified

Extends spaCy with domain-specific components including a custom tokenizer optimized for scientific terminology, POS tagging and dependency parsing trained on biomedical corpora, and an abbreviation detector implementing the Schwartz-Hearst algorithm. Offers multiple model variants from lightweight (100k vocabulary) to transformer-based options using SciBERT, plus specialized NER models trained on biomedical datasets (CRAFT, JNLPBA, BC5CDR). Includes an entity linker component for mapping recognized entities to knowledge bases like UMLS, MeSH, RxNorm, and Gene Ontology.
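To illustrate the idea behind the abbreviation detector, here is a minimal sketch of the core matching step of the Schwartz-Hearst algorithm: given a short form and the text preceding it, scan both right to left and recover the shortest long form whose characters cover the short form in order. This is a simplified, standalone illustration and not scispaCy's own implementation; the function name `find_best_long_form` is hypothetical, and the full algorithm's candidate extraction and validity filters are omitted.

```python
from typing import Optional


def find_best_long_form(short_form: str, candidate: str) -> Optional[str]:
    """Return the long form for `short_form` found in `candidate`, or None.

    Hypothetical helper: sketches only the right-to-left character
    matching step, not candidate selection or length filters.
    """
    s_index = len(short_form) - 1
    l_index = len(candidate) - 1

    while s_index >= 0:
        ch = short_form[s_index].lower()
        # Skip punctuation inside the short form (e.g. hyphens).
        if not ch.isalnum():
            s_index -= 1
            continue
        # Move left until the character matches. The first character of
        # the short form must also sit at the start of a word.
        while (l_index >= 0 and candidate[l_index].lower() != ch) or (
            s_index == 0 and l_index > 0 and candidate[l_index - 1].isalnum()
        ):
            l_index -= 1
        if l_index < 0:
            return None  # ran out of text before matching every character
        l_index -= 1
        s_index -= 1

    # Extend back to the start of the word containing the first match.
    start = candidate.rfind(" ", 0, l_index + 1) + 1
    return candidate[start:]


# Text preceding "(SMA)" in a sentence:
print(find_best_long_form("SMA", "We used spinal muscular atrophy"))
# → spinal muscular atrophy
```

Scanning from the right keeps the recovered long form as short as possible, which is why the leading words "We used" are dropped in the example.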

1,934 stars and 55,780 monthly downloads. Used by 2 other packages. Available on PyPI.

Maintenance 6 / 25

Adoption 22 / 25

Maturity 25 / 25

Community 21 / 25

Stars 1,934

Forks 249

Language Python

License Apache-2.0

Related tools

chrismattmann/tika-python

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called...

sloria/TextBlob

Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase...

cltk/cltk

The Classical Language Toolkit

delph-in/pydelphin

Python libraries for DELPH-IN

SamEdwardes/spacytextblob

A TextBlob sentiment analysis pipeline component for spaCy.
