chartbeat-labs/textacy
NLP, before and after spaCy
Provides preprocessing utilities (text cleaning, normalization) and postprocessing capabilities such as n-gram extraction, entity extraction, keyterm identification, and topic modeling on spaCy-processed documents. Includes built-in datasets with text and metadata, string similarity metrics, and linguistic statistics (readability scores, lexical diversity measures). Extends spaCy's API through convenience methods and custom extensions for streamlined multi-document workflows.
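To make the terms above concrete, here is a minimal standard-library sketch of whitespace normalization, n-gram extraction, and one lexical diversity measure (type-token ratio). It is not textacy's API: textacy operates on spaCy-processed documents, whereas the hypothetical helpers `normalize_whitespace`, `tokenize`, `ngrams`, and `type_token_ratio` below use a simple regex tokenizer purely for illustration.

```python
import re
from collections import Counter


def normalize_whitespace(text):
    """Collapse runs of whitespace into single spaces (illustrative preprocessing)."""
    return " ".join(text.split())


def tokenize(text):
    """Lowercase the text and split it into word tokens with a simple regex.

    Assumption: a real pipeline would use spaCy's tokenizer instead.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def type_token_ratio(tokens):
    """Lexical diversity as unique tokens divided by total tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


raw = "The  cat sat.\n\nThe cat   ran."
tokens = tokenize(normalize_whitespace(raw))
bigram_counts = Counter(ngrams(tokens, 2))
```

Here `bigram_counts.most_common(1)` gives the most frequent bigram, and `type_token_ratio(tokens)` gives a rough diversity score between 0 and 1.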
2,236 stars and 75,599 monthly downloads. Used by 4 other packages. No commits in the last 6 months. Available on PyPI.
Stars: 2,236
Forks: 249
Language: Python
License: —
Category:
Last pushed: Sep 22, 2023
Monthly downloads: 75,599
Commits (30d): 0
Dependencies: 14
Reverse dependents: 4
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/chartbeat-labs/textacy"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
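The same request can be sketched in Python with only the standard library. Two assumptions here are not confirmed by this page: that the URL follows the pattern base, category, then `owner/name` (inferred from the single curl example), and that the endpoint returns JSON. The helper names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category, repo):
    """Build the request URL from a category and an 'owner/name' repo slug.

    Assumption: the pattern is inferred from one example URL.
    """
    return f"{BASE_URL}/{category}/{repo}"


def fetch_quality(category, repo, timeout=10.0):
    """Fetch the quality record for a repo.

    Assumption: the response body is JSON; adjust the parsing if it is not.
    """
    url = build_quality_url(category, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("nlp", "chartbeat-labs/textacy")` would issue the same GET request as the curl command. Unauthenticated calls count against the 100 requests/day limit, so cache the result if you poll repeatedly.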
Related tools
nltk/nltk_data
NLTK Data
prasanthg3/cleantext
An open-source package for python to clean raw text data
brightertiger/pygarble
Python Package to detect garbled, gibberish text for EN
jfilter/clean-text
🧹 Python package for text cleaning
citiususc/pyplexity
Cleaning tool for web scraped text