chartbeat-labs/textacy

NLP, before and after spaCy



Provides preprocessing utilities (text cleaning, normalization) and postprocessing capabilities like n-gram extraction, entity linking, keyterm identification, and topic modeling on spaCy-processed documents. Includes built-in datasets with text and metadata, string similarity metrics, and linguistic statistics (readability scores, lexical diversity measures). Extends spaCy's API through convenience methods and custom extensions for streamlined multi-document workflows.
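To make the description above concrete, here is a minimal standard-library sketch of three of the steps it names: text normalization, n-gram extraction, and a lexical diversity measure (type-token ratio). It deliberately does not call textacy or spaCy, and every function name in it (`normalize_text`, `tokenize`, `ngrams`, `type_token_ratio`) is hypothetical and chosen for illustration only; textacy's own implementations work on spaCy-processed documents and will differ in detail.

```python
import re
import unicodedata
from collections import Counter


def normalize_text(text: str) -> str:
    """Apply Unicode NFC normalization, then collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> list[str]:
    """Lowercase and pull out word-character runs (a crude stand-in for spaCy tokenization)."""
    return re.findall(r"\w+", text.lower())


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return every contiguous run of n tokens, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def type_token_ratio(tokens: list[str]) -> float:
    """Unique tokens divided by total tokens; 0.0 for empty input."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


raw = "The  quick brown fox\n\njumps over the   lazy dog."
clean = normalize_text(raw)                  # preprocessing step
tokens = tokenize(clean)                     # stand-in for the spaCy pipeline
bigram_counts = Counter(ngrams(tokens, 2))   # postprocessing: n-gram extraction
ttr = type_token_ratio(tokens)               # postprocessing: lexical diversity
```

The sketch keeps the same before-and-after shape as the tagline: cleaning happens on the raw string, and extraction and statistics happen on the tokenized result.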

2,236 stars and 75,599 monthly downloads. Used by 4 other packages. No commits in the last 6 months. Available on PyPI.


Maintenance 0 / 25

Adoption 24 / 25

Maturity 25 / 25

Community 21 / 25


Stars: 2,236
Forks: 249
Language: Python
License: not listed
Category: text-preprocessing-pipelines
Last pushed: Sep 22, 2023
Monthly downloads: 75,599



Text Preprocessing Pipelines · 45 tools

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/chartbeat-labs/textacy"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
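The same request can be sketched in Python with only the standard library. Two things here are assumptions and not documented behavior: that the URL path follows a category/owner/repo pattern (inferred from the single example URL above), and that the endpoint returns JSON. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path taken from the curl example; everything after it is an inferred pattern.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, assuming a category/owner/repo path layout."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0) -> dict:
    """Fetch one project's record; assumes the response body is a JSON object."""
    with urllib.request.urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# record = fetch_quality("nlp", "chartbeat-labs", "textacy")
# print(sorted(record))
```

With the keyless limit of 100 requests/day, it is worth caching the response when polling more than a handful of projects.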

Related tools

nltk/nltk_data: NLTK Data
prasanthg3/cleantext: An open-source package for python to clean raw text data
brightertiger/pygarble: Python Package to detect garbled, gibberish text for EN
jfilter/clean-text: 🧹 Python package for text cleaning
citiususc/pyplexity: Cleaning tool for web scraped text
