nltk/nltk_data
NLTK Data
Hosts curated corpora, pre-trained models, and linguistic resources (tokenizers, parsers, taggers) that plug into NLTK's Python NLP framework through its automated downloader. Packages and their metadata are distributed via an `index.xml` manifest that is rebuilt automatically. Each dataset's license is documented individually, which supports compliance and responsible use across the many third-party datasets in the collection.
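As a rough illustration of how a downloader might read such a manifest, the sketch below parses a small, made-up XML excerpt with Python's standard library. The element names, attribute names, package ids, license strings, and URLs in `SAMPLE_INDEX` are all assumptions for illustration and are not copied from the real `index.xml`; the helper `list_packages` is likewise hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical manifest excerpt. Every element name, attribute name, id,
# license string, and URL here is an illustrative assumption, not real data.
SAMPLE_INDEX = """\
<nltk_data>
  <packages>
    <package id="example_tokenizer" subdir="tokenizers"
             license="example-license-a"
             url="https://example.invalid/tokenizers/example_tokenizer.zip"/>
    <package id="example_corpus" subdir="corpora"
             license="example-license-b"
             url="https://example.invalid/corpora/example_corpus.zip"/>
  </packages>
</nltk_data>
"""


def list_packages(index_xml: str) -> list[dict[str, str]]:
    """Return one attribute dict per <package> element, in document order."""
    root = ET.fromstring(index_xml)
    return [dict(pkg.attrib) for pkg in root.iter("package")]


packages = list_packages(SAMPLE_INDEX)

# Group package ids by their subdirectory (tokenizers, corpora, ...).
by_subdir: dict[str, list[str]] = {}
for pkg in packages:
    by_subdir.setdefault(pkg["subdir"], []).append(pkg["id"])

# Print each package next to its declared license for a quick compliance scan.
for pkg in packages:
    print(f'{pkg["id"]}: {pkg["license"]}')
```

In everyday use none of this is needed: NLTK's own downloader, invoked as `nltk.download("<package id>")`, resolves packages for you. The sketch only shows the general idea of a manifest-driven lookup.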
Stars: 1,795
Forks: 1,096
Language: Python
License: Apache-2.0
Category:
Last pushed: Jan 09, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/nltk/nltk_data"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
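The same request can be sketched in Python with only the standard library. This is a minimal sketch under assumptions: the path is read as `category/owner/repo` based on the single URL shown above, the response is assumed to be JSON, and the helper names `build_quality_url` and `fetch_quality` are hypothetical. How an API key would be passed is not stated here, so the sketch makes only keyless requests.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; everything after it is assumed to be
# category/owner/repo, which is an inference from that one URL.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the response (assumes the body is JSON)."""
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


url = build_quality_url("nlp", "nltk", "nltk_data")
print(url)
# To perform the request (needs network access and counts against the daily limit):
# data = fetch_quality("nlp", "nltk", "nltk_data")
```

Keeping URL construction separate from the network call makes it easy to check the target address without spending one of the 100 daily requests.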
Related tools
chartbeat-labs/textacy
NLP, before and after spaCy
prasanthg3/cleantext
An open-source package for python to clean raw text data
brightertiger/pygarble
Python Package to detect garbled, gibberish text for EN
jfilter/clean-text
🧹 Python package for text cleaning
citiususc/pyplexity
Cleaning tool for web scraped text