igorbrigadir/stopwords
Default English stopword lists from many different sources
Aggregates 40+ stopword lists sourced from search engines (Sphinx, Lucene), databases (MySQL, PostgreSQL), NLP libraries (CoreNLP, Stanford), and specialized domains (medical literature, patent search), so developers can choose a list suited to their domain instead of relying on a single generic one. Each list is taken from its original source implementation and keeps that source's domain-specific variations: for example, lists from medical databases include specialized terminology, while lists from full-text search engines emphasize query optimization. The repository gathers these scattered lists in one version-controlled place, which is useful for text preprocessing pipelines in search, information retrieval, and NLP applications.
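As a rough illustration of how one of these lists might be used in a preprocessing step, here is a minimal Python sketch. It assumes a list stored as plain text with one word per line. The helper names (`parse_stopwords`, `load_stopwords`, `remove_stopwords`) and the handling of `#` comment lines are hypothetical and not taken from the repository. Check the format of the list you pick before relying on this.

```python
from pathlib import Path


def parse_stopwords(text: str) -> set[str]:
    """Parse one-word-per-line text into a lowercase set.

    Blank lines and lines starting with '#' are skipped (an assumption
    about the file format, not something the repository guarantees).
    """
    words = set()
    for line in text.splitlines():
        word = line.strip().lower()
        if word and not word.startswith("#"):
            words.add(word)
    return words


def load_stopwords(path: str) -> set[str]:
    """Read a stopword file from disk; `path` is whichever list you chose."""
    return parse_stopwords(Path(path).read_text(encoding="utf-8"))


def remove_stopwords(tokens: list[str], stopwords: set[str]) -> list[str]:
    """Keep only tokens whose lowercase form is not in the stopword set."""
    return [t for t in tokens if t.lower() not in stopwords]


# Inline stand-in for a list file, so the sketch runs without a checkout.
sample = "the\na\nof\n"
stopwords = parse_stopwords(sample)
print(remove_stopwords("The history of a word".split(), stopwords))
# → ['history', 'word']
```

Swapping `sample` for `load_stopwords(...)` on a different list changes which tokens survive, which is the reason to pick a list that matches the target domain.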
313 stars. No commits in the last 6 months.
Stars: 313
Forks: 125
Language: Python
License: —
Category:
Last pushed: Apr 06, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/igorbrigadir/stopwords"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
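The same request can be made from Python with only the standard library. The sketch below makes two assumptions: that the URL follows a category/owner/repo pattern, which is inferred from the single curl example, and that the endpoint returns JSON, which this page does not document. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL in the same shape as the curl example.

    The category/owner/repo layout is an assumption inferred from that
    one example URL.
    """
    parts = [quote(p, safe="") for p in (category, owner, repo)]
    return f"{BASE_URL}/{parts[0]}/{parts[1]}/{parts[2]}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """GET the endpoint and decode the body as JSON (assumed format)."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the URL needs no network access.
print(quality_url("nlp", "igorbrigadir", "stopwords"))
```

Calling `fetch_quality("nlp", "igorbrigadir", "stopwords")` performs the actual request and counts against the daily limit.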
Higher-rated alternatives
Alir3z4/python-stop-words
Get list of common stop words in various languages in Python
hklemp/dotnet-stop-words
Get list of common stop words in various languages in dotnet
eklem/stopword-trainer
A module for creating stopword lists for any language, based on a set of documents.
skupriienko/Ukrainian-Stopwords
A list of ~2000 Ukrainian stopwords (with numbers)
stdlib-js/datasets-savoy-stopwords-fr
A list of French stop words.