igorbrigadir/stopwords

Default English stopword lists from many different sources

36 / 100 (Emerging)

Aggregates 40+ stopword lists sourced from search engines (Sphinx, Lucene), databases (MySQL, PostgreSQL), NLP libraries (CoreNLP, Stanford), and specialized domains (medical literature, patent search), so developers can pick a filtering strategy suited to their domain rather than relying on a single generic list. Each list is taken from its original source implementation and keeps that source's domain-specific variations: for example, lists from medical databases include specialized terminology, while lists from full-text search engines are tuned for query optimization. The repository provides structured, version-controlled access to these otherwise fragmented lists, which is useful for text preprocessing pipelines in search, information retrieval, and NLP applications.
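
To show how the choice of list changes what a preprocessing step keeps, here is a minimal Python sketch. It assumes each list is stored as plain text with one word per line (check the repository's actual file layout before relying on this). `parse_stopwords`, `load_stopwords` and `remove_stopwords` are hypothetical helper names, not part of the repository, and the two lists in the demo are toy examples rather than real contents of any list.

```python
import re
from pathlib import Path


def parse_stopwords(lines):
    """Build a stopword set from lines of text, one word per line.

    Assumed format: one entry per line; blank lines and lines starting
    with '#' are skipped. Verify against the real files before use.
    """
    words = set()
    for line in lines:
        word = line.strip().lower()
        if word and not word.startswith("#"):
            words.add(word)
    return words


def load_stopwords(path):
    """Read a stopword list from a UTF-8 text file (path is hypothetical)."""
    return parse_stopwords(Path(path).read_text(encoding="utf-8").splitlines())


def remove_stopwords(text, stopwords):
    """Lowercase, split into runs of ASCII letters/apostrophes, drop stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in stopwords]


if __name__ == "__main__":
    # Two toy lists for illustration only; not the repository's contents.
    minimal = parse_stopwords(["the", "a", "of"])
    extended = minimal | parse_stopwords(["# extra entries", "however", "study"])
    text = "However, the study of a cohort"
    print(remove_stopwords(text, minimal))   # ['however', 'study', 'cohort']
    print(remove_stopwords(text, extended))  # ['cohort']
```

Storing each list as a set keeps membership checks fast on average; the larger list removes more tokens from the same sentence, which is why picking a list that fits the domain matters.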

313 stars. No commits in the last 6 months.

No License · Stale 6m · No Package · No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 1 / 25
Community 25 / 25
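
Each of the four sub-scores above is out of 25, and on this page they appear to add up to the overall 36 / 100. The site's exact scoring method is not described here, so treat this as an arithmetic check on the listed numbers rather than its formula:

```python
# Sub-scores as listed above (each out of 25).
subscores = {"Maintenance": 0, "Adoption": 10, "Maturity": 1, "Community": 25}

total = sum(subscores.values())       # 0 + 10 + 1 + 25
max_total = 25 * len(subscores)       # four categories of 25 points each
print(f"{total} / {max_total}")       # 36 / 100
```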


Stars: 313
Forks: 125
Language: Python
License: none
Last pushed: Apr 06, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/igorbrigadir/stopwords"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
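
The same request can be made from Python with only the standard library. This is a sketch under assumptions: the `/quality/{category}/{owner}/{repo}` path pattern is generalized from the single example URL above, the response is assumed to be JSON (its field names are not documented here), and passing a free key is not covered. `quality_url` and `fetch_quality` are hypothetical helper names.

```python
import json
import urllib.request

# Base path taken from the curl example; the category segment ("nlp") and
# the owner/repo pattern are assumed to generalize to other repositories.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the quality-endpoint URL for one repository."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10):
    """GET the endpoint and decode the response body as JSON (assumed format)."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `fetch_quality("nlp", "igorbrigadir", "stopwords")` requests the same URL as the curl command. With 100 requests/day available without a key, caching the result is preferable to calling it in a loop.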