Stopword Lists and Datasets: NLP Tools
Collections of stopword lists and datasets for removing common words across languages. Includes pre-compiled stopword collections, language-specific stopword resources, and tools for generating stopword lists. Does NOT include general text preprocessing frameworks, stemming/lemmatization tools, or broader NLP preprocessing pipelines.
34 stopword list and dataset tools are tracked. One scores above 50 (Established tier). The highest-rated is Alir3z4/python-stop-words at 68/100, with 159 stars and 237,397 monthly downloads.
Get the projects as JSON (the sample request sets `limit=20`, so it may not return all 34):
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=stopword-lists-datasets&limit=20"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
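The same request can be made from a script. The following is a minimal sketch using only the Python standard library; the endpoint and the `domain`, `subcategory`, and `limit` parameters are taken from the curl command above, while the helper names `build_url` and `fetch_projects` are hypothetical. The shape of the JSON response is not documented here, so the sketch returns the decoded payload without assuming any field names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint copied from the curl example; nothing else about the API is assumed.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="nlp", subcategory="stopword-lists-datasets", limit=20):
    """Build the query URL with the same parameters as the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Send a GET request and decode the JSON body.

    The response schema is unknown, so the payload is returned as-is.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_projects(build_url())` performs the request and counts against the daily quota. Whether the API accepts a `limit` larger than 20 is not stated on this page and would need to be checked.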
| # | Tool | Score | Tier |
|---|---|---|---|
| 1 | Alir3z4/python-stop-words: Get list of common stop words in various languages in Python | 68 | Established |
| 2 | hklemp/dotnet-stop-words: Get list of common stop words in various languages in dotnet | | Emerging |
| 3 | eklem/stopword-trainer: A module for creating stopword lists for any language, based on a set of documents. | | Emerging |
| 4 | igorbrigadir/stopwords: Default English stopword lists from many different sources | | Emerging |
| 5 | skupriienko/Ukrainian-Stopwords: The list of ~2000 Ukrainian stopwords (with numbers) | | Emerging |
| 6 | stdlib-js/datasets-savoy-stopwords-fr: A list of French stop words. | | Emerging |
| 7 | stdlib-js/datasets-cmudict: The Carnegie Mellon Pronouncing Dictionary (CMUdict). | | Emerging |
| 8 | skupriienko/Ukrainian-Sentiment-Analysis: The list of Ukrainian words for sentiment analysis and NLP | | Emerging |
| 9 | egorsmkv/ukrainian-accentor: Add accents to words in the Ukrainian language | | Experimental |
| 10 | Sashank222222/massive-english-word-list: Explore a comprehensive English word list with over 68,000 entries,... | | Experimental |
| 11 | pharo-ai/stopwords: Load the stopwords that you need in Pharo | | Experimental |
| 12 | stdlib-js/datasets-savoy-stopwords-por: A list of Portuguese stop words. | | Experimental |
| 13 | stdlib-js/datasets-liu-positive-opinion-words-en: A list of positive opinion words. | | Experimental |
| 14 | stdlib-js/datasets-stopwords-en: A list of English stop words. | | Experimental |
| 15 | stdlib-js/datasets-savoy-stopwords-sp: A list of Spanish stop words. | | Experimental |
| 16 | stdlib-js/datasets-savoy-stopwords-it: A list of Italian stop words. | | Experimental |
| 17 | contactsunny/RemoveStopWordsInJavaPOC: This is a simple Spring Boot project which removes stop words from a text file. | | Experimental |
| 18 | aeleraqi/arabic-stopwords: This repository contains a comprehensive list of Arabic stopwords. | | Experimental |
| 19 | latincy/verba: verba.txt - A Latin word list in the style of Unix /usr/share/dict/words | | Experimental |
| 20 | Rayraegah/adjectives: A data dump of all adjectives in the English language | | Experimental |
| 21 | Helsinki-NLP/UkrainianLT: A collection of links to Ukrainian language tools | | Experimental |
| 22 | stdlib-js/datasets-liu-negative-opinion-words-en: A list of negative opinion words. | | Experimental |
| 23 | stdlib-js/datasets-savoy-stopwords-ger: A list of German stop words. | | Experimental |
| 24 | kavgan/stop-words: Stop word lists | | Experimental |
| 25 | vikasing/news-stopwords: A huge list of stopwords collected from millions of news articles | | Experimental |
| 26 | Vidito/norstop: Norstop is a lightweight, zero-dependency Python library to remove Norwegian... | | Experimental |
| 27 | lang-uk/ukrainian-word-stress-dictionary: Dictionary of word stresses in the Ukrainian language 🇺🇦 | | Experimental |
| 28 | bimarakajati/Javanese-and-Sundanese-Stopwords: This project aims to provide stopwords for the Javanese and Sundanese... | | Experimental |
| 29 | Theodotus1243/ukrainian-accentor-transformer: Add accents to words in the Ukrainian language | | Experimental |
| 30 | ynsrc/german-categorized-wordlist: German Categorized Wordlist Project | | Experimental |
| 31 | raccoon-hero/uk-dictionary: A paradigm-based morphological dictionary of the Ukrainian language. Built... | | Experimental |
| 32 | olastor/german-word-frequencies: Simple word to frequency mappings for the German language based on text... | | Experimental |
| 33 | semnan-university-ai/nlp-stopwords: This is a comprehensive stopword collection for natural language processing and text mining. | | Experimental |
| 34 | AidaLog/Common-Swahili-stopwords: This curated collection brings together a dataset of common Swahili... | | Experimental |
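
For readers new to the category, the sketch below illustrates the two tasks these projects cover: deriving a stopword list from a set of documents and removing those words from text. It is a generic document-frequency heuristic written for illustration, not the method used by any project listed above. The function names, the `min_doc_ratio` threshold, and the sample documents are all assumptions.

```python
import re
from collections import Counter

# Simple Unicode-aware tokenizer; real projects may tokenize differently.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def derive_stopwords(documents, min_doc_ratio=0.8):
    """Return words that appear in at least min_doc_ratio of the documents."""
    doc_freq = Counter()
    for doc in documents:
        # Count each word once per document (document frequency).
        doc_freq.update(set(tokenize(doc)))
    threshold = min_doc_ratio * len(documents)
    return {word for word, count in doc_freq.items() if count >= threshold}


def remove_stopwords(text, stopwords):
    """Return the tokens of text that are not in the stopword set."""
    return [tok for tok in tokenize(text) if tok not in stopwords]


# Hypothetical sample corpus, used only to demonstrate the functions.
docs = [
    "The cat sat on the mat",
    "The dog chased the cat",
    "A bird sang in the tree",
    "The fish swam in the pond",
]
stopwords = derive_stopwords(docs, min_doc_ratio=0.75)
filtered = remove_stopwords("The cat sat on the mat", stopwords)
```

In practice the stopword set would usually be loaded from one of the curated lists above instead of being derived from a four-sentence corpus, since a tiny corpus only surfaces the most frequent words.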