Stopword Lists and Datasets: NLP Tools

Collections of stopword lists and datasets for removing common words across languages. Includes pre-compiled stopword collections, language-specific stopword resources, and tools for generating stopword lists. Does NOT include general text preprocessing frameworks, stemming/lemmatization tools, or broader NLP preprocessing pipelines.
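As a rough illustration of what these resources are used for, the sketch below filters stopwords out of a sentence. The tiny `STOPWORDS` sets and the `remove_stopwords` helper are hypothetical and only stand in for a full list loaded from one of the projects tracked here; they are not taken from any of those projects.

```python
import re

# Tiny illustrative stopword sets (assumed for this sketch only).
# A real project would load a complete list from a dedicated resource.
STOPWORDS = {
    "en": {"a", "an", "and", "in", "is", "of", "the", "to"},
    "fr": {"de", "et", "la", "le", "les", "un", "une"},
}


def remove_stopwords(text: str, lang: str = "en") -> list[str]:
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    tokens = re.findall(r"\w+", text.lower())
    stop = STOPWORDS[lang]
    return [token for token in tokens if token not in stop]


print(remove_stopwords("The cat is in the garden"))    # ['cat', 'garden']
print(remove_stopwords("Le chat et la souris", "fr"))  # ['chat', 'souris']
```

Keeping one set per language code mirrors how multi-language collections are typically organized, and set membership keeps the per-token lookup cheap.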

This category tracks 34 stopword list, dataset, and tool projects. One scores above 50 (established tier). The highest-rated is Alir3z4/python-stop-words at 68/100, with 159 stars and 237,397 monthly downloads.

Get all 34 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=stopword-lists-datasets&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
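For scripted access, the same request can be made from Python using only the standard library. This is a minimal sketch: the URL and query parameters are copied from the curl command above, while `build_url`, `fetch_projects`, and the assumption that the response body is JSON that `json.load` can parse are illustrative. The response schema is not documented here, so the code does not assume any field names, and whether a `limit` larger than 20 is accepted is also an assumption.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit: int = 20) -> str:
    """Build the listing URL with the same query parameters as the curl call."""
    params = {
        "domain": "nlp",
        "subcategory": "stopword-lists-datasets",
        "limit": limit,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_projects(limit: int = 20, timeout: float = 10.0):
    """Fetch and decode the JSON response (network access required)."""
    with urlopen(build_url(limit), timeout=timeout) as response:
        return json.load(response)


# Example usage (counts against the 100 requests/day keyless quota):
# data = fetch_projects()
# print(json.dumps(data, indent=2)[:500])
```

Separating URL construction from the network call makes it easy to inspect the request before spending any of the daily quota.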

# | Tool | Description | Score | Tier
1 | Alir3z4/python-stop-words | Get list of common stop words in various languages in Python | 68 | Established
2 | hklemp/dotnet-stop-words | Get list of common stop words in various languages in .NET | 39 | Emerging
3 | eklem/stopword-trainer | A module for creating stopword lists for any language, based on a set of documents. | 39 | Emerging
4 | igorbrigadir/stopwords | Default English stopword lists from many different sources | 36 | Emerging
5 | skupriienko/Ukrainian-Stopwords | The list of ~2000 Ukrainian stopwords (with numbers) | 36 | Emerging
6 | stdlib-js/datasets-savoy-stopwords-fr | A list of French stop words. | 35 | Emerging
7 | stdlib-js/datasets-cmudict | The Carnegie Mellon Pronouncing Dictionary (CMUdict). | 33 | Emerging
8 | skupriienko/Ukrainian-Sentiment-Analysis | The list of Ukrainian words for sentiment analysis and NLP | 30 | Emerging
9 | egorsmkv/ukrainian-accentor | Add accents to words in the Ukrainian language | 29 | Experimental
10 | Sashank222222/massive-english-word-list | 📚 Explore a comprehensive English word list with over 68,000 entries,... | 27 | Experimental
11 | pharo-ai/stopwords | Load the stopwords that you need in Pharo | 26 | Experimental
12 | stdlib-js/datasets-savoy-stopwords-por | A list of Portuguese stop words. | 25 | Experimental
13 | stdlib-js/datasets-liu-positive-opinion-words-en | A list of positive opinion words. | 25 | Experimental
14 | stdlib-js/datasets-stopwords-en | A list of English stop words. | 25 | Experimental
15 | stdlib-js/datasets-savoy-stopwords-sp | A list of Spanish stop words. | 22 | Experimental
16 | stdlib-js/datasets-savoy-stopwords-it | A list of Italian stop words. | 22 | Experimental
17 | contactsunny/RemoveStopWordsInJavaPOC | This is a simple Spring Boot project which removes stop words from a text file. | 22 | Experimental
18 | aeleraqi/arabic-stopwords | This repository contains a comprehensive list of Arabic stopwords. | 22 | Experimental
19 | latincy/verba | verba.txt - A Latin word list in the style of Unix /usr/share/dict/words | 22 | Experimental
20 | Rayraegah/adjectives | A data dump of all adjectives in the English language | 22 | Experimental
21 | Helsinki-NLP/UkrainianLT | A collection of links to Ukrainian language tools | 22 | Experimental
22 | stdlib-js/datasets-liu-negative-opinion-words-en | A list of negative opinion words. | 22 | Experimental
23 | stdlib-js/datasets-savoy-stopwords-ger | A list of German stop words. | 22 | Experimental
24 | kavgan/stop-words | Stop word lists | 21 | Experimental
25 | vikasing/news-stopwords | A huge list of stopwords collected from millions of news articles | 21 | Experimental
26 | Vidito/norstop | Norstop is a lightweight, zero-dependency Python library to remove Norwegian... | 21 | Experimental
27 | lang-uk/ukrainian-word-stress-dictionary | Dictionary of word stresses in the Ukrainian language 🇺🇦 | 18 | Experimental
28 | bimarakajati/Javanese-and-Sundanese-Stopwords | This project aims to provide stopwords for the Javanese and Sundanese... | 16 | Experimental
29 | Theodotus1243/ukrainian-accentor-transformer | Add accents to words in the Ukrainian language | 13 | Experimental
30 | ynsrc/german-categorized-wordlist | German Categorized Wordlist Project | 12 | Experimental
31 | raccoon-hero/uk-dictionary | A paradigm-based morphological dictionary of the Ukrainian language. Built... | 12 | Experimental
32 | olastor/german-word-frequencies | Simple word to frequency mappings for the German language based on text... | 12 | Experimental
33 | semnan-university-ai/nlp-stopwords | A comprehensive stopword list for natural language processing and text mining. | 10 | Experimental
34 | AidaLog/Common-Swahili-stopwords | This curated collection brings together a dataset of common Swahili... | 10 | Experimental