Text Preprocessing Pipelines NLP Tools
End-to-end tools and libraries for cleaning, normalizing, and preparing raw text data for NLP tasks. Includes tokenization, stemming, stopword removal, and data cleaning utilities. Does NOT include downstream NLP applications (sentiment analysis, classification, etc.), feature extraction, or domain-specific cleaning (tweets, names, etc.).
This category tracks 45 text preprocessing pipeline tools. One scores 70 or higher (verified tier): the highest-rated, chartbeat-labs/textacy, at 70/100 with 2,236 stars and 75,599 monthly downloads.
Get all 45 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=text-preprocessing-pipelines&limit=20"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
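The same request can be made from a script. The following is a minimal Python sketch, assuming the endpoint returns a JSON body; the response schema is not documented here, so the code does not rely on any field names. The helper names `build_url` and `fetch_projects` are illustrative and not part of the API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="nlp", subcategory="text-preprocessing-pipelines", limit=20):
    """Build the query URL with the same parameters as the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Fetch the URL and parse the body as JSON.

    Assumption: the API responds with JSON. The structure of the parsed
    object is not specified in this page, so it is returned unchanged.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_projects(build_url())` performs the network request. Inspect the returned object before depending on particular keys, and keep the unauthenticated limit of 100 requests/day in mind when running it in a loop.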
| # | Tool | Score | Tier |
|---|---|---|---|
| 1 | chartbeat-labs/textacy: NLP, before and after spaCy | 70 | Verified |
| 2 | nltk/nltk_data: NLTK Data | | Established |
| 3 | prasanthg3/cleantext: An open-source package for python to clean raw text data | | Established |
| 4 | brightertiger/pygarble: Python Package to detect garbled, gibberish text for EN | | Established |
| 5 | jfilter/clean-text: 🧹 Python package for text cleaning | | Established |
| 6 | citiususc/pyplexity: Cleaning tool for web scraped text | | Established |
| 7 | LoLei/redditcleaner: Cleans Reddit Text Data :scroll: :broom: | | Emerging |
| 8 | ksnugroho/basic-text-preprocessing: Basic text preprocessing for Bahasa with Python. | | Emerging |
| 9 | textpipe/textpipe: Textpipe: clean and extract metadata from text | | Emerging |
| 10 | takuti/prelims: Front matter post-processor for static site generators | | Emerging |
| 11 | alinapetukhova/textcl: Text preprocessing package for use in NLP tasks https://pypi.org/project/textcl/ | | Emerging |
| 12 | MusfiqDehan/data-preprocessors: 🛠️An easy to use tool for Data Preprocessing specially for Text Preprocessing | | Emerging |
| 13 | Shubha23/Text-processing-NLP: This notebook contains entire text preprocessing pipeline for NLP problems.... | | Emerging |
| 14 | huu4ontocord/rio: Text pre-processing for NLP datasets | | Emerging |
| 15 | iaramer/dobbi: An open-source NLP library: fast text cleaning and preprocessing | | Experimental |
| 16 | YugantM/textcleaner: text-data pre-processing utility | | Experimental |
| 17 | aflah02/cleansetext: This is a simple library to help you clean your textual data | | Experimental |
| 18 | mantzaris/KeemenaPreprocessing.jl: Preprocessing for text data: cleaning, normalization, vectorization,... | | Experimental |
| 19 | Arfius/light-text-prepro: Python module that collects regex rules | | Experimental |
| 20 | Abhayparashar31/crazytext: A Simple Easy To Use Text Cleaning Package For NLP Built In Python. It Can... | | Experimental |
| 21 | umapornp/textprepro: 👀 Everything Everyway All At Once Text Preprocessing for Natural Language Processing. | | Experimental |
| 22 | ninadpatil09/NLP-Notebooks: Explore NLP tasks with Python using NLTK, SpaCy & scikit-learn:... | | Experimental |
| 23 | Ankur3107/nlp_preprocessing: Text Preprocessing Package includes cleaning, tokenization, dataset... | | Experimental |
| 24 | abeaderstadt/nlp-02-text-preprocessing: Text Preprocessing NLP Project | | Experimental |
| 25 | Al-Hasib/eng_text_cleaner: A python package for cleaning text | | Experimental |
| 26 | lgomezt/tidyX: Python package to clean raw tweets for ML applications. | | Experimental |
| 27 | udityamerit/Text-Processing-Package-For-Natural-Language-Processing: This project is a comprehensive collection of NLP techniques, practical... | | Experimental |
| 28 | angelsomo/nlp-text-cleaning: Lightweight Python CLI tool for robust text cleaning, Unicode normalization,... | | Experimental |
| 29 | krisograbek/text-preprocessing: Text preprocessing in Python. Libs include string, re, nltk, spacy, gensim,... | | Experimental |
| 30 | MariyamSiddiqui/Text-Preprocessing-NLP-pipeline: End-to-end NLP text preprocessing pipeline using Python - includes... | | Experimental |
| 31 | mahirmsb25/Text-Preprocessing-Pipeline: A Python-based NLP preprocessing pipeline using NLTK and Pandas to clean and... | | Experimental |
| 32 | nluninja/nlp_crash_course_with_spacy: A Natural Language Processing crash course with SpaCy 2.6 and NLTK 3.6.2,... | | Experimental |
| 33 | basit-afridi62/nlp-nltk-python: This repository is a hands-on guide to Natural Language Processing (NLP)... | | Experimental |
| 34 | mookiezi/dataset-cleaning-toolkit: A dataset toolbox for preparing and analyzing conversational datasets,... | | Experimental |
| 35 | Abdelrahman-Atef-Elsayed/NLP_Preprocessing_pipeline: This repo includes a generalized preprocessing pipeline for text data in NLP tasks. | | Experimental |
| 36 | iam-salma/NLP-Bootcamp-with-python: A hands-on NLP Bootcamp using Python covering text preprocessing,... | | Experimental |
| 37 | NITHISHM2410/text-preprocessing-techniques: This Repo includes modules that helps NLP related tasks. | | Experimental |
| 38 | alanindra/baca-juga-cleaner: Program to clean news text by filtering out irrelevant syntactic... | | Experimental |
| 39 | dodevca/tweet-preprocessor: Lightweight, modular, and extensible Python library for preprocessing... | | Experimental |
| 40 | tnathu-ai/NLP-Job-Ad: Pre-process natural language text data to generate effective feature... | | Experimental |
| 41 | Varsh008/text_preprocessor_toolkit: Configurable Text Preprocessing Toolkit in Python using spaCy | | Experimental |
| 42 | michellepellon/tidyname: Intelligent company name cleaning and normalization for Python. Entity... | | Experimental |
| 43 | shrutimary15/Text-data-preparation: The repository consists of a python code that inputs a text file consisting... | | Experimental |
| 44 | nadinejackson1/text-preprocessing-pipeline: Basic text preprocessing pipeline, which includes tokenization, stemming,... | | Experimental |
| 45 | tripathiadityap/cleantxty: Python package to clean strings and making them reasonable for NLP. | | Experimental |