Text Mining Fundamentals NLP Tools

Introductory courses, tutorials, and practical guides covering core text mining techniques, workflows, and applications. Includes repositories focused on teaching text processing, analysis methods, and statistical approaches to text data. Does NOT include domain-specific applications (sentiment analysis, fake news detection, etc.) or advanced specialized tools already categorized elsewhere.

There are 65 text mining fundamentals tools tracked. The highest-rated is dipanjanS/text-analytics-with-python at 44/100 with 1,690 stars.

Get all 65 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=text-mining-fundamentals&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
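The same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper names `build_quality_url` and `fetch_projects` are hypothetical, the response is assumed to be JSON (its schema is not described on this page), and it is an assumption that `limit` caps the number of returned projects.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_quality_url(domain="nlp", subcategory="text-mining-fundamentals", limit=20):
    """Build the query URL; defaults mirror the curl example (hypothetical helper)."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Fetch the URL and decode the body as JSON (hypothetical helper).

    The structure of the decoded object is not documented here,
    so it is returned as-is without further parsing.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_projects(build_quality_url())` issues the same request as the curl command. Passing `limit=65` might return the full list, but only if the endpoint honors that value, which this page does not confirm. Unauthenticated use counts against the 100 requests/day quota.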

# | Tool | Description | Score | Tier
1 | dipanjanS/text-analytics-with-python | Learn how to process, classify, cluster, summarize, understand syntax,... | 44 | Emerging
2 | jonathandunn/text_analytics | Basic text analytics and natural language processing in Python | 41 | Emerging
3 | Clarifai/clarifai-pyspark | Interfaces for Unstructured data and ML pipelines with Databricks and Clarifai | 38 | Emerging
4 | IBM/watson-document-co-relation | Correlate text content across documents using Watson NLU, Python NLTK and... | 37 | Emerging
5 | itrummer/NaturalMiner | Mine data for patterns described in natural language | 35 | Emerging
6 | umer7/Applied-Text-Mining-in-Python | Repo for Applied Text Mining in Python (Coursera) by University of Michigan | 34 | Emerging
7 | EudaLabs/nlp | A repository for Natural Language Processing (NLP) projects, tools, and experiments. | 33 | Emerging
8 | fingeredman/teanaps | An open-source Python library for natural language processing and text analysis. | 32 | Emerging
9 | remrama/krank | Fetch curated dream reports. | 32 | Emerging
10 | mchesterkadwell/intro-to-text-mining-with-python | Cambridge Digital Humanities 'Introduction to Text-Mining with Python'... | 31 | Emerging
11 | algonell/ipo-miner | IPO Investment via Text Mining. | 31 | Emerging
12 | zaratsian/Spark | Apache Spark (Scala, PySpark, SparkR) Code, Tricks, and References | 30 | Emerging
13 | oroszgy/hungarian-text-mining-workshop | Materials for the Text Mining workshop held in the HuNLP meetup, June 2017 | 30 | Emerging
14 | blanchefort/text_mining | A set of notebooks solving various tasks in the processing of natural... | 30 | Emerging
15 | mchesterkadwell/intro-to-text-mining-with-python-2020 | Cambridge Digital Humanities Learning, Methods Workshop: "Introduction to... | 30 | Emerging
16 | malares/STeM-Scientifc-Paper-Mining-Tool | STeM is a text mining tool to help scientists and researchers evaluate new... | 30 | Emerging
17 | QData/TextAttack-WebDemo | TextAttack Web Demo | 28 | Experimental
18 | JohnSnowLabs/spark-nlp-conda | Build and publish Spark NLP to Anaconda Cloud | 28 | Experimental
19 | hhaoyan/awesome-textmining-materials-science | Collection of papers on text mining for materials science | 28 | Experimental
20 | fingeredman/text-mining-for-practice | Covers how to perform text analysis using Python libraries. | 28 | Experimental
21 | argilla-io/biome-text | Custom Natural Language Processing with big and small models 🌲🌱 | 28 | Experimental
22 | lorenzoscottb/DReAMy | DReAMy: a library for dream-reports annotation methods with python, NLP, and LLMs | 28 | Experimental
23 | arshren/MachineLearning | Machine Learning documents | 26 | Experimental
24 | mb010/Text2Tag | Code base for the analysis presented in Bowles et al. 2022: "Radio Galaxy... | 26 | Experimental
25 | DmitrySerg/open-data | Collecting and analysing open data stuff | 24 | Experimental
26 | buomsoo-kim/Introduction-to-text-mining-with-Python | Lectures in Urban Data Science Lab, Seoul | 24 | Experimental
27 | MrpYA45/github-text-mining-tfg | We're aiming to create a tool which lets us experiment with text mining and... | 24 | Experimental
28 | HimanshuMittal01/bagmodels | Various bag-of-words ML algorithms like BM25 | 24 | Experimental
29 | thatguy1104/NLP-Data-Mining-Engine | Our main project goals include trying to achieve a way for all researchers... | 24 | Experimental
30 | SAP-samples/github-pull-analyzer | The GitHub Pull Request Analyzer (with SAP AI Core) automates the task of... | 23 | Experimental
31 | aeleraqi/Text-Mining | Text mining techniques and workflows in Python | 23 | Experimental
32 | ycatsh/connor | Organize and classify files based on their content using NLP | 23 | Experimental
33 | prestondunton/marvel-dialogue-nlp | A machine learning project that will use Natural Language Processing (NLP)... | 22 | Experimental
34 | Vaibhavabhaysharma/Applied-Text-Mining-in-Python | This repository contains solutions of the course-... | 22 | Experimental
35 | SciCrunch/Antibody-Watch | Antibody Watch: Text Mining Antibody Specificity from the Literature | 21 | Experimental
36 | juliasilge/ibm-ai-day | Presentation for IBM Community Day AI | 21 | Experimental
37 | MahsaShk/ApacheSpark | Apache Spark machine learning project using PySpark | 20 | Experimental
38 | StabRise/ScaleDP-Tutorials | Tutorials for ScaleDP library. ScaleDP is an Open-Source Library for... | 19 | Experimental
39 | park1997/Industrial_safety_and_health_law-visualization | Visualization of Occupational Safety and Health Act regulations; a similarity network among laws built through text mining | 18 | Experimental
40 | cyidhn/texto | 📚 The Python library for textometry. | 17 | Experimental
41 | analyticalmonk/pyspark_nlp_workshop | Instructions and code for the workshop "From Big Data to NLP Insights:... | 17 | Experimental
42 | sudheera96/pyspark-textprocessing | Word count project using PySpark in the Databricks cloud environment. | 16 | Experimental
43 | Achint08/tech-diffusion | Patents data analysis on PySpark | 16 | Experimental
44 | AsadiAhmad/Edit-Distance-Spark | Calculating Edit Distance with PySpark | 16 | Experimental
45 | AsadiAhmad/Ngram-Spark-Wikipedia | Calculating n-grams with PySpark for Wikipedia text | 16 | Experimental
46 | AsadiAhmad/Word-Counter-Spark | Word counter with Spark | 16 | Experimental
47 | fredriko/draviz | A method for assessing the data readiness of NLP projects, as well as the... | 16 | Experimental
48 | fingeredman/text-mining-for-beginner | Covers everything from basic Python syntax to performing simple text analysis. | 16 | Experimental
49 | mucahidozcelik/NLP | Text Mining and Natural Language Processing | 15 | Experimental
50 | fingeredman/advanced-text-mining | Covers natural language processing and text analysis methodology using the TEANAPS library. | 15 | Experimental
51 | paulbricman/memnav | Expanding propositional memory through text mining. | 15 | Experimental
52 | MuzamilSaiq/toy-to-theory-bag-of-words | Pedagogical walkthrough of Bag of Words | 15 | Experimental
53 | thukg/AMinerOpen | An open source community that focuses on developing and publishing elegant... | 14 | Experimental
54 | frances-ai/frances-api | frances is an advanced cloud-based text mining digital platform that... | 13 | Experimental
55 | peetceenatoo/my-first-keyword-extractor | First steps into natural language processing | 13 | Experimental
56 | tkachuksergiy/aws-spark-nlp | Works related to recent project on the use of Apache Spark and AWS cloud for... | 13 | Experimental
57 | yashmanne/an_analysis_of_nothing | Exploring character occurrences and NLP with Seinfeld scripts. | 11 | Experimental
58 | ReAlex1902/Hawk | German documents analysis | 11 | Experimental
59 | manmeetkaurbaxi/Analyzing-ACL-and-EMNLP-papers | Analyzing paper details of ACL and EMNLP from 2016-2021. | 11 | Experimental
60 | ekardatos/TextAnalysisAndStatisticalTesting | Statistical hypothesis testing applied to linguistic text data. | 11 | Experimental
61 | YukiChen-yuxin/proj_NLPbrl_DATA534 | The NLPbrl wrapper API is a package for wrapping The Rosette Text Analytics... | 11 | Experimental
62 | N-y-c-t-o/Gutenberg-scribe-main | A Python-based project that processes and analyzes public-domain books from... | 11 | Experimental
63 | Doubtable-Steves-Linguistics/MinecraftNLP | Natural Language Processing (NLP) project built to predict GitHub repository... | 10 | Experimental
64 | Robin1999Stark/Recipe_Tagger | NLP Project for Auto Labeling Recipes | 10 | Experimental
65 | exaiatech/cymo-tutorial | CYMO is a next-generation text mining and analytics software developed by Exaia | 10 | Experimental