Topic Modeling Clustering NLP Tools

Tools for discovering latent topics in document collections and grouping documents by thematic similarity using methods like LDA, NMF, VAE-based models, and clustering algorithms. Does NOT include general text classification, sentiment analysis, or document retrieval systems.

There are 98 topic modeling clustering tools tracked. 4 score above 50 (established tier). The highest-rated is MIND-Lab/OCTIS at 67/100 with 799 stars and 1,106 monthly downloads.

Get all 98 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=topic-modeling-clustering&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
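The same request can be made from a script. The following is a minimal sketch using only the Python standard library; the helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the returned JSON is not documented on this page, so the code only decodes the response without assuming any field names. Whether the endpoint accepts a `limit` larger than the 20 shown in the curl command is also an assumption.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="nlp", subcategory="topic-modeling-clustering", limit=20):
    """Assemble the query URL from the same parameters the curl command uses."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(limit=20, timeout=30):
    """Request the dataset and decode the JSON body.

    The payload structure is an unknown here, so the decoded object
    is returned as-is for the caller to inspect.
    """
    with urllib.request.urlopen(build_url(limit=limit), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_projects()` performs one network request, which counts against the daily quota described above.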

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | MIND-Lab/OCTIS | OCTIS: Comparing Topic Models is Simple! A python package to optimize and... | 67 | Established |
| 2 | i-dot-ai/themefinder | A topic modelling Python package for analysing one-to-many question-answer data. | 66 | Established |
| 3 | bobxwu/TopMost | A Topic Modeling System Toolkit (ACL 2024 Demo) | 57 | Established |
| 4 | baidu/Familia | A Toolkit for Industrial Topic Modeling | 51 | Established |
| 5 | andifunke/topic-labeling | The project proposes a framework to apply topic models on a text-corpus and... | 46 | Emerging |
| 6 | bab2min/tomotopy | Python package of Tomoto, the Topic Modeling Tool | 46 | Emerging |
| 7 | AnFreTh/STREAM | ACL Python package engineered for seamless topic modeling, topic evaluation,... | 44 | Emerging |
| 8 | rwalk/gsdmm | GSDMM: Short text clustering | 43 | Emerging |
| 9 | RandyPen/TextCluster | Short text clustering preprocessing module (Short text cluster) | 42 | Emerging |
| 10 | robolab-pavia/slr-kit | slr-kit is a framework to support the analysis of scientific literature... | 41 | Emerging |
| 11 | yash91sharma/MALTopic-py | A multi-agent LLM topic modeling library | 39 | Emerging |
| 12 | FesonX/cn-text-classifier | Chinese text clustering | 39 | Emerging |
| 13 | datquocnguyen/jLDADMM | A Java package for the LDA and DMM topic models | 37 | Emerging |
| 14 | riedlma/topictiling | TopicTiling is a text segmentation method that is based on LDA | 34 | Emerging |
| 15 | MaartenGr/Concept | Concept Modeling: Topic Modeling on Images and Text | 32 | Emerging |
| 16 | go-nlp/dmmclust | dmmclust is a package for clustering short texts, based on Yin and Wang (2014) | 32 | Emerging |
| 17 | MLSA-SRM/Lexicon-The-Auto-Tagger | A browser extension to automatically generate tags for your online blog,... | 31 | Emerging |
| 18 | daniel-furman/awesome-chatgpt-prompts-clustering | Text clustering: HDBSCAN is probably all you need. | 30 | Emerging |
| 19 | jonaschn/awesome-topic-models | ✨ Awesome - A curated list of amazing Topic Models (implementations,... | 30 | Emerging |
| 20 | koheiw/seededlda | LDA for semisupervised topic modeling | 29 | Experimental |
| 21 | chen0040/java-lda | Package provides java implementation of the latent dirichlet allocation... | 29 | Experimental |
| 22 | smacawi/bert-topics | Bridging the gap between supervised classification and unsupervised topic... | 28 | Experimental |
| 23 | thakur-nandan/topic-modeling | This repository contains an intuitive example on topic-modeling using... | 28 | Experimental |
| 24 | mattmurray/topic_modelling_financial_news | Topic modelling on financial news with Natural Language Processing | 28 | Experimental |
| 25 | CZboop/Newspaper-Topic-Modelling | Topic modelling and analysis of different UK newspapers, primarily using BERTopic | 27 | Experimental |
| 26 | lettier/lda-topic-modeling | A PureScript, browser-based implementation of LDA topic modeling. | 27 | Experimental |
| 27 | bhattbhavesh91/Top2Vec-Demo | Demo on Top2Vec to generate topics using BERT model | 27 | Experimental |
| 28 | scientist-labs/topical | Ruby library for fast, flexible topic modeling — built on modern embeddings... | 27 | Experimental |
| 29 | bloomberg/fast-noise-aware-topic-clustering | Research code and scripts used in the Silburt et al. (2021) EMNLP 2021 paper... | 26 | Experimental |
| 30 | thiswillbeyourgithub/AnnA_Anki_neuronal_Appendix | Using machine learning on your anki collection to enhance the scheduling via... | 26 | Experimental |
| 31 | kjahan/twitter_mining | Twitter Mining in Java | 26 | Experimental |
| 32 | Saiken77/topic-modeling-pipeline-fr-app | Complete topic modeling pipeline (LDA, NMF, BERTopic) with... | 25 | Experimental |
| 33 | berksudan/OTMISC-Topic-Modeling-Tool | We created a topic modeling pipeline to evaluate different topic modeling... | 25 | Experimental |
| 34 | AkramChaabnia/SEALClust | Text Clustering as Classification with LLMs - PPD M2 MLSD reproduction with... | 25 | Experimental |
| 35 | drob-xx/TopicTuner | HDBSCAN Tuning for BERTopic Models | 24 | Experimental |
| 36 | stdlib-js/nlp-lda | Latent Dirichlet Allocation via collapsed Gibbs sampling. | 24 | Experimental |
| 37 | kalemaria/cluster-constructor | Image and Text-based Clustering for Industrial Machine Parts | 24 | Experimental |
| 38 | kedir/GLG--Topic-Modeling-and-Document-Clustering | Cluster documents and extract global and local topics per cluster using LDA... | 24 | Experimental |
| 39 | diem-ai/topic-modeling | Retrieving real time breaking news from... | 24 | Experimental |
| 40 | yash-rai-93/arXiv-Topic-Discovery | Unsupervised topic discovery on 100K+ arXiv abstracts using LDA, NMF,... | 24 | Experimental |
| 41 | kotartemiy/topic-labeled-news-dataset | 100k+ topic labeled news articles published from thousands of news websites | 23 | Experimental |
| 42 | ocstringham/text_classification_wildlife_trade | Code and data for text classification models associated with "Text... | 23 | Experimental |
| 43 | K-RLange/Lex2Sent | Lex2Sent package for unsupervised text classification/clustering | 23 | Experimental |
| 44 | chen0040/java-plsa | Package provides the java implementation of probabilistic latent semantic... | 22 | Experimental |
| 45 | DivyaRustagi10/indicCTM-Foundations | Contextualized Topic Modeling using Zero-Shot Learning on Indic Languages (IndicCTM) | 22 | Experimental |
| 46 | iwan-rg/Arabic-Topic-Modeling | BERT for Arabic Topic Modeling: An Experimental Study on BERTopic Technique | 22 | Experimental |
| 47 | byukan/Chatbots-NLP | Chatbots and other NLP applications: Topic Modeling on text from Codechef and OkCupid | 22 | Experimental |
| 48 | JuaniLlaberia/news_articles_grouping_research | Multi-signal graph-based NLP pipeline for clustering news articles into... | 22 | Experimental |
| 49 | papachristoumarios/sade | Code for paper: Software clusterings with vector semantics and the call graph | 21 | Experimental |
| 50 | cognitivefactory/interactive-clustering | Python package used to apply NLP interactive clustering methods. | 21 | Experimental |
| 51 | goerlitz/nlp-topic-models | Application of topic models for topic extraction and similarity search | 20 | Experimental |
| 52 | vrjkmr/arxiv-topic | Detecting topic clusters in arXiv ML papers. | 20 | Experimental |
| 53 | silviaruffini/text_clustering | Text Clustering with Python and Dash | 19 | Experimental |
| 54 | trinker/clustext | Easy, fast clustering of texts | 19 | Experimental |
| 55 | harishaaram/Topic-Modeling | An Unbiased Examination of Federal Reserve Meeting minutes | 19 | Experimental |
| 56 | sean-chester/generalised-brown | C++ implementation of Generalised Brown clustering and python scripts for... | 19 | Experimental |
| 57 | anil1055/n-stage_LDA | Proposed n-stage Latent Dirichlet Allocation method - A Novel Approach for LDA | 18 | Experimental |
| 58 | LaurentVeyssier/Topic-Modeling-and-Document-Categorization-using-Latent-Dirichlet-Allocation | Categorize documents per topics inferred by LDA algorithm | 18 | Experimental |
| 59 | GMU-Capstone-690/Data-Tagging-via-Content-and-Standards | An approach to organize text data generated from URLs by tagging it to... | 17 | Experimental |
| 60 | caimeng2/TopicModelingWorkshop | SSDA workshop: topic modeling for exploratory text analysis | 17 | Experimental |
| 61 | Keshav1506/Topic-Modelling-on-BBC-News-Articles-using-LDA | This is the fourth capstone project I've done in my Almabetter Data science... | 17 | Experimental |
| 62 | kjahan/lda | Extracting Hidden Topics from Texts using LDA Model | 17 | Experimental |
| 63 | Lincoln-France/twitchatds | Language modeling of twitch chat streamers | 16 | Experimental |
| 64 | contefranz/OpTop | Optimal topic identification from a pool of Latent Dirichlet Allocation models | 16 | Experimental |
| 65 | saky-semicolon/Topic-Modeling-on-Mental-Health-Related-Tweets | A project using LDA and BERTopic for topic modeling, sentiment analysis, and... | 16 | Experimental |
| 66 | chlin907/TopicModeling | Topic Modeling for IMDB top movies by NLP on their synopses | 16 | Experimental |
| 67 | ebrahimpichka/semantic-textual-similarity | Categorizing products of an online retailer based on products’ titles using... | 16 | Experimental |
| 68 | Htiango/Chinese-LDA | This is Tianyu Hong's first version of a program using LDA to predict... | 16 | Experimental |
| 69 | michimalek/nlp-clustering-research | A python Sentence-Clustering library based on S-Bert and a diverse number of... | 15 | Experimental |
| 70 | davidmasse/blog-fashion-system | NLP analysis of fashion trends | 15 | Experimental |
| 71 | Navy10021/Parallel_Clustering_based_TM | Parallel clustering-based Topic Modeling | 14 | Experimental |
| 72 | vahadruya/Capstone-Project-Unsupervised-ML-Topic-Modelling | The project explores a dataset of 2225 BBC News Articles and identifies the... | 13 | Experimental |
| 73 | fatimagulomova/twitter-topic-extraction | This project explores prevalent topics in Twitter discussions related to the... | 13 | Experimental |
| 74 | SmartData-Polito/honeycluster | Can NLP tools support security experts for analysis on SSH exploits? An... | 13 | Experimental |
| 75 | IlyaGusev/purano | News annotation and clustering | 13 | Experimental |
| 76 | amitvikramraj/Topic-Modelling-Using-RACE-Dataset | A Project on Topic Modeling using algorithms like LSA/LSI, LDA, NMF on RACE dataset | 12 | Experimental |
| 77 | sameer-at-git/BBC-News-Topic-Modelling | BBC news dataset pipeline : data collection, cleaning, and topic modelling... | 12 | Experimental |
| 78 | CameleoGrey/ProfitTM | A topic modeling framework based on word embeddings and neural nets that... | 12 | Experimental |
| 79 | ShihabYasin/LDA-to-Context-Based-Search | Context Based Search Using LDA Topic Modelling Algorithm | 12 | Experimental |
| 80 | carlomarxdk/topic_modelling | Topic Modelling with the HPA (Tomotopy) model | 12 | Experimental |
| 81 | avrtt/topic-modeling-helper | NLP pipeline for text classification and topic modeling using LDA, spaCy,... | 12 | Experimental |
| 82 | Develop-Packt/Topic-Modeling | You will evaluate latent Dirichlet allocation models and execute... | 12 | Experimental |
| 83 | 1997alireza/QA-Clustering | Implementation of some algorithms for text clustering | 12 | Experimental |
| 84 | nipunchauhan/Topic-Modeling-NLP-Python-Knime | This project compares topic modeling and text clustering techniques on BBC... | 12 | Experimental |
| 85 | RahulNeuroByte/Topic-Modeler | A complete end-to-end system for document topic modeling and clustering... | 12 | Experimental |
| 86 | novitangrn/ToMoLDA | Project 'Text Classification in NLP for Detecting News Topics Based on... | 12 | Experimental |
| 87 | carlomarxdk/TemporalTopicModelling-Pachinko | Temporal Topic Modelling (Topic Evolution Analysis) using the Pachinko... | 12 | Experimental |
| 88 | amans-meta/catalog-auto-tagger | AI-powered catalog tagging system for e-commerce. Automatically generates... | 11 | Experimental |
| 89 | AdityaSharma2007/anthem-semantic-analyzer | NLP-based semantic clustering and similarity search system for national... | 11 | Experimental |
| 90 | Nathrw/NLTK-Project---Text-Message-Topic-Analysis | NLTK Project - Text Message Topic Analysis project | 11 | Experimental |
| 91 | skngetich/mwananchi-watch | Enhancing Civil Engagement using Topic modelling | 11 | Experimental |
| 92 | andrewabeles/drug-labels | Drug label text classification and topic modeling web app | 11 | Experimental |
| 93 | tomtx/cp-thematic-maps | human-annotated dataset of thematic maps for NLP downstream tasks with... | 11 | Experimental |
| 94 | bindusri0702/Dravidian_Top2Vec | Top2Vec language modelling on Tamil and Telugu news data | 11 | Experimental |
| 95 | konkinit/topic_modeling | A BERTopic-based modeling project | 11 | Experimental |
| 96 | oedatainsight/CTG-Latent-Dirichilet-Allocation | LDA exercise for ClinicalTrials interventional data | 10 | Experimental |
| 97 | talhamasood0000/Topic_modelling_in_Urdu_LDA | Implementation of LDA Topic Modelling technique on Urdu Language | 10 | Experimental |
| 98 | Matheus-Schmitz/mapta | MAP is a USC DSCI 560 group doing research on identifying marginalized... | 10 | Experimental |
