Topic Modeling Clustering NLP Tools
Tools for discovering latent topics in document collections and grouping documents by thematic similarity using methods like LDA, NMF, VAE-based models, and clustering algorithms. Does NOT include general text classification, sentiment analysis, or document retrieval systems.
This category tracks 98 topic modeling and clustering tools. Four score above 50 (Established tier). The highest-rated is MIND-Lab/OCTIS at 67/100, with 799 stars and 1,106 monthly downloads.
Get all 98 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=topic-modeling-clustering&limit=20"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
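The same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the base URL and the `domain`, `subcategory`, and `limit` parameters are copied from the curl example above, while the helper names `build_url` and `fetch_projects` are hypothetical. The response schema is not described on this page, so the decoded JSON is returned unchanged, and whether a larger `limit` returns every project is an assumption to verify against the API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example; everything else here is illustrative.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="nlp", subcategory="topic-modeling-clustering", limit=20):
    """Build the query URL with the same parameters as the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(limit=20, timeout=30):
    """Fetch and decode the JSON body.

    The response layout is not documented on this page, so the decoded
    value is returned as-is for the caller to inspect.
    """
    with urlopen(build_url(limit=limit), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Print the URL only; call fetch_projects() to make the network request.
    print(build_url())
```

Calling `fetch_projects()` issues one request against the unauthenticated quota of 100 requests per day mentioned above.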
| # | Tool | Score | Tier |
|---|---|---|---|
| 1 | MIND-Lab/OCTIS<br>OCTIS: Comparing Topic Models is Simple! A python package to optimize and... | | Established |
| 2 | i-dot-ai/themefinder<br>A topic modelling Python package for analysing one-to-many question-answer data. | | Established |
| 3 | bobxwu/TopMost<br>A Topic Modeling System Toolkit (ACL 2024 Demo) | | Established |
| 4 | baidu/Familia<br>A Toolkit for Industrial Topic Modeling | | Established |
| 5 | andifunke/topic-labeling<br>The project proposes a framework to apply topic models on a text-corpus and... | | Emerging |
| 6 | bab2min/tomotopy<br>Python package of Tomoto, the Topic Modeling Tool | | Emerging |
| 7 | AnFreTh/STREAM<br>ACL Python package engineered for seamless topic modeling, topic evaluation,... | | Emerging |
| 8 | rwalk/gsdmm<br>GSDMM: Short text clustering | | Emerging |
| 9 | RandyPen/TextCluster<br>Short text clustering preprocessing module | | Emerging |
| 10 | robolab-pavia/slr-kit<br>slr-kit is a framework to support the analysis of scientific literature... | | Emerging |
| 11 | yash91sharma/MALTopic-py<br>A multi-agent LLM topic modeling library | | Emerging |
| 12 | FesonX/cn-text-classifier<br>Chinese text clustering | | Emerging |
| 13 | datquocnguyen/jLDADMM<br>A Java package for the LDA and DMM topic models | | Emerging |
| 14 | riedlma/topictiling<br>TopicTiling is a text segmentation method that is based on LDA | | Emerging |
| 15 | MaartenGr/Concept<br>Concept Modeling: Topic Modeling on Images and Text | | Emerging |
| 16 | go-nlp/dmmclust<br>dmmclust is a package for clustering short texts, based on Yin and Wang (2014) | | Emerging |
| 17 | MLSA-SRM/Lexicon-The-Auto-Tagger<br>A browser extension to automatically generate tags for your online blog,... | | Emerging |
| 18 | daniel-furman/awesome-chatgpt-prompts-clustering<br>Text clustering: HDBSCAN is probably all you need. | | Emerging |
| 19 | jonaschn/awesome-topic-models<br>✨ Awesome - A curated list of amazing Topic Models (implementations,... | | Emerging |
| 20 | koheiw/seededlda<br>LDA for semisupervised topic modeling | | Experimental |
| 21 | chen0040/java-lda<br>Package provides java implementation of the latent dirichlet allocation... | | Experimental |
| 22 | smacawi/bert-topics<br>Bridging the gap between supervised classification and unsupervised topic... | | Experimental |
| 23 | thakur-nandan/topic-modeling<br>This repository contains an intuitive example on topic-modeling using... | | Experimental |
| 24 | mattmurray/topic_modelling_financial_news<br>Topic modelling on financial news with Natural Language Processing | | Experimental |
| 25 | CZboop/Newspaper-Topic-Modelling<br>Topic modelling and analysis of different UK newspapers, primarily using BERTopic | | Experimental |
| 26 | lettier/lda-topic-modeling<br>A PureScript, browser-based implementation of LDA topic modeling. | | Experimental |
| 27 | bhattbhavesh91/Top2Vec-Demo<br>Demo on Top2Vec to generate topics using BERT model | | Experimental |
| 28 | scientist-labs/topical<br>Ruby library for fast, flexible topic modeling — built on modern embeddings... | | Experimental |
| 29 | bloomberg/fast-noise-aware-topic-clustering<br>Research code and scripts used in the Silburt et al. (2021) EMNLP 2021 paper... | | Experimental |
| 30 | thiswillbeyourgithub/AnnA_Anki_neuronal_Appendix<br>Using machine learning on your anki collection to enhance the scheduling via... | | Experimental |
| 31 | kjahan/twitter_mining<br>Twitter Mining in Java | | Experimental |
| 32 | Saiken77/topic-modeling-pipeline-fr-app<br>Complete topic modeling pipeline (LDA, NMF, BERTopic) with... | | Experimental |
| 33 | berksudan/OTMISC-Topic-Modeling-Tool<br>We created a topic modeling pipeline to evaluate different topic modeling... | | Experimental |
| 34 | AkramChaabnia/SEALClust<br>Text Clustering as Classification with LLMs - PPD M2 MLSD reproduction with... | | Experimental |
| 35 | drob-xx/TopicTuner<br>HDBSCAN Tuning for BERTopic Models | | Experimental |
| 36 | stdlib-js/nlp-lda<br>Latent Dirichlet Allocation via collapsed Gibbs sampling. | | Experimental |
| 37 | kalemaria/cluster-constructor<br>Image and Text-based Clustering for Industrial Machine Parts | | Experimental |
| 38 | kedir/GLG--Topic-Modeling-and-Document-Clustering<br>Cluster documents and extract global and local topics per cluster using LDA... | | Experimental |
| 39 | diem-ai/topic-modeling<br>Retrieving real time breaking news from... | | Experimental |
| 40 | yash-rai-93/arXiv-Topic-Discovery<br>Unsupervised topic discovery on 100K+ arXiv abstracts using LDA, NMF,... | | Experimental |
| 41 | kotartemiy/topic-labeled-news-dataset<br>100k+ topic labeled news articles published from thousands of news websites | | Experimental |
| 42 | ocstringham/text_classification_wildlife_trade<br>Code and data for text classification models associated with "Text... | | Experimental |
| 43 | K-RLange/Lex2Sent<br>Lex2Sent package for unsupervised text classification/clustering | | Experimental |
| 44 | chen0040/java-plsa<br>Package provides the java implementation of probabilistic latent semantic... | | Experimental |
| 45 | DivyaRustagi10/indicCTM-Foundations<br>Contextualized Topic Modeling using Zero-Shot Learning on Indic Languages (IndicCTM) | | Experimental |
| 46 | iwan-rg/Arabic-Topic-Modeling<br>BERT for Arabic Topic Modeling: An Experimental Study on BERTopic Technique | | Experimental |
| 47 | byukan/Chatbots-NLP<br>Chatbots and other NLP applications: Topic Modeling on text from Codechef and OkCupid | | Experimental |
| 48 | JuaniLlaberia/news_articles_grouping_research<br>Multi-signal graph-based NLP pipeline for clustering news articles into... | | Experimental |
| 49 | papachristoumarios/sade<br>Code for paper: Software clusterings with vector semantics and the call graph | | Experimental |
| 50 | cognitivefactory/interactive-clustering<br>Python package used to apply NLP interactive clustering methods. | | Experimental |
| 51 | goerlitz/nlp-topic-models<br>Application of topic models for topic extraction and similarity search | | Experimental |
| 52 | vrjkmr/arxiv-topic<br>Detecting topic clusters in arXiv ML papers. | | Experimental |
| 53 | silviaruffini/text_clustering<br>Text Clustering with Python and Dash | | Experimental |
| 54 | trinker/clustext<br>Easy, fast clustering of texts | | Experimental |
| 55 | harishaaram/Topic-Modeling<br>An Unbiased Examination of Federal Reserve Meeting minutes | | Experimental |
| 56 | sean-chester/generalised-brown<br>C++ implementation of Generalised Brown clustering and python scripts for... | | Experimental |
| 57 | anil1055/n-stage_LDA<br>Proposed n-stage Latent Dirichlet Allocation method - A Novel Approach for LDA | | Experimental |
| 58 | LaurentVeyssier/Topic-Modeling-and-Document-Categorization-using-Latent-Dirichlet-Allocation<br>Categorize documents per topics inferred by LDA algorithm | | Experimental |
| 59 | GMU-Capstone-690/Data-Tagging-via-Content-and-Standards<br>An approach to organize text data generated from URLs by tagging it to... | | Experimental |
| 60 | caimeng2/TopicModelingWorkshop<br>SSDA workshop: topic modeling for exploratory text analysis | | Experimental |
| 61 | Keshav1506/Topic-Modelling-on-BBC-News-Articles-using-LDA<br>This is the fourth capstone project I've done in my Almabetter Data science... | | Experimental |
| 62 | kjahan/lda<br>Extracting Hidden Topics from Texts using LDA Model | | Experimental |
| 63 | Lincoln-France/twitchatds<br>Language modeling of twitch chat streamers | | Experimental |
| 64 | contefranz/OpTop<br>Optimal topic identification from a pool of Latent Dirichlet Allocation models | | Experimental |
| 65 | saky-semicolon/Topic-Modeling-on-Mental-Health-Related-Tweets<br>A project using LDA and BERTopic for topic modeling, sentiment analysis, and... | | Experimental |
| 66 | chlin907/TopicModeling<br>Topic Modeling for IMDB top movies by NLP on their synopses | | Experimental |
| 67 | ebrahimpichka/semantic-textual-similarity<br>Categorizing products of an online retailer based on products’ titles using... | | Experimental |
| 68 | Htiango/Chinese-LDA<br>This is Tianyu Hong's first version of a program using LDA to predict... | | Experimental |
| 69 | michimalek/nlp-clustering-research<br>A python Sentence-Clustering library based on S-Bert and a diverse number of... | | Experimental |
| 70 | davidmasse/blog-fashion-system<br>NLP analysis of fashion trends | | Experimental |
| 71 | Navy10021/Parallel_Clustering_based_TM<br>Parallel clustering-based Topic Modeling | | Experimental |
| 72 | vahadruya/Capstone-Project-Unsupervised-ML-Topic-Modelling<br>The project explores a dataset of 2225 BBC News Articles and identifies the... | | Experimental |
| 73 | fatimagulomova/twitter-topic-extraction<br>This project explores prevalent topics in Twitter discussions related to the... | | Experimental |
| 74 | SmartData-Polito/honeycluster<br>Can NLP tools support security experts for analysis on SSH exploits? An... | | Experimental |
| 75 | IlyaGusev/purano<br>News annotation and clustering | | Experimental |
| 76 | amitvikramraj/Topic-Modelling-Using-RACE-Dataset<br>A Project on Topic Modeling using algorithms like LSA/LSI, LDA, NMF on RACE dataset | | Experimental |
| 77 | sameer-at-git/BBC-News-Topic-Modelling<br>BBC news dataset pipeline : data collection, cleaning, and topic modelling... | | Experimental |
| 78 | CameleoGrey/ProfitTM<br>A topic modeling framework based on word embeddings and neural nets that... | | Experimental |
| 79 | ShihabYasin/LDA-to-Context-Based-Search<br>Context Based Search Using LDA Topic Modelling Algorithm | | Experimental |
| 80 | carlomarxdk/topic_modelling<br>Topic Modelling with the HPA (Tomotopy) model | | Experimental |
| 81 | avrtt/topic-modeling-helper<br>NLP pipeline for text classification and topic modeling using LDA, spaCy,... | | Experimental |
| 82 | Develop-Packt/Topic-Modeling<br>You will evaluate latent Dirichlet allocation models and execute... | | Experimental |
| 83 | 1997alireza/QA-Clustering<br>Implementation of some algorithms for text clustering | | Experimental |
| 84 | nipunchauhan/Topic-Modeling-NLP-Python-Knime<br>This project compares topic modeling and text clustering techniques on BBC... | | Experimental |
| 85 | RahulNeuroByte/Topic-Modeler<br>A complete end-to-end system for document topic modeling and clustering... | | Experimental |
| 86 | novitangrn/ToMoLDA<br>Project ‘Text Classification in NLP to Detect News Topics Based on... | | Experimental |
| 87 | carlomarxdk/TemporalTopicModelling-Pachinko<br>Temporal Topic Modelling (Topic Evolution Analysis) using the Pachinko... | | Experimental |
| 88 | amans-meta/catalog-auto-tagger<br>AI-powered catalog tagging system for e-commerce. Automatically generates... | | Experimental |
| 89 | AdityaSharma2007/anthem-semantic-analyzer<br>NLP-based semantic clustering and similarity search system for national... | | Experimental |
| 90 | Nathrw/NLTK-Project---Text-Message-Topic-Analysis<br>NLTK Project - Text Message Topic Analysis project | | Experimental |
| 91 | skngetich/mwananchi-watch<br>Enhancing Civil Engagement using Topic modelling | | Experimental |
| 92 | andrewabeles/drug-labels<br>Drug label text classification and topic modeling web app | | Experimental |
| 93 | tomtx/cp-thematic-maps<br>human-annotated dataset of thematic maps for NLP downstream tasks with... | | Experimental |
| 94 | bindusri0702/Dravidian_Top2Vec<br>Top2Vec language modelling on Tamil and Telugu news data | | Experimental |
| 95 | konkinit/topic_modeling<br>A BERTopic-based modeling project | | Experimental |
| 96 | oedatainsight/CTG-Latent-Dirichilet-Allocation<br>LDA exercise for ClinicalTrials interventional data | | Experimental |
| 97 | talhamasood0000/Topic_modelling_in_Urdu_LDA<br>Implementation of LDA Topic Modelling technique on Urdu Language | | Experimental |
| 98 | Matheus-Schmitz/mapta<br>MAP is a USC DSCI 560 group doing research on identifying marginalized... | | Experimental |