Text Clustering Topic Modeling Transformer Models

Tools for unsupervised discovery and organization of text documents through clustering, dimensionality reduction, and topic extraction using transformer embeddings. Does NOT include supervised text classification, document retrieval/search, or general semantic similarity tasks.

There are 26 text clustering and topic modeling projects tracked. Two score 70 or higher (Verified tier). The highest-rated is MaartenGr/BERTopic at 71/100 with 7,443 stars. One of the top 10 is actively maintained.

Get all 26 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=text-clustering-topic-modeling&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
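
The same request can be made from Python. The following is a minimal sketch using only the standard library. The endpoint and query parameters are copied from the curl command above; the helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the returned JSON is not documented on this page, so the code decodes the payload without assuming any field names. The `limit=20` default mirrors the curl command; whether a larger value returns all 26 projects is an assumption that has not been verified here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit=20):
    """Build the same query URL as the curl command (hypothetical helper)."""
    params = {
        "domain": "transformers",
        "subcategory": "text-clustering-topic-modeling",
        "limit": limit,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_projects(limit=20, timeout=30):
    """Send the GET request and decode the JSON body (hypothetical helper).

    The response schema is an unknown here, so the decoded object is
    returned as-is for the caller to inspect.
    """
    with urlopen(build_url(limit), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_projects()` performs the network request and counts against the daily request quota described above, so it is worth inspecting the top-level shape of the result once before relying on specific keys.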

| # | Model | Description | Score | Tier |
|---|---|---|---|---|
| 1 | MaartenGr/BERTopic | Leveraging BERT and c-TF-IDF to create easily interpretable topics. | 71 | Verified |
| 2 | webis-de/small-text | Active Learning for Text Classification in Python | 70 | Verified |
| 3 | mead-ml/mead-baseline | Deep-Learning Model Exploration and Development for NLP | 42 | Emerging |
| 4 | x-tabdeveloping/turftopic | Robust and fast topic models with sentence-transformers. | 40 | Emerging |
| 5 | HumanSignal/label-studio-transformers | Label data using HuggingFace's transformers and automatically get a... | 39 | Emerging |
| 6 | hiyouga/Dual-Contrastive-Learning | Code for our paper "Dual Contrastive Learning: Text Classification via... | 38 | Emerging |
| 7 | hsisaberi/single-trait-electra | A complete ELECTRA-based framework for Big Five personality trait... | 30 | Emerging |
| 8 | kmaurinjones/AllMeans | Automatic topic modelling using minimal external input and computational resources | 30 | Emerging |
| 9 | DarshanAdiga/idiom-principle-on-magpie-corpus | Idiom Principle on MAGPIE dataset | 23 | Experimental |
| 10 | Boykadakim/User-Clustering-with-BERT-Models | User Clustering Pipelines with BERT Models on Long and Heterogeneous Tweets... | 23 | Experimental |
| 11 | anisderoual/Document_Archiver_Korean-NLP_BERTClustering | 📂 Extract, embed, cluster, and securely store Korean text from documents... | 22 | Experimental |
| 12 | nerdimite/bert-finetuning-webinar | Code for the FullStack AI Live Coding Series- Part 1 (CellStrat AI Lab) | 21 | Experimental |
| 13 | WeskerPRO/NLP_Project | Fine-tuning BERT and BART for sentiment analysis, paraphrase detection, and... | 20 | Experimental |
| 14 | ai-center-kth/cuBERT-source-code-clustering | Fine-tuning cuBERT embeddings for clustering source code by functionality | 18 | Experimental |
| 15 | ia-labo/French-News-Clustering | Text classification and clustering using transformers and Denstream. | 17 | Experimental |
| 16 | LennartKeller/DeepTextClustering | Deep text clustering with language models | 17 | Experimental |
| 17 | nolnolon/User-Clustering-with-BERT-Models | User Clustering Pipelines with BERT Models on Long and Heterogeneous Tweets... | 15 | Experimental |
| 18 | shaikadish/twitterTopicModeling | A tutorial I wrote to show the practical application of topic modelling for... | 14 | Experimental |
| 19 | simonescevaroli/yelp-rating-prediction | This is the repository for the Natural Language Processing project done at... | 14 | Experimental |
| 20 | AdirthaBorgohain/BERT-Text-Analysis | Text Analysis done on a business text dataset using KeyBERT and BERTopic | 14 | Experimental |
| 21 | tre-systems/cefr-workshop | Educational workshop for NLP engineers. Fine-tuning DeBERTa-v3 for CEFR... | 13 | Experimental |
| 22 | fork123aniket/Zero-Shot-Question-Answering | Implementation of Zero-Shot Question Answering in PyTorch | 11 | Experimental |
| 23 | Ecolash/Natural-Language-Processing | 𝗡𝗮𝘁𝘂𝗿𝗮𝗹 𝗟𝗮𝗻𝗴𝘂𝗮𝗴𝗲 𝗣𝗿𝗼𝗰𝗲𝘀𝘀𝗶𝗻𝗴 \| 𝗖𝗦𝟲𝟬𝟬𝟳𝟱 \| 𝗣𝗢𝗦 𝗧𝗮𝗴𝗴𝗶𝗻𝗴, 𝗠𝗶𝗻𝗶 𝗖𝗼𝗣𝗶𝗹𝗼𝘁, 𝗕𝗘𝗥𝗧 | 11 | Experimental |
| 24 | cluebbers/NLP_DeepLearning_Spring2023 | Implementing and fine-tuning BERT for sentiment analysis, paraphrase... | 11 | Experimental |
| 25 | battles5/amelia-bertino-legal-nlp | Legal Argument Mining on Italian tax-court decisions (AMELIA dataset) ... | 11 | Experimental |
| 26 | mgiorgi13/MITopics | Topic detection to identify the main topics on MIT management papers | 11 | Experimental |

Comparisons in this category