Embedding Clustering Tools

Tools for clustering and organizing data (text, URLs, tables, time series) using embeddings and unsupervised/semi-supervised algorithms. Includes dimensionality reduction and clustering visualization. Does NOT include general semantic search, similarity matching, or domain-specific applications (recommendation systems, RAG, etc.).

36 embedding clustering tools are tracked. One scores above 50 (Established tier). The highest-rated is TorchDR/TorchDR at 59/100, with 199 stars and 2,234 monthly downloads.

Get all 36 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=embeddings&subcategory=embedding-clustering-tools&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
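For scripted access, the same request can be sketched in Python using only the standard library. The endpoint and query parameters are copied from the curl command above; the helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the returned JSON is not documented here, so the sketch returns the parsed payload without interpreting it. Note that the example URL passes `limit=20` while the heading mentions 36 projects; whether the API accepts a larger `limit` is an assumption to verify.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="embeddings",
              subcategory="embedding-clustering-tools",
              limit=20):
    """Build the request URL with the same query parameters as the curl call."""
    query = urlencode({"domain": domain,
                       "subcategory": subcategory,
                       "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=30):
    """Fetch the URL and return the decoded JSON payload as-is.

    The response schema is not described in this listing, so no fields
    are assumed; inspect the returned object before relying on any keys.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Usage (performs a network request, counted against the daily quota):
# data = fetch_projects(build_url())
```

Keeping URL construction separate from the network call makes it easy to check the query string without spending any of the 100 keyless requests per day.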

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | TorchDR/TorchDR | TorchDR - PyTorch Dimensionality Reduction | 59 | Established |
| 2 | derrickburns/generalized-kmeans-clustering | Production-ready K-Means clustering for Apache Spark with pluggable Bregman... | 49 | Emerging |
| 3 | abhilash1910/ClusterTransformer | Topic clustering library built on Transformer embeddings and cosine... | 47 | Emerging |
| 4 | md-experiments/picture_text | Interactive tree-maps with SBERT & Hierarchical Clustering (HAC) | 45 | Emerging |
| 5 | mainlp/semantic_components | Finding semantic components in your neural representations. | 38 | Emerging |
| 6 | scientist-labs/clusterkit | High-performance UMAP dimensionality reduction for Ruby, powered by the... | 36 | Emerging |
| 7 | nlpub/watset-java | An implementation of the Watset clustering algorithm in Java. | 32 | Emerging |
| 8 | abojchevski/rsc | Robust Spectral Clustering. Implementation of "Robust Spectral Clustering... | 31 | Emerging |
| 9 | kjpou1/regimetry | Unsupervised regime detection for financial time series using embeddings and... | 31 | Emerging |
| 10 | amazon-science/supervised-intent-clustering | This is a package to fine-tune language models in order to create... | 29 | Experimental |
| 11 | demegire/eksi-cluster | Tool for clustering homonymous eksisozluk.com page entries | 26 | Experimental |
| 12 | dcarpintero/taxonomy-completion | Taxonomy Completion with Embedding Quantization and an LLM-based Pipeline: A... | 26 | Experimental |
| 13 | houshuang/limbic | Embedding, search, novelty detection, and clustering for knowledge-dense... | 23 | Experimental |
| 14 | VincentGaoHJ/Taxonomic-Relation-Identification | Awesome research paper on taxonomy (information retrieval). Study notes... | 23 | Experimental |
| 15 | molgenis/variable-taxon-mapper | A tool for mapping elements to a (biomedical) taxonomy | 21 | Experimental |
| 16 | manickbhan/content-pruning-by-semantic-distance-topical-dilution | Visualize Page Embeddings for all Nodes on a Website | 20 | Experimental |
| 17 | uhh-lt/Taxonomy_Refinement_Embeddings | Taxonomy refinement method to improve domain-specific taxonomy systems. | 20 | Experimental |
| 18 | Baho73/cluster-optimization | Text embedding clustering pipeline: outlier detection (KNN + LOF +... | 19 | Experimental |
| 19 | FabienCadoret/autokluster | Auto-k spectral clustering for text embeddings | 19 | Experimental |
| 20 | jacobmarks/clustering-plugin | Compute clustering on your data in a visual, intuitive way with FiftyOne and Sklearn! | 18 | Experimental |
| 21 | duanyu/embedding_application | Some applications of text embedding model, e.g., semantic retrieval and clustering. | 16 | Experimental |
| 22 | NoYo25/ClusteringTableHeaders | This project aims at creating an RDF schema given a list of column headers... | 15 | Experimental |
| 23 | ankaba-x00/ml-anomdetect | Anomaly Detection on Network Traffic Data | 15 | Experimental |
| 24 | tes69ducker/Image-Clustering-ML | 🌟 Explore unsupervised image clustering with dynamic K-Means and Cosine... | 14 | Experimental |
| 25 | sahandv/science_science | A framework to analyze, visualize and predict scientific trends | 12 | Experimental |
| 26 | VieVie31/TAL_synonymy | trying some stuffs about synonymy and other NLP stuffs... | 12 | Experimental |
| 27 | esantus/Outlier_Detection | Data and code for the experiments in the Outlier Detection task proposed by... | 12 | Experimental |
| 28 | amazon-science/frictional-utterances-clustering | This is a package to apply clustering algorithms to utterances, embedded... | 11 | Experimental |
| 29 | haschka/semantic-trees | A repository for collaboration on semantic-trees | 11 | Experimental |
| 30 | sergeyklay/clusterium | Text Clustering Toolkit for Bayesian Nonparametric Analysis | 11 | Experimental |
| 31 | panos-span/rogets_thesaurus | Semantic clustering and classification of Roget's Thesaurus words | 11 | Experimental |
| 32 | Shiv33ndu/msgvault_exploration | Semantic grouping of archived emails built on top of the local email archive... | 11 | Experimental |
| 33 | Marta-Barea/embeddings-clustering-songs-lyrics | Analyze and group song lyrics by semantic meaning using machine learning techniques. | 10 | Experimental |
| 34 | emrecncelik/weighted-bert | Nonofficial implementation of the paper A Text Document Clustering Method... | 10 | Experimental |
| 35 | marsidmali/Roget-s-Thesaurus-in-the-21st-Century | An investigation into how modern machine learning techniques align with... | 10 | Experimental |
| 36 | RubenBroekx/SemiSupervisedClustering | Cluster context-less embedded language data in a semi-supervised manner. | 10 | Experimental |
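The tier labels in the listing appear to follow score bands. As a rough sketch, the scores above can be tallied per tier in Python. The cut-offs used here (50 and 30) are assumptions inferred from the listed scores, not documented thresholds: the lowest Established score shown is 59, Emerging scores run from 31 to 49, and the highest Experimental score is 29, so the exact boundaries cannot be determined from this page. The function name `tier` is hypothetical.

```python
from collections import Counter

# Scores copied from the listing above, in rank order (36 entries).
SCORES = [
    59, 49, 47, 45, 38, 36, 32, 31, 31,
    29, 26, 26, 23, 23, 21, 20, 20, 19,
    19, 18, 16, 15, 15, 14, 12, 12, 12,
    11, 11, 11, 11, 11, 10, 10, 10, 10,
]


def tier(score):
    """Map a score to a tier label.

    The 50 and 30 cut-offs are assumed; the listing only shows that
    59 is Established, 31-49 are Emerging, and 10-29 are Experimental.
    """
    if score >= 50:
        return "Established"
    if score >= 30:
        return "Emerging"
    return "Experimental"


# Count how many listed tools fall into each tier.
counts = Counter(tier(s) for s in SCORES)
print(dict(counts))
```

Under these assumed bands the tally is 1 Established, 8 Emerging, and 27 Experimental, which matches the labels shown in the listing.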