Embedding Clustering Tools
Tools for clustering and organizing data (text, URLs, tables, time series) using embeddings and unsupervised/semi-supervised algorithms. Includes dimensionality reduction and clustering visualization. Does NOT include general semantic search, similarity matching, or domain-specific applications (recommendation systems, RAG, etc.).
36 embedding clustering tools are tracked. One scores above 50, which places it in the Established tier: TorchDR/TorchDR, the highest-rated at 59/100, with 199 stars and 2,234 monthly downloads.
Get the projects as JSON (the request below sets limit=20, while 36 projects are tracked):
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=embeddings&subcategory=embedding-clustering-tools&limit=20"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
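For scripted access, the same request can be made from Python. The following is a minimal sketch using only the standard library. The endpoint and the query parameter names (`domain`, `subcategory`, `limit`) are copied from the curl command above; the helper names `build_url` and `fetch_projects` are hypothetical, and the response schema is not documented on this page, so the decoded JSON is returned as-is. Whether the endpoint accepts a `limit` larger than 20 is an assumption that has not been verified.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit=20):
    """Build the query URL (hypothetical helper).

    Parameter names and values mirror the curl command; only `limit`
    is exposed, since it is the one value a caller is likely to change.
    """
    params = {
        "domain": "embeddings",
        "subcategory": "embedding-clustering-tools",
        "limit": limit,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_projects(limit=20, timeout=30):
    """Fetch and decode the JSON payload (hypothetical helper).

    The response structure is an unknown here, so the decoded object
    is returned unmodified for the caller to inspect.
    """
    with urlopen(build_url(limit), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_projects()` issues the same request as the curl command. Inspect the top-level shape of the result before relying on any field names, and keep the unauthenticated limit of 100 requests/day in mind when looping.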
| # | Tool | Description | Score | Tier |
|---|---|---|---|---|
| 1 | TorchDR/TorchDR | TorchDR - PyTorch Dimensionality Reduction | 59/100 | Established |
| 2 | derrickburns/generalized-kmeans-clustering | Production-ready K-Means clustering for Apache Spark with pluggable Bregman... | | Emerging |
| 3 | abhilash1910/ClusterTransformer | Topic clustering library built on Transformer embeddings and cosine... | | Emerging |
| 4 | md-experiments/picture_text | Interactive tree-maps with SBERT & Hierarchical Clustering (HAC) | | Emerging |
| 5 | mainlp/semantic_components | Finding semantic components in your neural representations. | | Emerging |
| 6 | scientist-labs/clusterkit | High-performance UMAP dimensionality reduction for Ruby, powered by the... | | Emerging |
| 7 | nlpub/watset-java | An implementation of the Watset clustering algorithm in Java. | | Emerging |
| 8 | abojchevski/rsc | Robust Spectral Clustering. Implementation of "Robust Spectral Clustering... | | Emerging |
| 9 | kjpou1/regimetry | Unsupervised regime detection for financial time series using embeddings and... | | Emerging |
| 10 | amazon-science/supervised-intent-clustering | This is a package to fine-tune language models in order to create... | | Experimental |
| 11 | demegire/eksi-cluster | Tool for clustering homonymous eksisozluk.com page entries | | Experimental |
| 12 | dcarpintero/taxonomy-completion | Taxonomy Completion with Embedding Quantization and an LLM-based Pipeline: A... | | Experimental |
| 13 | houshuang/limbic | Embedding, search, novelty detection, and clustering for knowledge-dense... | | Experimental |
| 14 | VincentGaoHJ/Taxonomic-Relation-Identification | Awesome research paper on taxonomy (information retrieval). Study notes... | | Experimental |
| 15 | molgenis/variable-taxon-mapper | A tool for mapping elements to a (biomedical) taxonomy | | Experimental |
| 16 | manickbhan/content-pruning-by-semantic-distance-topical-dilution | Visualize Page Embeddings for all Nodes on a Website | | Experimental |
| 17 | uhh-lt/Taxonomy_Refinement_Embeddings | Taxonomy refinement method to improve domain-specific taxonomy systems. | | Experimental |
| 18 | Baho73/cluster-optimization | Text embedding clustering pipeline: outlier detection (KNN + LOF +... | | Experimental |
| 19 | FabienCadoret/autokluster | Auto-k spectral clustering for text embeddings | | Experimental |
| 20 | jacobmarks/clustering-plugin | Compute clustering on your data in a visual, intuitive way with FiftyOne and Sklearn! | | Experimental |
| 21 | duanyu/embedding_application | Some applications of text embedding model, e.g., semantic retrieval and clustering. | | Experimental |
| 22 | NoYo25/ClusteringTableHeaders | This project aims at creating an RDF schema given a list of column headers... | | Experimental |
| 23 | ankaba-x00/ml-anomdetect | Anomaly Detection on Network Traffic Data | | Experimental |
| 24 | tes69ducker/Image-Clustering-ML | 🌟 Explore unsupervised image clustering with dynamic K-Means and Cosine... | | Experimental |
| 25 | sahandv/science_science | A framework to analyze, visualize and predict scientific trends | | Experimental |
| 26 | VieVie31/TAL_synonymy | trying some stuffs about synonymy and other NLP stuffs... | | Experimental |
| 27 | esantus/Outlier_Detection | Data and code for the experiments in the Outlier Detection task proposed by... | | Experimental |
| 28 | amazon-science/frictional-utterances-clustering | This is a package to apply clustering algorithms to utterances, embedded... | | Experimental |
| 29 | haschka/semantic-trees | A repository for collaboration on semantic-trees | | Experimental |
| 30 | sergeyklay/clusterium | Text Clustering Toolkit for Bayesian Nonparametric Analysis | | Experimental |
| 31 | panos-span/rogets_thesaurus | Semantic clustering and classification of Roget's Thesaurus words | | Experimental |
| 32 | Shiv33ndu/msgvault_exploration | Semantic grouping of archived emails built on top of the local email archive... | | Experimental |
| 33 | Marta-Barea/embeddings-clustering-songs-lyrics | Analyze and group song lyrics by semantic meaning using machine learning techniques. | | Experimental |
| 34 | emrecncelik/weighted-bert | Nonofficial implementation of the paper A Text Document Clustering Method... | | Experimental |
| 35 | marsidmali/Roget-s-Thesaurus-in-the-21st-Century | An investigation into how modern machine learning techniques align with... | | Experimental |
| 36 | RubenBroekx/SemiSupervisedClustering | Cluster context-less embedded language data in a semi-supervised manner. | | Experimental |