NoYo25/ClusteringTableHeaders
This project builds an RDF schema from the list of column headers of a tabular dataset. It first transforms the headers into meaningful vector representations, then applies a distance-based clustering algorithm that maximizes the similarity among the headers within each cluster. The user can move items from one cluster to another and merge clusters. The system suggests a name for each cluster based on the words its members have in common; if no common word is found, it names the cluster Unknown. The user can then rename the automatically generated names. Finally, the system exports the resulting clusters in RDF format.
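The pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the description does not say which vectorization or clustering algorithm the project uses, so the sketch swaps in word-set tokens with Jaccard distance and single-link merging. All function names, the `max_distance` threshold, the `http://example.org/schema#` namespace, and the choice to model clusters as `rdfs:Class` and headers as `rdf:Property` are assumptions made for this example.

```python
import re


def tokenize(header):
    """Split a header such as 'orderDate' or 'order_id' into lowercase words."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", header)
    return {w.lower() for w in re.split(r"[^A-Za-z0-9]+", spaced) if w}


def jaccard_distance(a, b):
    """Distance between two word sets: 0.0 if identical, 1.0 if disjoint."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def cluster_headers(headers, max_distance=0.7):
    """Single-link merging (assumed stand-in for the project's algorithm):
    join two clusters whenever their closest members are within max_distance."""
    tokens = {h: tokenize(h) for h in headers}
    clusters = [[h] for h in headers]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                closest = min(
                    jaccard_distance(tokens[a], tokens[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if closest <= max_distance:
                    clusters[i].extend(clusters.pop(j))
                    merged = True
                    break
            if merged:
                break
    return clusters


def suggest_name(cluster):
    """Pick a word shared by every member, or 'Unknown' if there is none."""
    common = set.intersection(*(tokenize(h) for h in cluster))
    return sorted(common)[0].capitalize() if common else "Unknown"


def build_schema(headers):
    """Map each suggested cluster name to its member headers."""
    named = {}
    for cluster in cluster_headers(headers):
        name = suggest_name(cluster)
        key, n = name, 2
        while key in named:  # keep duplicate names (e.g. two 'Unknown') apart
            key, n = f"{name}{n}", n + 1
        named[key] = cluster
    return named


def to_turtle(named_clusters, base="http://example.org/schema#"):
    """Serialize clusters as Turtle; the modelling here is hypothetical."""
    lines = [
        f"@prefix ex: <{base}> .",
        "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "",
    ]
    for name, members in named_clusters.items():
        lines.append(f"ex:{name} a rdfs:Class .")
        for header in members:
            local = re.sub(r"[^A-Za-z0-9_]", "_", header)
            label = header.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(
                f'ex:{local} a rdf:Property ; rdfs:domain ex:{name} ; '
                f'rdfs:label "{label}" .'
            )
    return "\n".join(lines)


if __name__ == "__main__":
    headers = ["customer_name", "customer_id", "orderDate", "order_total", "zip"]
    print(to_turtle(build_schema(headers)))
```

In this sketch, the manual steps the description mentions (moving a header, merging clusters, renaming) would amount to editing the dictionary returned by `build_schema` before passing it to `to_turtle`.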
No commits in the last 6 months.
Stars: 2
Forks: 1
Language: Python
License: not specified
Category: not specified
Last pushed: Feb 23, 2021
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/NoYo25/ClusteringTableHeaders"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
Higher-rated alternatives
TorchDR/TorchDR
TorchDR - PyTorch Dimensionality Reduction
derrickburns/generalized-kmeans-clustering
Production-ready K-Means clustering for Apache Spark with pluggable Bregman divergences (KL,...
abhilash1910/ClusterTransformer
Topic clustering library built on Transformer embeddings and cosine similarity...
md-experiments/picture_text
Interactive tree-maps with SBERT & Hierarchical Clustering (HAC)
nlpub/watset-java
An implementation of the Watset clustering algorithm in Java.