NoYo25/ClusteringTableHeaders

This project creates an RDF schema from the list of column headers of a tabular dataset. It first transforms the headers into vector representations, then applies a distance-based clustering algorithm that maximizes the similarity among the headers within each cluster. The user can move items from one cluster to another and merge clusters. The system suggests a name for each cluster based on the words its members have in common; if no common word is found, it names the cluster Unknown. The user can then rename the automatically generated names. Finally, the resulting clusters can be exported in an RDF format.
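The pipeline described above can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the project's actual code: the bag-of-words vectors, cosine distance, single-linkage merging, the `threshold` value, the `http://example.org/schema#` namespace, and every function name here are hypothetical choices made for the example.

```python
import math
import re
from collections import Counter


def tokenize(header):
    """Split a header such as 'orderDate' or 'customer_name' into lowercase words."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", header)  # break camelCase
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t]


def vectorize(header):
    """Bag-of-words vector (an assumed stand-in for the project's own vectors)."""
    return Counter(tokenize(header))


def cosine_distance(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def cluster(headers, threshold=0.6):
    """Single-linkage agglomerative clustering: repeatedly merge the two closest
    clusters until the closest pair is at least `threshold` apart."""
    vectors = {h: vectorize(h) for h in headers}
    clusters = [[h] for h in headers]

    def linkage(c1, c2):
        return min(cosine_distance(vectors[a], vectors[b]) for a in c1 for b in c2)

    while len(clusters) > 1:
        d, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d >= threshold:
            break
        clusters[i] += clusters.pop(j)  # j > i, so index i stays valid
    return clusters


def suggest_name(members):
    """Name a cluster after the words shared by all members, else 'Unknown'."""
    common = set(tokenize(members[0]))
    for m in members[1:]:
        common &= set(tokenize(m))
    return "_".join(sorted(common)) if common else "Unknown"


def to_turtle(clusters, base="http://example.org/schema#"):
    """Serialize clusters as Turtle: one rdfs:Class per cluster, one rdf:Property
    per header. Clusters that end up with the same name are not disambiguated."""
    lines = [
        f"@prefix ex: <{base}> .",
        "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "",
    ]
    for idx, members in enumerate(clusters):
        name = suggest_name(members)
        local = f"Unknown_{idx}" if name == "Unknown" else name.capitalize()
        lines.append(f"ex:{local} a rdfs:Class .")
        for m in members:
            prop = "_".join(tokenize(m))
            label = m.replace("\\", "\\\\").replace('"', '\\"')  # escape for Turtle
            lines.append(f'ex:{prop} a rdf:Property ; rdfs:domain ex:{local} ; rdfs:label "{label}" .')
    return "\n".join(lines)


headers = ["customer_name", "customer_id", "orderDate", "order_total", "zip"]
print(to_turtle(cluster(headers)))
```

Moving an item or merging clusters by hand amounts to editing the list of lists returned by `cluster` before calling `to_turtle`, and renaming would replace the value returned by `suggest_name`.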

Experimental

No commits in the last 6 months.

No License, Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 2 / 25

Maturity 1 / 25

Community 12 / 25

Language

Python

Higher-rated alternatives

TorchDR/TorchDR

TorchDR - PyTorch Dimensionality Reduction

derrickburns/generalized-kmeans-clustering

Production-ready K-Means clustering for Apache Spark with pluggable Bregman divergences (KL,...

abhilash1910/ClusterTransformer

Topic clustering library built on Transformer embeddings and cosine similarity...

md-experiments/picture_text

Interactive tree-maps with SBERT & Hierarchical Clustering (HAC)

nlpub/watset-java

An implementation of the Watset clustering algorithm in Java.
