uhh-lt/sensegram
Making sense embeddings out of word embeddings using graph-based word sense induction
Induces the senses of polysemous words by clustering ego-networks extracted from word embeddings, then generates sense-specific vectors that can be used to disambiguate a word in context. Works with pretrained embeddings in word2vec format, or trains them from a raw text corpus via gensim; FAISS builds the similarity graph and Chinese Whispers clusters it. Outputs sense inventories with probability distributions and supports optional hypernymy labeling for proto-conceptualization resources.
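The ego-network clustering step described above can be illustrated with a minimal, pure-Python sketch. It is not the repository's code: the `NEIGHBORS` table is hypothetical toy data standing in for a FAISS nearest-neighbour lookup, all edge weights are assumed to be 1.0, and `ego_network`, `chinese_whispers`, and `induce_senses` are illustrative names rather than sensegram functions.

```python
import random
from collections import defaultdict

# Hypothetical nearest-neighbour lists (a stand-in for a FAISS similarity lookup).
NEIGHBORS = {
    "python": ["snake", "cobra", "reptile", "java", "perl", "ruby"],
    "snake": ["cobra", "reptile", "python"],
    "cobra": ["snake", "reptile", "python"],
    "reptile": ["snake", "cobra"],
    "java": ["perl", "ruby", "python"],
    "perl": ["java", "ruby", "python"],
    "ruby": ["perl", "java"],
}


def ego_network(word, neighbors):
    """Return (nodes, edges) for the neighbours of `word`, excluding `word` itself."""
    nodes = set(neighbors[word])
    edges = {}
    for u in nodes:
        for v in neighbors.get(u, []):
            if v in nodes and v != u:
                # Undirected edge; weight 1.0 is an assumption of this sketch.
                edges[tuple(sorted((u, v)))] = 1.0
    return nodes, edges


def chinese_whispers(nodes, edges, iterations=20, seed=0):
    """Label propagation: each node adopts the strongest label among its neighbours."""
    rng = random.Random(seed)
    adjacency = defaultdict(dict)
    for (u, v), weight in edges.items():
        adjacency[u][v] = weight
        adjacency[v][u] = weight
    labels = {node: node for node in nodes}  # every node starts in its own class
    order = sorted(nodes)
    for _ in range(iterations):
        rng.shuffle(order)
        changed = False
        for node in order:
            scores = defaultdict(float)
            for other, weight in adjacency[node].items():
                scores[labels[other]] += weight
            if not scores:
                continue  # isolated node keeps its own label
            # Ties are broken towards the alphabetically smallest label.
            best = max(sorted(scores), key=scores.get)
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    clusters = defaultdict(list)
    for node, label in labels.items():
        clusters[label].append(node)
    return sorted(sorted(members) for members in clusters.values())


def induce_senses(word, neighbors):
    """Cluster the ego network of `word`; each cluster is one induced sense."""
    nodes, edges = ego_network(word, neighbors)
    return chinese_whispers(nodes, edges)


print(induce_senses("python", NEIGHBORS))
```

On this toy data the neighbours of "python" split into an animal cluster and a programming-language cluster, because the two groups share no edges once the ego word is removed. With real embeddings the graph would be weighted by similarity and far less cleanly separated.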
213 stars. No commits in the last 6 months.
Stars: 213
Forks: 52
Language: Python
License: —
Category: (not listed)
Last pushed: May 17, 2021
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/uhh-lt/sensegram"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
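The same request can be made from Python with only the standard library. This is a sketch under assumptions: the response format is not documented here, so the code tries to parse JSON and falls back to returning the raw text; `build_request` and `fetch_quality` are hypothetical helper names.

```python
import json
import urllib.request

# URL copied from the curl example above.
API_URL = "https://pt-edge.onrender.com/api/v1/quality/embeddings/uhh-lt/sensegram"


def build_request(url=API_URL):
    """Build a keyless GET request; asking for JSON is an assumption."""
    return urllib.request.Request(url, headers={"Accept": "application/json"})


def fetch_quality(url=API_URL, timeout=10.0):
    """Fetch the endpoint and return parsed JSON, or raw text if it is not JSON."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        body = response.read().decode("utf-8")
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return body
```

Calling `fetch_quality()` performs one network request, which counts against the daily limit quoted above, so caching the result locally is advisable when polling many repositories.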
Higher-rated alternatives
MilaNLProc/contextualized-topic-models
A Python package to run contextualized topic modeling. CTMs combine contextualized embeddings...
vinid/cade
Compass-aligned Distributional Embeddings. Align embeddings from different corpora.
ina-foss/twembeddings
Sentence embeddings for unsupervised event detection in the Twitter stream: study on English and...
criteo-research/CausE
Code for the RecSys 2018 paper entitled Causal Embeddings for Recommendation.
spcl/ncc
Neural Code Comprehension: A Learnable Representation of Code Semantics