federicoarenasl/Evaluating-w-Embeddings

In this paper we compare and evaluate two simple embedding models that can be constructed directly from a co-occurrence matrix extracted from Twitter data: Positive Pointwise Mutual Information (PPMI) and Hellinger Principal Component Analysis (H-PCA). For each embedding model we consider three alternative word-similarity metrics: cosine, Euclidean, and Manhattan distance.
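The repository's notebooks are not reproduced here. The following is a minimal NumPy sketch, assuming a dense word-by-context count matrix, of how the two embeddings and the three measures could be computed. The function names (`ppmi`, `hellinger_pca`, `cosine_similarity`, `euclidean_distance`, `manhattan_distance`) and the toy counts are hypothetical. The H-PCA step follows the common recipe of taking square roots of the row-normalised distributions P(c | w) and then running PCA; mean-centring before the SVD is an assumption and may differ from the original code.

```python
import numpy as np


def ppmi(counts):
    """Positive PMI matrix from a (words x contexts) co-occurrence count matrix."""
    counts = np.asarray(counts, dtype=float)
    p_wc = counts / counts.sum()                  # joint P(w, c)
    p_w = p_wc.sum(axis=1, keepdims=True)         # marginal P(w), shape (V, 1)
    p_c = p_wc.sum(axis=0, keepdims=True)         # marginal P(c), shape (1, C)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log2(p_wc / (p_w * p_c))
    pmi[~np.isfinite(pmi)] = 0.0                  # zero counts give -inf (or nan)
    return np.maximum(pmi, 0.0)                   # keep only positive associations


def hellinger_pca(counts, dim):
    """H-PCA sketch: PCA on square roots of the row distributions P(c | w)."""
    counts = np.asarray(counts, dtype=float)
    row_sums = counts.sum(axis=1, keepdims=True)
    cond = np.divide(counts, row_sums, out=np.zeros_like(counts),
                     where=row_sums > 0)          # P(c | w); empty rows stay zero
    root = np.sqrt(cond)                          # Euclidean distance here = sqrt(2) x Hellinger
    centred = root - root.mean(axis=0)            # mean-centring: an assumption
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, :dim] * s[:dim]                   # projections on the top `dim` axes


def cosine_similarity(a, b):
    # Assumes neither vector is all zeros; cosine distance is 1 minus this value.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def euclidean_distance(a, b):
    return float(np.linalg.norm(a - b))


def manhattan_distance(a, b):
    return float(np.abs(a - b).sum())


# Hypothetical 4-word x 4-context counts (illustrative, not Twitter data):
# words 0 and 1 share contexts, as do words 2 and 3.
counts = np.array([
    [10, 2, 0, 1],
    [8, 3, 0, 0],
    [0, 1, 9, 4],
    [1, 0, 7, 5],
])
W_ppmi = ppmi(counts)
W_hpca = hellinger_pca(counts, dim=2)

for name, W in [("PPMI", W_ppmi), ("H-PCA", W_hpca)]:
    print(name,
          "cos(0,1)=%.3f" % cosine_similarity(W[0], W[1]),
          "euc(0,1)=%.3f" % euclidean_distance(W[0], W[1]),
          "man(0,1)=%.3f" % manhattan_distance(W[0], W[1]))
```

Note that cosine is a similarity (higher means closer) while Euclidean and Manhattan are distances (lower means closer), so nearest-neighbour rankings built from them must be sorted in opposite directions when comparing the six model/metric combinations.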

Experimental

No commits in the last 6 months.

No license, stale (6 months), no package, no dependents

Maintenance 0 / 25

Adoption 3 / 25

Maturity 1 / 25

Community 12 / 25

Language: Jupyter Notebook

License: none

Higher-rated alternatives

avidale/compress-fasttext

Tools for shrinking fastText models (in gensim format)

dselivanov/text2vec

Fast vectorization, topic modeling, distances and GloVe word embeddings in R.

vzhong/embeddings

Fast, DB-backed pretrained word embeddings for natural language processing.

dccuchile/spanish-word-embeddings

Spanish word embeddings computed with different methods and from different corpora

ncbi-nlp/BioSentVec

BioWordVec & BioSentVec: pre-trained embeddings for biomedical words and sentences
