babylonhealth/fastText_multilingual
Multilingual word vectors in 78 languages
Provides precomputed SVD-based alignment matrices that project fastText's monolingual word vectors for 78 languages into a shared vector space, enabling cross-lingual similarity and translation prediction via nearest-neighbor lookup. Each language's matrix is learned by aligning that language to English using bilingual dictionaries derived from Google Translate, reaching about 73% precision@1 on translation retrieval while preserving the original monolingual relationships. The approach only requires applying a linear transformation to existing fastText vectors; no retraining is needed.
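A minimal sketch of the idea described above, assuming the alignment is an orthogonal Procrustes fit solved with an SVD. It runs on toy random vectors rather than real fastText files, and the names `learn_alignment` and `nearest_neighbor` are hypothetical helpers for illustration, not the repository's API. The "foreign" space here is simply a rotated copy of the "English" space, so the recovered matrix should undo that rotation.

```python
import numpy as np


def normalize(m):
    """Scale each row to unit length so dot products become cosine similarities."""
    return m / np.linalg.norm(m, axis=1, keepdims=True)


def learn_alignment(src, tgt):
    """Orthogonal Procrustes: orthogonal W minimizing ||src @ W - tgt||.

    src and tgt hold the vectors of dictionary word pairs, row i of src
    being the translation of row i of tgt.
    """
    u, _, vt = np.linalg.svd(normalize(src).T @ normalize(tgt))
    return u @ vt


def nearest_neighbor(query, targets):
    """Index of the row in targets with the highest cosine similarity to query."""
    sims = normalize(targets) @ (query / np.linalg.norm(query))
    return int(np.argmax(sims))


# Toy stand-ins for monolingual embeddings (assumption: 20 words, 5 dimensions).
rng = np.random.default_rng(0)
dim = 5
english = rng.normal(size=(20, dim))

# The "foreign" space is the English space under a random rotation.
rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
foreign = english @ rotation

# Learn the matrix from the first 15 word pairs (the toy bilingual dictionary).
W = learn_alignment(foreign[:15], english[:15])

# Applying the linear map is the only step needed at use time.
aligned = foreign @ W

# Translation prediction for held-out words 15..19 via nearest-neighbor lookup.
predictions = [nearest_neighbor(aligned[i], english) for i in range(15, 20)]
```

Because `W` is constrained to be orthogonal, distances and angles within the foreign space are unchanged after projection, which is one way to see why monolingual relationships can be preserved by this kind of alignment.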
1,202 stars. No commits in the last 6 months.
Stars: 1,202
Forks: 120
Language: Jupyter Notebook
License: BSD-3-Clause
Category:
Last pushed: Mar 10, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/babylonhealth/fastText_multilingual"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
Higher-rated alternatives
shibing624/similarities
Similarities: a toolkit for similarity calculation and semantic search....
explosion/sense2vec
🦆 Contextually-keyed word vectors
chakki-works/chakin
Simple downloader for pre-trained word vectors
pdrm83/sent2vec
How to encode sentences in a high-dimensional vector space, a.k.a., sentence embedding.
sebischair/Lbl2Vec
Lbl2Vec learns jointly embedded label, document and word vectors to retrieve documents with...