natasha/navec
Compact high quality word embeddings for Russian language
Trained on large Russian corpora (12B+ tokens) using vanilla GloVe with vector quantization (300d→100d), achieving competitive intrinsic evaluation scores while reducing disk footprint to ~50MB and load time to ~1 second. Integrates directly with PyTorch via the Slovnet library's `NavecEmbedding` module and supports dictionary-like access patterns for both word lookup and vocabulary indexing with special handling for unknown and padding tokens.
216 stars. Used by 2 other packages. No commits in the last 6 months. Available on PyPI.
Stars
216
Forks
20
Language
Python
License
MIT
Category
Last pushed
Jul 24, 2023
Commits (30d)
0
Dependencies
1
Reverse dependents
2
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/natasha/navec"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related tools
shibing624/text2vec
text2vec, text to vector....
ddangelov/Top2Vec
Top2Vec learns jointly embedded topic, document and word vectors.
predict-idlab/pyRDF2Vec
🐍 Python Implementation and Extension of RDF2Vec
IntuitionEngineeringTeam/chars2vec
Character-based word embeddings model based on RNN for handling real world texts
IITH-Compilers/IR2Vec
Implementation of IR2Vec, LLVM IR Based Scalable Program Embeddings