ThoughtRiver/lmdb-embeddings
Fast word vectors with little memory usage in Python
Leverages Lightning Memory-Mapped Database (LMDB) to enable zero-load-time access to pre-trained embeddings with negligible memory overhead: large models like GloVe-840B need only a few MB of memory versus 4GB when loaded the traditional way. Supports pluggable serialization backends (pickle, msgpack) and includes an LRU cache for frequently accessed vectors, with transparent compatibility for gensim models and custom embedding iterators.
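The storage pattern described above can be sketched with the Python standard library alone. This is not the lmdb-embeddings API: the sketch swaps LMDB for Python's built-in `dbm` module (which is not memory-mapped, so it illustrates the lookup pattern rather than the performance), and the names `write_embeddings` and `EmbeddingsReader` are hypothetical. It shows three ideas from the description: vectors stored as serialized values keyed by word, a pluggable serializer (pickle by default), and an LRU cache in front of disk reads.

```python
# Assumption: stdlib dbm stands in for LMDB; all names here are hypothetical,
# not the lmdb-embeddings API.
import dbm
import functools
import os
import pickle
import tempfile
from typing import Callable, Iterable, Tuple

Vector = Tuple[float, ...]
Serialize = Callable[[Vector], bytes]
Deserialize = Callable[[bytes], Vector]


def write_embeddings(path: str, items: Iterable[Tuple[str, Vector]],
                     serialize: Serialize = pickle.dumps) -> None:
    """Stream (word, vector) pairs into an on-disk key-value store."""
    with dbm.open(path, "n") as db:  # "n": always create a new, empty store
        for word, vector in items:
            db[word.encode("utf-8")] = serialize(tuple(vector))


class EmbeddingsReader:
    """Reads vectors from disk on demand instead of loading the whole model."""

    def __init__(self, path: str, deserialize: Deserialize = pickle.loads,
                 cache_size: int = 1024) -> None:
        self._db = dbm.open(path, "r")
        self._deserialize = deserialize
        # Wrap the disk lookup in an LRU cache so frequently requested words
        # skip the read and deserialization; missing words are never cached.
        self.get_word_vector = functools.lru_cache(maxsize=cache_size)(self._lookup)

    def _lookup(self, word: str) -> Vector:
        try:
            raw = self._db[word.encode("utf-8")]
        except KeyError:
            raise KeyError(f"missing word: {word!r}") from None
        return tuple(self._deserialize(raw))

    def close(self) -> None:
        self._db.close()


if __name__ == "__main__":
    folder = tempfile.mkdtemp()
    path = os.path.join(folder, "vectors")
    write_embeddings(path, [("google", (0.1, 0.2)), ("apple", (0.3, 0.4))])
    reader = EmbeddingsReader(path)
    print(reader.get_word_vector("google"))
    reader.close()
```

Because `write_embeddings` consumes any iterable of `(word, vector)` pairs, a generator over a gensim model or a custom embedding iterator could feed it without building the whole model in memory first. Swapping the `serialize`/`deserialize` pair is how this sketch models a pluggable backend such as msgpack.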
416 stars. No commits in the last 6 months.
Stars: 416
Forks: 31
Language: Python
License: GPL-3.0
Category:
Last pushed: Jun 26, 2021
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/ThoughtRiver/lmdb-embeddings"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
shibing624/text2vec
text2vec, text to vector....
ddangelov/Top2Vec
Top2Vec learns jointly embedded topic, document and word vectors.
predict-idlab/pyRDF2Vec
Python Implementation and Extension of RDF2Vec
IntuitionEngineeringTeam/chars2vec
Character-based word embeddings model based on RNN for handling real world texts
IITH-Compilers/IR2Vec
Implementation of IR2Vec, LLVM IR Based Scalable Program Embeddings