ThoughtRiver/lmdb-embeddings

Fast word vectors with little memory usage in Python

/ 100

Emerging

Uses the Lightning Memory-Mapped Database (LMDB) to give zero-load-time access to pre-trained embeddings with negligible memory overhead: a large model such as GloVe-840B needs only a few MB of memory, versus about 4 GB when loaded in full the traditional way. Supports pluggable serialization backends (pickle, msgpack), includes an LRU cache for frequently accessed vectors, and works with gensim models and custom embedding iterators.
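The idea behind memory-mapped lookup with an LRU cache can be sketched with the Python standard library alone. This is not the lmdb-embeddings API and it does not use LMDB: the names `write_vectors` and `MmapVectorReader`, the fixed 4-dimensional float32 record layout, and the in-memory word-to-offset dict are all assumptions made for illustration (LMDB keeps its index inside the memory-mapped file as well).

```python
import mmap
import os
import struct
import tempfile
from functools import lru_cache

DIM = 4                              # hypothetical vector size for the demo
RECORD = struct.Struct(f"<{DIM}f")   # one little-endian float32 vector


def write_vectors(path, vectors):
    """Write each vector as a fixed-width record; return word -> byte offset."""
    index = {}
    with open(path, "wb") as handle:
        for word, vector in vectors.items():
            index[word] = handle.tell()
            handle.write(RECORD.pack(*vector))
    return index


class MmapVectorReader:
    """Hypothetical reader: vectors stay on disk until a word is requested."""

    def __init__(self, path, index, cache_size=1024):
        self._file = open(path, "rb")
        # Mapping the file is cheap; the OS pages data in only on access.
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self._index = index
        # Wrap the raw lookup in an LRU cache for frequently requested words.
        self.get_word_vector = lru_cache(maxsize=cache_size)(self._read)

    def _read(self, word):
        offset = self._index[word]   # raises KeyError for unknown words
        return RECORD.unpack_from(self._map, offset)

    def close(self):
        self._map.close()
        self._file.close()


# Small demo using values that float32 represents exactly.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "vectors.bin")
index = write_vectors(
    path,
    {"cat": (0.5, 1.0, -2.0, 0.25), "dog": (1.5, 0.0, 3.0, -0.75)},
)
reader = MmapVectorReader(path, index)
first = reader.get_word_vector("cat")    # read from the memory map
second = reader.get_word_vector("cat")   # served from the LRU cache
hits = reader.get_word_vector.cache_info().hits
reader.close()
```

Opening the reader costs almost nothing because no vectors are parsed up front, which is the property the description above attributes to the LMDB-backed approach.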

416 stars. No commits in the last 6 months.

Stale 6m · No Package · No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 14 / 25


Stars: 416
Forks
Language: Python
License: GPL-3.0
Category: word-embedding-implementations
Last pushed: Jun 26, 2021
Commits (30d)


Word Embedding Implementations · 109 tools

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/ThoughtRiver/lmdb-embeddings"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
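The same request can be made from Python with the standard library. This is a minimal sketch: the helper names `build_quality_url` and `fetch_quality` are hypothetical, the `{owner}/{repo}` path pattern is generalized from the single URL shown above, and the response body is returned as raw text because its format is not documented here.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the curl example; treating it as a template is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/embeddings"


def build_quality_url(owner, repo):
    """Build the quality-data URL for a repository (hypothetical helper)."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner, repo, timeout=10):
    """Fetch the raw response body as text; no response schema is assumed."""
    with urlopen(build_quality_url(owner, repo), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Usage (performs a network request, so it is left commented out):
# print(fetch_quality("ThoughtRiver", "lmdb-embeddings"))
```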

Higher-rated alternatives

shibing624/text2vec

text2vec, text to vector....

ddangelov/Top2Vec

Top2Vec learns jointly embedded topic, document and word vectors.

predict-idlab/pyRDF2Vec

🐍 Python Implementation and Extension of RDF2Vec

IntuitionEngineeringTeam/chars2vec

Character-based word embeddings model based on RNN for handling real world texts

IITH-Compilers/IR2Vec

Implementation of IR2Vec, LLVM IR Based Scalable Program Embeddings
