dselivanov/text2vec
Fast vectorization, topic modeling, distances and GloVe word embeddings in R.
Implements core NLP tasks in C++ with OpenMP parallelization for near-linear multicore scaling, and uses stream-based processing to handle data larger than RAM. Provides a unified API across vectorization, topic modeling (LSA/LDA), distance metrics, and GloVe embeddings. Supports fork-based parallel backends on Unix systems for embarrassingly parallel operations such as document-term matrix construction.
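As a rough illustration of the vectorization workflow described above, the following is a minimal sketch that builds a document-term matrix from a tiny in-memory corpus. The corpus and variable names are made up for illustration; the calls (`itoken`, `create_vocabulary`, `vocab_vectorizer`, `create_dtm`) are taken from the package's documented interface, though argument defaults may differ between versions.

```r
library(text2vec)

# Toy corpus (hypothetical data, for illustration only)
docs <- c(
  "the cat sat on the mat",
  "the dog sat on the log",
  "cats and dogs are pets"
)

# itoken() wraps the documents in an iterator over tokenized chunks,
# so text is consumed piece by piece rather than tokenized all at once
it <- itoken(docs,
             preprocessor = tolower,
             tokenizer = word_tokenizer,
             progressbar = FALSE)

# Collect the vocabulary, then build a vectorizer that maps terms to columns
vocab <- create_vocabulary(it)
vectorizer <- vocab_vectorizer(vocab)

# Sparse document-term matrix: one row per document, one column per term
dtm <- create_dtm(it, vectorizer)
dim(dtm)
```

For a corpus that does not fit in memory, the same pattern would presumably be driven by an iterator over files rather than an in-memory character vector.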
Stars: 870
Forks: 134
Language: R
License: —
Category
Last pushed: Dec 01, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/dselivanov/text2vec"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related tools
avidale/compress-fasttext
Tools for shrinking fastText models (in gensim format)
vzhong/embeddings
Fast, DB Backed pretrained word embeddings for natural language processing.
dccuchile/spanish-word-embeddings
Spanish word embeddings computed with different methods and from different corpora
ncbi-nlp/BioSentVec
BioWordVec & BioSentVec: pre-trained embeddings for biomedical words and sentences
ibrahimsharaf/doc2vec
Long(er) text representation and classification using Doc2Vec embeddings