dselivanov/text2vec

Fast vectorization, topic modeling, distances and GloVe word embeddings in R.

Established

Implements core NLP tasks in C++ with OpenMP parallelization, giving near-linear scaling across cores, and processes input as a stream so it can handle data larger than RAM. Provides a unified API across vectorization, topic modeling (LSA/LDA), distance metrics, and GloVe embeddings. On Unix systems, fork-based parallel backends are supported for embarrassingly parallel operations such as document-term matrix construction.
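To make the stream-based idea concrete, here is a minimal sketch in Python of building a document-term matrix from chunks of text without loading the whole corpus at once. It is not text2vec's API: the function names (`tokenize`, `build_vocabulary`, `build_dtm`, `read_chunks`) are hypothetical, and the whitespace tokenizer and in-memory chunk source are simplifying assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple


def tokenize(doc: str) -> List[str]:
    # Assumption: lowercase + whitespace split stands in for a real tokenizer.
    return doc.lower().split()


def build_vocabulary(chunks: Iterable[List[str]]) -> Dict[str, int]:
    # First pass: count terms one chunk at a time, so only the
    # current chunk and the term counts are held in memory.
    counts: Counter = Counter()
    for chunk in chunks:
        for doc in chunk:
            counts.update(tokenize(doc))
    # Assign column indices in sorted term order for reproducibility.
    return {term: idx for idx, term in enumerate(sorted(counts))}


def build_dtm(chunks: Iterable[List[str]],
              vocab: Dict[str, int]) -> List[Tuple[int, int, int]]:
    # Second pass: emit sparse (row, column, count) triplets.
    triplets: List[Tuple[int, int, int]] = []
    row = 0
    for chunk in chunks:
        for doc in chunk:
            for term, n in Counter(tokenize(doc)).items():
                col = vocab.get(term)
                if col is not None:  # skip out-of-vocabulary terms
                    triplets.append((row, col, n))
            row += 1
    return triplets


def read_chunks() -> Iterator[List[str]]:
    # Hypothetical chunk source; in practice this would read from disk.
    yield ["the cat sat", "the dog sat"]
    yield ["a cat and a dog"]


vocab = build_vocabulary(read_chunks())
dtm = build_dtm(read_chunks(), vocab)
```

Because each chunk is processed independently in the second pass, the chunks could be handed to separate worker processes and their triplets concatenated afterwards, which is what makes document-term matrix construction embarrassingly parallel.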

870 stars.

No Package, No Dependents

Maintenance 6 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 23 / 25

Stars 870

Forks 134

Related tools

avidale/compress-fasttext

Tools for shrinking fastText models (in gensim format)

vzhong/embeddings

Fast, DB Backed pretrained word embeddings for natural language processing.

dccuchile/spanish-word-embeddings

Spanish word embeddings computed with different methods and from different corpora

ncbi-nlp/BioSentVec

BioWordVec & BioSentVec: pre-trained embeddings for biomedical words and sentences

ibrahimsharaf/doc2vec

Long(er) text representation and classification using Doc2Vec embeddings
