text2vec and doc2vec

text2vec: 73 (Verified)
Maintenance 10/25 · Adoption 19/25 · Maturity 25/25 · Community 19/25
Stars: 4,950 · Forks: 428 · Downloads: 1,922 · Commits (30d): 0
Language: Python · License: Apache-2.0
No risk flags

doc2vec: 38 (Emerging)
Maintenance 6/25 · Adoption 8/25 · Maturity 9/25 · Community 15/25
Stars: 51 · Forks: 8 · Downloads: not listed · Commits (30d): 0
Language: C++ · License: not listed
No package · No dependents
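Each subscore above runs from 0 to 25, and for both projects the four subscores add up exactly to the headline score (10 + 19 + 25 + 19 = 73; 6 + 8 + 9 + 15 = 38). The short sketch below assumes the composite is a plain sum. The page does not state its formula here, so treat that as an inference from the listed numbers. `composite_score` is a hypothetical helper, not part of any published tool.

```python
# Assumption: the headline score is the plain sum of four 0-25 subscores.
# This is inferred from the listed numbers, not from a documented formula.
SUBSCORE_MAX = 25


def composite_score(subscores: dict[str, int]) -> int:
    """Sum 0-25 subscores into a 0-100 composite, rejecting out-of-range values."""
    for name, value in subscores.items():
        if not 0 <= value <= SUBSCORE_MAX:
            raise ValueError(f"{name} out of range: {value}")
    return sum(subscores.values())


text2vec_scores = {"Maintenance": 10, "Adoption": 19, "Maturity": 25, "Community": 19}
doc2vec_scores = {"Maintenance": 6, "Adoption": 8, "Maturity": 9, "Community": 15}

print(composite_score(text2vec_scores))  # 73
print(composite_score(doc2vec_scores))   # 38
```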

About text2vec

shibing624/text2vec

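text2vec, text to vector. A text-embedding toolkit that converts text into vector matrices. It implements text representation and text-similarity models including Word2Vec, RankBM25, Sentence-BERT, and CoSENT, and works out of the box.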
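RankBM25 is a lexical ranking baseline, not a learned embedding. To illustrate what such a scorer computes, here is a minimal Okapi BM25 over pre-tokenized documents. This is a sketch, not text2vec's code. The `BM25` class is hypothetical. The defaults k1 = 1.5 and b = 0.75 are common choices, not values taken from text2vec. The idf uses the non-negative variant ln(1 + (N - n + 0.5) / (n + 0.5)). Tokenization is assumed to have happened already; Chinese text would first need word segmentation.

```python
import math
from collections import Counter


class BM25:
    """Minimal Okapi BM25 scorer over pre-tokenized documents (illustrative sketch)."""

    def __init__(self, corpus: list[list[str]], k1: float = 1.5, b: float = 0.75):
        self.k1 = k1
        self.b = b
        self.doc_freqs = [Counter(doc) for doc in corpus]  # term counts per document
        self.doc_lens = [len(doc) for doc in corpus]
        self.avgdl = sum(self.doc_lens) / len(corpus)
        n_docs = len(corpus)
        # Document frequency: how many documents contain each term at least once.
        df = Counter(term for doc in corpus for term in set(doc))
        # Non-negative idf variant (assumption; other BM25 implementations differ here).
        self.idf = {t: math.log(1 + (n_docs - n + 0.5) / (n + 0.5)) for t, n in df.items()}

    def score(self, query: list[str], index: int) -> float:
        """BM25 score of one document for a tokenized query."""
        freqs = self.doc_freqs[index]
        # Length normalization: longer-than-average documents are penalized.
        norm = self.k1 * (1 - self.b + self.b * self.doc_lens[index] / self.avgdl)
        total = 0.0
        for term in query:
            f = freqs.get(term, 0)
            if f:  # terms absent from the document contribute nothing
                total += self.idf[term] * f * (self.k1 + 1) / (f + norm)
        return total

    def get_scores(self, query: list[str]) -> list[float]:
        return [self.score(query, i) for i in range(len(self.doc_freqs))]


corpus = [["hello", "world"], ["hello", "there", "friend"], ["goodbye", "world"]]
bm25 = BM25(corpus)
print(bm25.get_scores(["hello", "world"]))
```

The first document matches both query terms and ranks highest. Of the two single-term matches, the shorter document scores higher because of the length normalization controlled by `b`.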

Supports multi-GPU/multi-CPU batch inference via multiprocessing and includes a command-line interface for scripting bulk text vectorization tasks. Built on PyTorch with implementations of contrastive learning methods (CoSENT's ranking-aware loss, BGE's RetroMAE pre-training with contrastive fine-tuning) that optimize for semantic matching; includes pre-trained checkpoints on HuggingFace for Chinese, multilingual, and cross-lingual tasks. Integrates with BERT-family models and sentence-transformers architectures, with tooling for supervised fine-tuning on custom NLI and STS datasets.
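CoSENT's ranking-aware loss compares cosine similarities across the sentence pairs in a batch. Whenever pair i is labeled more similar than pair j, it penalizes cos_j exceeding cos_i through log(1 + Σ exp(λ(cos_j - cos_i))). The pure-Python sketch below uses toy vectors. In text2vec the cosines would come from a BERT-family encoder and the loss would run in PyTorch. The scale λ = 20 is the value usually quoted for CoSENT and is treated here as an assumption. `cosine` and `cosent_loss` are illustrative names, not text2vec's API.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def cosent_loss(cos_sims, labels, scale=20.0):
    """CoSENT-style loss: log(1 + sum of exp(scale * (cos_j - cos_i))) over all
    pairs (i, j) with labels[i] > labels[j]. Lower is better.
    scale=20.0 is an assumed default."""
    n = len(cos_sims)
    terms = [
        scale * (cos_sims[j] - cos_sims[i])
        for i in range(n)
        for j in range(n)
        if labels[i] > labels[j]
    ]
    # Numerically stable log-sum-exp; the implicit 0.0 term supplies the "1 +".
    m = max([0.0] + terms)
    return m + math.log(math.exp(-m) + sum(math.exp(t - m) for t in terms))


# Toy batch: each entry is one (sentence_a, sentence_b) embedding pair.
pairs = [([1.0, 0.2], [0.9, 0.3]), ([1.0, 0.0], [0.0, 1.0])]
cos_sims = [cosine(a, b) for a, b in pairs]
labels = [1, 0]  # first pair labeled similar, second dissimilar
print(cosent_loss(cos_sims, labels))  # near zero: the ranking is already correct
```

Only the relative order of the cosines matters, not their absolute values. This is the "ranking-aware" property: the loss goes to zero once every higher-labeled pair has a clearly larger cosine than every lower-labeled one.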

About doc2vec

bnosac/doc2vec

Distributed Representations of Sentences and Documents
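"Distributed Representations of Sentences and Documents" learns a fixed-length vector for each document. Its simpler PV-DBOW variant trains each document vector to predict words sampled from that document, without using the surrounding context words as input. The toy trainer below illustrates that idea in pure Python. It is not bnosac/doc2vec's C++ implementation, and `train_pv_dbow` is a hypothetical name. For simplicity it draws negative samples uniformly from the vocabulary; word2vec-style trainers usually sample from a smoothed unigram distribution.

```python
import math
import random


def sigmoid(x: float) -> float:
    x = max(-30.0, min(30.0, x))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-x))


def train_pv_dbow(docs, dim=16, epochs=200, lr=0.1, negatives=3, seed=0):
    """Toy PV-DBOW: each document vector learns to predict its own words
    via negative sampling (uniform noise distribution, a simplification)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    doc_vecs = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in docs]
    out_vecs = [[0.0] * dim for _ in vocab]  # output-side word vectors

    for _ in range(epochs):
        for d, doc in enumerate(docs):
            dv = doc_vecs[d]
            for word in doc:
                pos = index[word]
                # Sample negatives, skipping any draw equal to the positive word.
                negs = [j for j in (rng.randrange(len(vocab)) for _ in range(negatives)) if j != pos]
                grad = [0.0] * dim
                for t, label in [(pos, 1.0)] + [(j, 0.0) for j in negs]:
                    ov = out_vecs[t]
                    g = lr * (label - sigmoid(sum(a * b for a, b in zip(dv, ov))))
                    for k in range(dim):
                        grad[k] += g * ov[k]  # doc-vector gradient uses pre-update ov
                        ov[k] += g * dv[k]    # update the output word vector
                for k in range(dim):
                    dv[k] += grad[k]          # apply accumulated doc-vector update
    return doc_vecs, vocab


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


docs = [
    ["cat", "sat", "on", "the", "mat"],
    ["the", "cat", "sat", "on", "a", "hat"],
    ["stock", "market", "prices", "fell"],
]
vecs, vocab = train_pv_dbow(docs)
print(round(cosine(vecs[0], vecs[1]), 3), round(cosine(vecs[0], vecs[2]), 3))
```

You would expect the first two documents, which share most of their words, to end up closer to each other than to the third. With a corpus this small, though, the exact numbers depend on the seed and hyperparameters.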


Scores are updated daily from GitHub, PyPI, and npm data.