shibing624/text2vec
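text2vec, text to vector. A text vector representation tool that converts text into vector matrices. It implements text representation and text similarity models including Word2Vec, RankBM25, Sentence-BERT, and CoSENT, and works out of the box.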
Supports multi-GPU/multi-CPU batch inference via multiprocessing and includes a command-line interface for scripting bulk text vectorization tasks. Built on PyTorch with implementations of contrastive learning methods (CoSENT's ranking-aware loss, BGE's RetroMAE pretraining with contrastive finetuning) that optimize for semantic matching; includes pre-trained checkpoints on HuggingFace for Chinese, multilingual, and cross-lingual tasks. Integrates with BERT-family models and sentence-transformers architectures, with tooling for supervised fine-tuning on custom NLI and STS datasets.
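To make the "ranking-aware loss" concrete, here is a minimal sketch of a CoSENT-style objective in plain Python. It is not the library's own code: the function names `cosine` and `cosent_loss`, the `scale` default of 20, and the list-based inputs are assumptions made for illustration, and the sketch assumes non-zero vectors. The idea is that whenever one pair is labeled more similar than another, its cosine similarity should also be higher; each violation of that ordering adds to the loss.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosent_loss(pairs, labels, scale=20.0):
    """CoSENT-style ranking loss (illustrative sketch).

    pairs  : list of (vector, vector) sentence-embedding pairs
    labels : one similarity label per pair (higher means more similar)
    scale  : multiplier applied to cosine differences (assumed default)

    Computes log(1 + sum(exp(scale * (cos_low - cos_high)))) over every
    ordered combination where one pair is labeled higher than another.
    """
    sims = [cosine(u, v) for u, v in pairs]

    # Start with 0.0 so that exp(0) supplies the "1" inside log(1 + ...).
    terms = [0.0]
    for i, label_i in enumerate(labels):
        for j, label_j in enumerate(labels):
            if label_i > label_j:
                # Pair i should score higher than pair j; penalize otherwise.
                terms.append(scale * (sims[j] - sims[i]))

    # Numerically stable log-sum-exp.
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))
```

With one identical pair and one orthogonal pair, the loss is close to zero when the identical pair carries the higher label and grows large when the labels are swapped, which is the behavior a ranking loss should show.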
4,950 stars and 1,922 monthly downloads. Used by 1 other package. Available on PyPI.
Stars: 4,950
Forks: 428
Language: Python
License: Apache-2.0
Category:
Last pushed: Feb 14, 2026
Monthly downloads: 1,922
Commits (30d): 0
Dependencies: 7
Reverse dependents: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/shibing624/text2vec"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
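The same request can be made from Python with only the standard library. This is a rough sketch under stated assumptions: the helper names `build_quality_url` and `fetch_quality` are hypothetical, the `category/owner/repo` path layout is inferred from the single URL above, and the response is assumed (not confirmed) to be JSON, so the code falls back to returning the raw text if parsing fails.

```python
import json
import urllib.request

# Base path taken from the curl example; the segment layout after it
# (category/owner/repo) is an assumption inferred from that one URL.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category, owner, repo):
    """Assemble the endpoint URL for one repository (hypothetical helper)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the endpoint without an API key.

    Returns parsed JSON when the body is valid JSON (an assumption about
    the service), otherwise the raw response text.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return body
```

Calling `fetch_quality("embeddings", "shibing624", "text2vec")` issues the same GET request as the curl command and counts against the keyless daily limit.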
Related tools
ddangelov/Top2Vec
Top2Vec learns jointly embedded topic, document and word vectors.
predict-idlab/pyRDF2Vec
🐍 Python Implementation and Extension of RDF2Vec
IITH-Compilers/IR2Vec
Implementation of IR2Vec, LLVM IR Based Scalable Program Embeddings
IntuitionEngineeringTeam/chars2vec
Character-based word embeddings model based on RNN for handling real world texts
stephantul/reach
Load embeddings and featurize your sentences.