shibing624/text2vec

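text2vec, text to vector. A text vector representation tool that converts text into vector matrices. It implements text representation and text similarity models including Word2Vec, RankBM25, Sentence-BERT, and CoSENT, and works out of the box.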

73 / 100
Verified

Supports multi-GPU/multi-CPU batch inference via multiprocessing and includes a command-line interface for scripting bulk text vectorization tasks. Built on PyTorch with implementations of contrastive learning methods (CoSENT's ranking-aware loss, BGE's RetroMAE pretraining with contrastive finetuning) that optimize for semantic matching; includes pre-trained checkpoints on HuggingFace for Chinese, multilingual, and cross-lingual tasks. Integrates with BERT-family models and sentence-transformers architectures, with tooling for supervised fine-tuning on custom NLI and STS datasets.
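As a rough illustration of what a "ranking-aware loss" means here, the sketch below computes the CoSENT objective as it is usually stated: for every two sentence pairs where the first is labelled more similar than the second, the loss penalises the second pair's cosine similarity exceeding the first's. This is a dependency-free sketch, not text2vec's own PyTorch implementation; the function names and the `scale` value of 20 are assumptions chosen for illustration.

```python
import math
from itertools import product


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosent_loss(pairs, labels, scale=20.0):
    """CoSENT-style ranking loss (illustrative sketch).

    pairs  : list of (u, v) embedding pairs
    labels : one similarity label per pair (higher = more similar)
    scale  : temperature-like multiplier; 20.0 is an assumed default
    """
    sims = [cosine(u, v) for u, v in pairs]

    # The leading 0.0 is the "1 +" inside log(1 + sum(exp(...))), since exp(0) = 1.
    terms = [0.0]
    for i, j in product(range(len(sims)), repeat=2):
        # Pair i is labelled more similar than pair j, so we want sims[i] > sims[j].
        if labels[i] > labels[j]:
            terms.append(scale * (sims[j] - sims[i]))

    # Numerically stable log-sum-exp over all terms.
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))
```

With one identical pair and one orthogonal pair, labelling the identical pair as the more similar one gives a loss near zero, while swapping the labels gives a loss near `scale`. Only the relative ordering of cosine similarities matters, not their absolute values.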

4,950 stars and 1,922 monthly downloads. Used by 1 other package. Available on PyPI.

Maintenance 10 / 25
Adoption 19 / 25
Maturity 25 / 25
Community 19 / 25

Stars: 4,950
Forks: 428
Language: Python
License: Apache-2.0
Last pushed: Feb 14, 2026
Monthly downloads: 1,922
Commits (30d): 0
Dependencies: 7
Reverse dependents: 1

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/shibing624/text2vec"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
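For scripted use, the curl request above could be mirrored in Python roughly as follows. This is a minimal sketch using only the standard library. The helper names `quality_url` and `fetch_quality` are hypothetical, the `category/owner/repo` path layout is inferred from the single URL shown, and the response is returned as raw text because the response format is not documented here.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base endpoint taken from the curl example; everything after it is
# assumed to follow the pattern <category>/<owner>/<repo>.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, repo):
    """Build the request URL for a repo such as 'shibing624/text2vec'."""
    # quote() leaves '/' unescaped by default, so 'owner/repo' stays intact.
    return f"{BASE_URL}/{quote(category)}/{quote(repo)}"


def fetch_quality(category, repo, timeout=10):
    """Fetch the raw response body as text (hypothetical helper)."""
    with urlopen(quality_url(category, repo), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_quality("embeddings", "shibing624/text2vec")` issues the same GET request as the curl command and counts against the daily request limit.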