text2vec and Top2Vec

                 text2vec         Top2Vec
Score            73 (Verified)    65 (Established)
Maintenance      10/25            0/25
Adoption         19/25            19/25
Maturity         25/25            25/25
Community        19/25            21/25
Stars            4,950            3,109
Forks            428              377
Downloads        1,922            5,399
Commits (30d)    0                0
Language         Python           Python
License          Apache-2.0       BSD-3-Clause
Risk flags       None             Stale 6m

About text2vec

shibing624/text2vec

text2vec, text to vector. A text-embedding toolkit that converts text into vector matrices; it implements Word2Vec, RankBM25, Sentence-BERT, CoSENT, and other text-representation and text-similarity models, ready to use out of the box.

Supports multi-GPU/multi-CPU batch inference via multiprocessing and includes a command-line interface for scripting bulk text vectorization tasks. Built on PyTorch with implementations of contrastive learning methods (CoSENT's ranking-aware loss, BGE's RetroMAE pretraining with contrastive finetuning) that optimize for semantic matching; includes pre-trained checkpoints on HuggingFace for Chinese, multilingual, and cross-lingual tasks. Integrates with BERT-family models and sentence-transformers architectures, with tooling for supervised fine-tuning on custom NLI and STS datasets.
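The semantic-matching task these models optimize reduces to encode-then-score: embed the query and corpus, then rank by cosine similarity. A minimal sketch of that scoring step with toy vectors (hand-written stand-ins for model output, not real text2vec embeddings):

```python
import math

def cos_sim(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy embeddings standing in for encoder output.
query = [0.9, 0.1, 0.0]
corpus = {
    "close match": [0.8, 0.2, 0.1],
    "unrelated":   [0.0, 0.1, 0.9],
}

# Rank corpus entries by similarity to the query, as semantic search does.
ranked = sorted(corpus, key=lambda k: cos_sim(query, corpus[k]), reverse=True)
```

Contrastive objectives like CoSENT's ranking-aware loss train the encoder so that, under exactly this metric, matched pairs score above mismatched ones.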

About Top2Vec

ddangelov/Top2Vec

Top2Vec learns jointly embedded topic, document and word vectors.

Combines Doc2Vec, BERT Sentence Transformers, or Universal Sentence Encoder embeddings with UMAP dimensionality reduction and HDBSCAN clustering to automatically discover topics without predefined counts or stop word lists. The contextual variant uses token-level embeddings to identify multiple topics per document and intra-document topic spans, exposing results through methods for topic distribution, relevance scoring, and token-level topic assignments.
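After clustering, a topic vector is the centroid of one dense cluster of document vectors, and the word vectors nearest that centroid label the topic. A minimal sketch of that step with toy 2-D vectors (the cluster is a stand-in for HDBSCAN output on UMAP-reduced embeddings; the word vectors are hypothetical):

```python
import math

def centroid(vectors):
    """Topic vector = mean of the document vectors in one dense cluster."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def cos_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy document vectors already grouped by a clusterer.
cluster = [[1.0, 0.1], [0.9, 0.0], [1.1, 0.2]]
topic_vec = centroid(cluster)

# The word whose vector lies nearest the topic vector labels the topic.
word_vecs = {"finance": [1.0, 0.1], "sports": [0.0, 1.0]}
label = max(word_vecs, key=lambda w: cos_sim(topic_vec, word_vecs[w]))
```

Because documents, words, and topic centroids live in the same embedding space, no predefined topic count or stop-word list is needed: each dense cluster yields one topic.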

Scores updated daily from GitHub, PyPI, and npm data.