ddangelov/Top2Vec

Top2Vec learns jointly embedded topic, document and word vectors.

65 / 100 (Established)

Top2Vec embeds documents with Doc2Vec, BERT Sentence Transformers, or the Universal Sentence Encoder, reduces the embeddings with UMAP, and clusters them with HDBSCAN to discover topics automatically, with no predefined topic count or stop-word list. The contextual variant uses token-level embeddings to identify multiple topics per document and the spans within a document that belong to each topic. It exposes results through methods for topic distribution, relevance scoring, and token-level topic assignments.
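A minimal usage sketch of the workflow described above, assuming the `top2vec` package from PyPI is installed. `Top2Vec(...)`, `get_num_topics()`, and `get_topics()` follow the library's README; `summarize_topics` and `train_and_summarize` are hypothetical helper names introduced here for illustration, and the choice of `embedding_model="doc2vec"` is just one of the options the description lists.

```python
def summarize_topics(topic_words, topic_nums, top_n=5):
    """Format each topic as 'topic <num>: w1, w2, ...' using its first top_n words.

    Hypothetical helper, not part of Top2Vec.
    """
    return [
        f"topic {num}: {', '.join(list(words)[:top_n])}"
        for words, num in zip(topic_words, topic_nums)
    ]


def train_and_summarize(documents, embedding_model="doc2vec"):
    """Train a model on a list of strings and return one summary line per topic.

    Hypothetical helper. The import is deferred so this file loads even
    when top2vec is not installed.
    """
    from top2vec import Top2Vec  # pip install top2vec

    # No topic count is passed: the number of topics comes from the clustering.
    model = Top2Vec(documents, embedding_model=embedding_model)

    num_topics = model.get_num_topics()
    # get_topics returns the words, their scores, and the topic indices.
    topic_words, word_scores, topic_nums = model.get_topics(num_topics)
    return summarize_topics(topic_words, topic_nums)
```

Training needs a reasonably large corpus (thousands of documents is typical), so a handful of sentences is unlikely to produce usable topics.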

3,109 stars and 5,399 monthly downloads. No commits in the last 6 months. Available on PyPI.

Stale 6m
Maintenance 0 / 25
Adoption 19 / 25
Maturity 25 / 25
Community 21 / 25


Stars: 3,109
Forks: 377
Language: Python
License: BSD-3-Clause
Last pushed: Nov 14, 2024
Monthly downloads: 5,399
Commits (30d): 0
Dependencies: 9

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/ddangelov/Top2Vec"

Open to everyone: 100 requests/day with no key. A free key raises the limit to 1,000/day.
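The curl request above could be reproduced from Python roughly as follows. This is a sketch under assumptions: the `category/owner/repo` path layout is inferred from the single example URL, the response is assumed to be JSON (the page does not document its fields, so none are accessed), and `quality_url` and `fetch_quality` are hypothetical names.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the example URL on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL (path layout is an assumption from one example)."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return "/".join([BASE_URL] + parts)


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the record without an API key and return the decoded body.

    Assumes the endpoint returns JSON; no particular fields are assumed.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call (performs a network request):
# data = fetch_quality("embeddings", "ddangelov", "Top2Vec")
```

With the keyless limit of 100 requests/day, caching the result locally is a sensible default for anything that runs repeatedly.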