MaartenGr/BERTopic
Leveraging BERT and c-TF-IDF to create easily interpretable topics.
Combines dense transformer embeddings with dimensionality reduction (UMAP) and clustering (HDBSCAN) to discover coherent topics, then applies class-based TF-IDF to extract semantically meaningful keywords per cluster. Supports diverse modeling paradigms including supervised, hierarchical, dynamic, multimodal, and zero-shot approaches, with optional LLM-based topic representation for natural language summaries. Integrates with 🤗 Hugging Face transformers and offers pluggable backends for embeddings (Flair, spaCy, Gensim) and vision models for cross-modal topic discovery.
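The class-based TF-IDF step described above can be sketched in plain Python. This is a minimal illustration of the idea, not BERTopic's implementation: each cluster's documents are concatenated into one pseudo-document, and each term is weighted by `tf(t, c) * log(1 + A / f(t))`, where `f(t)` is the term's frequency across all clusters and `A` is the average number of words per cluster. The cluster names and toy documents are invented for the example.

```python
import math
from collections import Counter

def c_tf_idf(class_docs):
    """Score terms per cluster with class-based TF-IDF.

    class_docs maps a cluster label to its list of documents.
    Returns {cluster: {term: score}}.
    """
    # Merge each cluster's documents into one bag of words.
    class_counts = {
        c: Counter(" ".join(docs).split()) for c, docs in class_docs.items()
    }
    # Term frequencies across all clusters combined.
    total = Counter()
    for counts in class_counts.values():
        total.update(counts)
    # A = average number of words per cluster.
    avg_words = sum(total.values()) / len(class_counts)
    return {
        c: {t: tf * math.log(1 + avg_words / total[t]) for t, tf in counts.items()}
        for c, counts in class_counts.items()
    }

# Toy clusters, standing in for HDBSCAN's output.
clusters = {
    "sports": ["the match ended in a draw", "the team won the match"],
    "finance": ["the market closed higher", "stocks in the market rallied"],
}
scores = c_tf_idf(clusters)
top_sports = max(scores["sports"], key=scores["sports"].get)
print(top_sports)  # a cluster-specific word outranks shared words like "the"
```

Because "the" appears in every cluster, its `log(1 + A / f)` factor is small, so distinctive terms such as "match" or "market" rise to the top of their cluster's keyword list.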
7,443 stars. Used by 5 other packages. Available on PyPI.
Stars: 7,443
Forks: 882
Language: Python
License: MIT
Category:
Last pushed: Feb 20, 2026
Commits (30d): 0
Dependencies: 9
Reverse dependents: 5
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/MaartenGr/BERTopic"
Open to everyone: 100 requests/day with no key required; a free key raises the limit to 1,000 requests/day.
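The same endpoint can be reached from Python with the standard library. This is a sketch only: the request is built but not sent, and the `X-API-Key` header name is an assumption for illustration — check the service's documentation for the actual authentication header.

```python
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def build_request(repo, api_key=None):
    """Build (but do not send) a GET request for a repo's quality data.

    The "X-API-Key" header name is hypothetical; consult the API docs
    for the real keyed-access mechanism.
    """
    req = urllib.request.Request(f"{BASE}/{repo}")
    if api_key:
        req.add_header("X-API-Key", api_key)
    return req

req = build_request("MaartenGr/BERTopic")
print(req.full_url)
# To actually fetch: urllib.request.urlopen(req).read()
```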
Related models
webis-de/small-text: Active Learning for Text Classification in Python
mead-ml/mead-baseline: Deep-Learning Model Exploration and Development for NLP
x-tabdeveloping/turftopic: Robust and fast topic models with sentence-transformers.
HumanSignal/label-studio-transformers: Label data using HuggingFace's transformers and automatically get a prediction service
hiyouga/Dual-Contrastive-Learning: Code for our paper "Dual Contrastive Learning: Text Classification via Label-Aware Data Augmentation"