MilaNLProc/contextualized-topic-models

A Python package to run contextualized topic models. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021 (Bianchi et al.).


Established

# Technical Summary

Implements two complementary neural architectures. CombinedTM builds on a VAE-style neural topic model and feeds its encoder a bag-of-words (BoW) representation concatenated with a contextualized document embedding, reconstructing the BoW on the decoder side. ZeroShotTM feeds its encoder the contextualized embedding alone; because only the embedding is needed at inference time, a multilingual encoder lets it predict topics for unseen documents, including documents in other languages (zero-shot and cross-lingual use). Embeddings are generated with Sentence-BERT, so any HuggingFace model it supports can be plugged in. The recommended preprocessing keeps the BoW vocabulary small (≤2000 terms recommended) and uses preprocessed text for the BoW but raw, unpreprocessed text for the embeddings. Also includes Kitty, a human-in-the-loop classifier submodule for interactive document clustering and annotation workflows.
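The data-preparation split described above (preprocessed text for the BoW, raw text for the embeddings, a capped vocabulary) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the package's own preprocessing or dataset classes: the stopword list, the frequency-based vocabulary cut, the choice to drop documents whose BoW ends up empty, and the helper names (`prepare`, `combined_input`, `zeroshot_input`) are all hypothetical. Embeddings would come from a Sentence-BERT model in practice; here the caller passes plain vectors.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "for"}


def preprocess(doc: str) -> list[str]:
    """Lowercase, keep alphabetic tokens, drop stopwords (BoW side only)."""
    tokens = re.findall(r"[a-z]+", doc.lower())
    return [t for t in tokens if t not in STOPWORDS]


def build_vocab(token_docs: list[list[str]], max_size: int = 2000) -> list[str]:
    """Keep the max_size most frequent terms; ties are broken alphabetically."""
    counts = Counter(t for doc in token_docs for t in doc)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:max_size]]


def prepare(raw_docs: list[str], max_size: int = 2000):
    """Return (raw_texts, bows, vocab) with the two views kept aligned.

    raw_texts stay unpreprocessed (they are what the sentence encoder sees);
    bows are count vectors over the capped vocabulary. Documents whose BoW
    is all zeros are dropped from both lists (an assumption of this sketch).
    """
    token_docs = [preprocess(d) for d in raw_docs]
    vocab = build_vocab(token_docs, max_size)
    index = {term: i for i, term in enumerate(vocab)}
    raw_kept, bows = [], []
    for raw, tokens in zip(raw_docs, token_docs):
        bow = [0] * len(vocab)
        for t in tokens:
            if t in index:
                bow[index[t]] += 1
        if any(bow):  # nothing to reconstruct for an empty BoW
            raw_kept.append(raw)
            bows.append(bow)
    return raw_kept, bows, vocab


def combined_input(bow: list[int], embedding: list[float]) -> list[float]:
    """CombinedTM-style encoder input: BoW concatenated with the embedding."""
    return list(bow) + list(embedding)


def zeroshot_input(embedding: list[float]) -> list[float]:
    """ZeroShotTM-style encoder input: the embedding alone (BoW is only a target)."""
    return list(embedding)


if __name__ == "__main__":
    docs = ["The cat sat on the mat.", "Dogs and cats play.", "The and of"]
    raw, bows, vocab = prepare(docs, max_size=3)
    print(vocab)  # ['cat', 'cats', 'dogs']
    print(bows)   # [[1, 0, 0], [0, 1, 1]]
    print(combined_input(bows[0], [0.5, -0.5]))  # [1, 0, 0, 0.5, -0.5]
```

The third document contains only stopwords, so its BoW is empty and it is dropped from both views. The concatenation in `combined_input` is deliberately simplified: it ignores any learned projection the real inference network may apply to the embedding before combining it with the BoW.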

1,266 stars and 1,326 monthly downloads. No commits in the last 6 months. Available on PyPI.

Stale 6m

Maintenance 2 / 25

Adoption 17 / 25

Maturity 18 / 25

Community 21 / 25


Stars: 1,266
Forks: 150
Language: Python
License: MIT

Related tools

vinid/cade

Compass-aligned Distributional Embeddings. Align embeddings from different corpora

ina-foss/twembeddings

Sentence embeddings for unsupervised event detection in the Twitter stream: study on English and...

criteo-research/CausE

Code for the RecSys 2018 paper entitled Causal Embeddings for Recommendation.

spcl/ncc

Neural Code Comprehension: A Learnable Representation of Code Semantics

vintasoftware/entity-embed

PyTorch library for transforming entities like companies, products, etc. into vectors to support...
