ina-foss/twembeddings
Sentence embeddings for unsupervised event detection in the Twitter stream: study on English and French corpora
Implements several sentence embedding models (TF-IDF, Word2Vec, ELMo, BERT, Universal Sentence Encoder, Sentence-BERT) for unsupervised event detection on Twitter streams using First Story Detection clustering. The toolkit detects events over time by comparing the similarity between tweet embeddings against a threshold, and evaluates performance on English and French corpora. It builds on standard NLP frameworks (scikit-learn, transformers, TensorFlow). Custom datasets are supported via TSV input with optional ground-truth labels; the output is clustering predictions with detailed evaluation metrics.
No commits in the last 6 months. Available on PyPI.
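The threshold-based clustering mentioned in the description can be illustrated with a minimal sketch. It is not the toolkit's own code or API: the function names (`embed`, `cosine`, `first_story_detection`), the bag-of-words stand-in for a real embedding model, and the threshold value of 0.5 are all assumptions made for illustration, and the temporal windowing a stream-based detector would likely use is left out.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding (assumption): bag-of-words counts, standing in for TF-IDF, BERT, etc."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def first_story_detection(tweets, threshold=0.5):
    """Assign each tweet, in stream order, to a cluster id.

    A tweet joins the cluster of its most similar earlier tweet when that
    similarity reaches `threshold`; otherwise it starts a new cluster
    (a "first story").
    """
    vectors = []
    labels = []
    next_cluster = 0
    for text in tweets:
        vec = embed(text)
        best_sim, best_idx = 0.0, None
        # Brute-force nearest-neighbour search over all earlier tweets.
        for idx, earlier in enumerate(vectors):
            sim = cosine(vec, earlier)
            if sim > best_sim:
                best_sim, best_idx = sim, idx
        if best_idx is not None and best_sim >= threshold:
            labels.append(labels[best_idx])
        else:
            labels.append(next_cluster)
            next_cluster += 1
        vectors.append(vec)
    return labels


# Made-up example tweets, not taken from any corpus.
tweets = [
    "earthquake hits the city centre",
    "strong earthquake hits city this morning",
    "new phone released today",
    "earthquake damage reported in the city centre",
    "the new phone released today looks great",
]
labels = first_story_detection(tweets, threshold=0.5)
print(labels)
```

Lowering the threshold merges more tweets into existing clusters, while raising it produces more, smaller clusters, so the value typically has to be tuned for each embedding model.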
Stars: 33
Forks: 5
Language: Jupyter Notebook
License: MIT
Category:
Last pushed: Jul 25, 2025
Monthly downloads: 32
Commits (30d): 0
Dependencies: 15
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/ina-foss/twembeddings"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
Related tools
MilaNLProc/contextualized-topic-models
A Python package to run contextualized topic modeling. CTMs combine contextualized embeddings...
vinid/cade
Compass-aligned Distributional Embeddings. Align embeddings from different corpora.
criteo-research/CausE
Code for the RecSys 2018 paper entitled Causal Embeddings for Recommendation.
spcl/ncc
Neural Code Comprehension: A Learnable Representation of Code Semantics
vintasoftware/entity-embed
PyTorch library for transforming entities like companies, products, etc. into vectors to support...