ina-foss/twembeddings

Sentence embeddings for unsupervised event detection in the Twitter stream: study on English and French corpora

50 / 100 (Established)

Implements several sentence embedding models (TF-IDF, Word2Vec, ELMo, BERT, Universal Sentence Encoder, Sentence-BERT) for unsupervised First Story Detection clustering on Twitter streams. The toolkit detects events over time by thresholding the similarity between tweet embeddings, and evaluates the resulting clusters on English and French corpora. It builds on standard NLP frameworks (scikit-learn, transformers, TensorFlow). Custom datasets are supported via TSV input with optional ground-truth labels; the output is a set of clustering predictions with detailed evaluation metrics.
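As a rough illustration of threshold-based First Story Detection, the sketch below assigns each incoming tweet to the cluster of its most similar earlier tweet, or opens a new cluster when no earlier tweet is similar enough. It is not the toolkit's own code or API: the function names, the whitespace tokenizer, the smoothed TF-IDF weighting, and the 0.3 threshold are all assumptions chosen to keep the example dependency-free.

```python
import math
from collections import Counter


def tfidf_vectors(texts):
    """Return one L2-normalised sparse TF-IDF vector (a dict) per text.

    Hypothetical helper: whitespace tokenisation and smoothed IDF are
    illustrative choices, not the toolkit's preprocessing.
    """
    tokenized = [t.lower().split() for t in texts]
    n_docs = len(tokenized)
    # Document frequency: number of texts containing each token.
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = {
            tok: c * (math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1.0)
            for tok, c in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({tok: w / norm for tok, w in vec.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors (dot product)."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(tok, 0.0) for tok, w in u.items())


def first_story_detection(vectors, threshold=0.3):
    """Label vectors in arrival order with cluster ids.

    A vector joins the cluster of its nearest earlier neighbour when their
    similarity reaches the threshold; otherwise it starts a new cluster.
    """
    labels = []
    next_cluster = 0
    for i, vec in enumerate(vectors):
        best_sim, best_idx = 0.0, None
        for j in range(i):  # compare only against tweets seen so far
            sim = cosine(vec, vectors[j])
            if sim > best_sim:
                best_sim, best_idx = sim, j
        if best_idx is not None and best_sim >= threshold:
            labels.append(labels[best_idx])
        else:
            labels.append(next_cluster)
            next_cluster += 1
    return labels


tweets = [
    "earthquake hits the city centre",
    "strong earthquake hits the city",
    "new phone released today",
    "the new phone released today looks great",
]
vectors = tfidf_vectors(tweets)
labels = first_story_detection(vectors, threshold=0.3)
print(labels)
```

Swapping `tfidf_vectors` for a dense encoder would leave the clustering loop unchanged, provided the vectors are normalised and `cosine` is adapted to the new representation. A production version would also bound the comparison to recent tweets rather than scanning the full history.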

No commits in the last 6 months. Available on PyPI.

Stale 6m
Maintenance 2 / 25
Adoption 10 / 25
Maturity 25 / 25
Community 13 / 25

Stars: 33
Forks: 5
Language: Jupyter Notebook
License: MIT
Last pushed: Jul 25, 2025
Monthly downloads: 32
Commits (30d): 0
Dependencies: 15

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/ina-foss/twembeddings"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.