IlyaGusev/tgcontest

Telegram Data Clustering contest solution by Mindful Squirrel

Score: 45 / 100 (Emerging)

Implements a news clustering and categorization pipeline that combines fastText embeddings with PyTorch-based triplet-loss sentence encoders to measure document similarity in Russian and English. The C++ backend integrates pre-trained models for language detection, fastText category classification, and PageRank-based source ranking, and uses them to aggregate related articles from Telegram channels into topic clusters. The repository also includes Jupyter notebooks for training custom embedding models, command-line tools, and interactive web demos for exploring the clustered news datasets.
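The description names two techniques that are easy to illustrate in isolation: a triplet-loss objective for training sentence encoders, and PageRank for ranking sources. The sketch below is not the repository's code. It is a minimal pure-Python illustration, assuming Euclidean distance with a margin of 1.0 for the triplet loss (the defaults of PyTorch's `TripletMarginLoss`) and a damping factor of 0.85 for PageRank. The helper names and the idea of a "source links to source" graph (for example, one channel reposting another) are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """L2 distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    """Hinge triplet loss: zero once the negative is at least `margin`
    farther from the anchor than the positive is."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iters: int = 200, tol: float = 1e-12) -> Dict[str, float]:
    """Power-iteration PageRank over a directed graph of sources.

    links[s] lists the sources that s points to (hypothetically: reposts
    or citations). Sources with no outgoing links ("dangling" nodes)
    spread their rank uniformly, so the ranks always sum to 1.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    ordered = sorted(nodes)
    n = len(ordered)
    rank = {v: 1.0 / n for v in ordered}
    for _ in range(iters):
        # Rank mass held by dangling nodes is redistributed to everyone.
        dangling = sum(rank[v] for v in ordered if not links.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in ordered}
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new[t] += share
        delta = sum(abs(new[v] - rank[v]) for v in ordered)
        rank = new
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    print(triplet_loss([0.0, 0.0], [1.0, 0.0], [0.5, 0.0]))  # 1.5
    print(pagerank({"a": ["b"], "b": ["c"], "c": ["b"]}))
```

In the usage example, source `a` has no incoming links and ends up with only the baseline rank of (1 - 0.85) / 3 = 0.05, while `b` and `c`, which point at each other, share most of the mass.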

No commits in the last 6 months.

Stale 6m · No Package · No Dependents
Maintenance 0 / 25
Adoption 9 / 25
Maturity 16 / 25
Community 20 / 25


Stars: 94
Forks: 24
Language: HTML
License: Apache-2.0
Last pushed: Jun 12, 2022
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/IlyaGusev/tgcontest"

Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
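A Python equivalent of the curl call above might look like the following sketch. It assumes the endpoint returns JSON and that the path follows a `category/owner/repo` pattern, which is inferred from the single example URL rather than documented. `quality_url` and `fetch_quality` are hypothetical helpers, and no API-key handling is shown because the page does not say how a key is passed.

```python
import json
import urllib.request

# Base path taken from the curl example; the segment layout after it is an assumption.
BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the (assumed) quality endpoint URL for one repository."""
    return f"{BASE}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """GET the endpoint and decode the body, assuming it is JSON."""
    with urllib.request.urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(fetch_quality("nlp", "IlyaGusev", "tgcontest"))
```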