IlyaGusev/tgcontest
Telegram Data Clustering contest solution by Mindful Squirrel
Implements a news clustering and categorization pipeline that combines fastText embeddings with PyTorch-based sentence encoders trained with triplet loss to measure document similarity in Russian and English. The C++ backend integrates pre-trained models for language detection and fastText-based category classification, together with PageRank-based source ranking, to aggregate related articles from Telegram channels into topic clusters. The repository includes Jupyter notebooks for training custom embedding models, and provides both command-line tools and interactive web demos for exploring clustered news datasets.
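To make the triplet-loss idea concrete, here is a minimal sketch in plain Python rather than PyTorch. It is not taken from the repository: the cosine distance, the margin of 0.2, and the function names `cosine_distance` and `triplet_loss` are all assumptions chosen for illustration. The loss is zero once the anchor document is closer to the related (positive) document than to the unrelated (negative) one by at least the margin.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The distance function and the margin value are illustrative
    assumptions, not the settings used by the project.
    """
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy 2-dimensional "embeddings" (hypothetical values).
anchor = [1.0, 0.0]
same_topic = [1.0, 0.0]
other_topic = [0.0, 1.0]

# Well-separated triplet: the loss is zero.
good = triplet_loss(anchor, same_topic, other_topic)
# Swapped roles: the loss is positive and would push the encoder to correct it.
bad = triplet_loss(anchor, other_topic, same_topic)
```

During training, an encoder's parameters would be updated to drive this loss toward zero across many triplets, so that articles on the same story end up close together in embedding space.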
No commits in the last 6 months.
Stars: 94
Forks: 24
Language: HTML
License: Apache-2.0
Category
Last pushed: Jun 12, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/IlyaGusev/tgcontest"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
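The same request can be made from Python with only the standard library. This is a rough sketch under stated assumptions: the endpoint above is the only documented fact, so the generalization of its path into `domain/owner/repo` segments, the helper names `quality_url` and `fetch_quality`, and the 10-second timeout are all hypothetical. The response format is not stated on this page, so the body is returned as raw text rather than parsed.

```python
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; everything after it is assumed
# to follow the pattern <domain>/<owner>/<repo>.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(domain, owner, repo):
    """Build the endpoint URL, percent-encoding each path segment."""
    parts = [quote(part, safe="") for part in (domain, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(domain, owner, repo, timeout=10):
    """Fetch the endpoint and return the response body as text.

    Requires network access. The body is not parsed because the
    response format is not documented here.
    """
    url = quality_url(domain, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Reproduces the URL used in the curl command.
url = quality_url("nlp", "IlyaGusev", "tgcontest")
```

Calling `fetch_quality("nlp", "IlyaGusev", "tgcontest")` would then issue the request; with the keyless limit of 100 requests/day, it is worth caching the result rather than fetching repeatedly.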
Related tools
philsaurabh/Predict-Something-ML-Prediction-App
This repository is an effort to deploy multiple Machine Learning Applications for production.
snoop2head/yonsei-exchange-program
✈️ MAU 200+ Website | Student Exchange Program Analysis
youheekil/project_europa
A final project to research the European Central bank's green economy supports by using NLP
huyle93/hjh-capstone
A Data Science Project sponsored and proposed by the Liberty Mutual Insurance's Enterprise...
Prati5/WeatherPrediction
Weather prediction using python.