lgalke/text-clf-baselines
WideMLP for Text Classification
Implements comparative benchmarks of Bag-of-Words MLPs, graph neural networks (TextGCN, HeteGCN), and Transformer models (BERT, DistilBERT) on standard text classification datasets. The wide MLP approach uses GloVe embeddings with a simple dense architecture and performs competitively with, or better than, graph-based methods at significantly lower computational cost: it avoids the O(N²) graph construction of the graph-based methods and the O(L²) attention computation of the Transformers. Includes modular implementations of tokenization, data loading for five benchmark datasets (20ng, R8, R52, OHSUMED, MR), and reproducible experiment scripts from the ACL 2022 paper.
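To make the "simple dense architecture" concrete, here is a minimal sketch of a bag-of-words MLP forward pass in plain Python. It is not the repository's code: the function names, the whitespace tokenizer, the hidden width of 64, and the two toy documents are all assumptions chosen for illustration, and the weights are random rather than trained.

```python
import math
import random
from collections import Counter


def build_vocab(docs):
    """Map each distinct lowercase token to a column index (hypothetical helper)."""
    vocab = {}
    for doc in docs:
        for tok in doc.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab


def bow_vector(doc, vocab):
    """Count-based bag-of-words vector; out-of-vocabulary tokens are dropped."""
    counts = Counter(tok for tok in doc.lower().split() if tok in vocab)
    # dict iteration follows insertion order, which matches the column indices
    return [float(counts.get(tok, 0)) for tok in vocab]


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one dense layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer):
    """Affine map: one dot product per output unit, plus bias."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def wide_mlp_forward(x, hidden, output):
    """Bag-of-words input -> one ReLU hidden layer -> class probabilities."""
    h = [max(0.0, v) for v in dense(x, hidden)]
    return softmax(dense(h, output))


# Toy data and sizes are illustrative assumptions, not benchmark settings.
docs = ["the match ended in a draw", "the market closed higher today"]
vocab = build_vocab(docs)
rng = random.Random(0)
HIDDEN_UNITS = 64
hidden = init_layer(len(vocab), HIDDEN_UNITS, rng)
output = init_layer(HIDDEN_UNITS, 2, rng)
probs = wide_mlp_forward(bow_vector(docs[0], vocab), hidden, output)
```

The cost of one forward pass here grows with the vocabulary size times the hidden width. Nothing in it depends on a document graph or on pairwise token interactions, which is the source of the overhead difference the summary describes.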
No commits in the last 6 months.
Stars: 29
Forks: 5
Language: Python
License: MIT
Category:
Last pushed: Aug 10, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/lgalke/text-clf-baselines"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
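The same request can be sketched in Python using only the standard library. The helper names `quality_url` and `fetch_quality` are hypothetical. Treating the `nlp` path segment as a category is an assumption read off the curl URL above. The response format is not documented here, so the body is returned as raw text rather than parsed.

```python
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; everything after it is built per request.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, repo):
    """Build the endpoint URL; `category` is assumed to be the segment before owner/repo."""
    # quote() leaves "/" unescaped by default, so "owner/name" stays a two-segment path
    return f"{BASE_URL}/{quote(category)}/{quote(repo)}"


def fetch_quality(category, repo, timeout=10.0):
    """Fetch the raw response body as text (format unknown, so no parsing is attempted)."""
    with urllib.request.urlopen(quality_url(category, repo), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Usage (performs a network request, counted against the daily limit):
#   body = fetch_quality("nlp", "lgalke/text-clf-baselines")
```

Because keyless access is limited to 100 requests per day, it may be worth caching responses locally when querying many repositories.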
Higher-rated alternatives
urchade/GLiNER
Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from...
HySonLab/ViDeBERTa
ViDeBERTa: A powerful pre-trained language model for Vietnamese, EACL 2023
acampillos/social-media-nlp
Sentiment analysis with pre-trained language models using TweetEval.
JamesLYC88/text_classification_baseline_code
The code for the ACL 2023 paper "Linear Classifier: An Often-Forgotten Baseline for Text Classification".