lgalke/text-clf-baselines

WideMLP for Text Classification

/ 100

Emerging

Implements comparative benchmarks across Bag-of-Words MLPs, graph neural networks (TextGCN, HeteGCN), and Transformer models (BERT, DistilBERT) on standard text classification datasets. The wide MLP approach uses GloVe embeddings with a simple dense architecture and performs competitively with, or better than, graph-based methods at significantly lower computational overhead, since it avoids both O(N²) graph construction and O(L²) attention computation. Includes modular implementations of tokenization, data loading for five benchmark datasets (20ng, R8, R52, OHSUMED, MR), and reproducible experiment scripts from the ACL 2022 paper.
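As a rough illustration of the "simple dense architecture" idea, here is a minimal pure-Python sketch of a bag-of-words classifier with one hidden layer. It is not the repository's code. The names `build_vocab`, `bow_vector`, and `WideMLPSketch` are hypothetical, the weights are random and untrained, and raw token counts stand in for the GloVe embeddings and tokenizer mentioned above. The hidden size and toy documents are arbitrary choices for the example.

```python
import math
import random


def build_vocab(docs):
    """Map each distinct lowercase whitespace token to an integer index."""
    vocab = {}
    for doc in docs:
        for tok in doc.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab


def bow_vector(doc, vocab):
    """Count-based bag-of-words vector; out-of-vocabulary tokens are ignored."""
    vec = [0.0] * len(vocab)
    for tok in doc.lower().split():
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class WideMLPSketch:
    """Hypothetical one-hidden-layer MLP: BoW input -> ReLU hidden -> class probabilities."""

    def __init__(self, vocab_size, hidden_size, num_classes, seed=0):
        rng = random.Random(seed)
        # Randomly initialised, untrained weights (illustration only).
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(vocab_size)]
                   for _ in range(hidden_size)]
        self.b1 = [0.0] * hidden_size
        self.w2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden_size)]
                   for _ in range(num_classes)]
        self.b2 = [0.0] * num_classes

    def forward(self, x):
        # Cost is linear in the vocabulary size: no document graph, no pairwise attention.
        hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        logits = [sum(w * h for w, h in zip(row, hidden)) + b
                  for row, b in zip(self.w2, self.b2)]
        return softmax(logits)


docs = ["the cat sat on the mat", "stocks fell on monday"]
vocab = build_vocab(docs)
model = WideMLPSketch(vocab_size=len(vocab), hidden_size=16, num_classes=2)
probs = model.forward(bow_vector("the cat the dog", vocab))
print(probs)
```

Each document is scored independently from its own token counts, which is why this family of models needs neither a corpus-level graph nor attention over token pairs.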

No commits in the last 6 months.

Stale 6m, No Package, No Dependents

Maintenance 2 / 25

Adoption 7 / 25

Maturity 9 / 25

Community 14 / 25


Stars

Forks

Language

Python

License

MIT

Higher-rated alternatives

urchade/GLiNER

Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from...

HySonLab/ViDeBERTa

ViDeBERTa: A powerful pre-trained language model for Vietnamese, EACL 2023

acampillos/social-media-nlp

Sentiment analysis with pre-trained language models using TweetEval.

JamesLYC88/text_classification_baseline_code

The code for the ACL 2023 paper "Linear Classifier: An Often-Forgotten Baseline for Text Classification".
