aerdem4/kaggle-quora-dup

Solution to Kaggle's Quora Duplicate Question Detection Competition

Score: 48 / 100 (Emerging)

Uses a lightweight LSTM architecture with shared weights, together with order-invariant feature engineering (min/max frequency, commutative operations), to limit overfitting on question pairs. Combines 25 engineered features, including NLP metrics, rare-word statistics, and LSTM embeddings built from GloVe vectors, with 10-fold cross-validation and ensemble averaging. Predictions are then calibrated against the test set to adjust for the shift in class balance between training and test data. Built in Keras/TensorFlow with GPU support; preprocessing includes text normalization and rare-word replacement to improve generalization.
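The order-invariant features and the test-set calibration mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's code: the helper names (`question_frequencies`, `order_invariant_features`, `recalibrate`) are hypothetical, "frequency" is assumed to mean how often a question text appears in the data, the calibration shown is the standard prior-shift correction (which may differ from what the repository does), and the class rates are made-up example values.

```python
from collections import Counter


def question_frequencies(pairs):
    """Count how often each question text appears anywhere in the pairs."""
    counts = Counter()
    for q1, q2 in pairs:
        counts[q1] += 1
        counts[q2] += 1
    return counts


def order_invariant_features(q1, q2, counts):
    """Build features that do not change when q1 and q2 are swapped."""
    f1, f2 = counts[q1], counts[q2]
    n1, n2 = len(q1.split()), len(q2.split())
    return {
        "min_freq": min(f1, f2),          # replaces a per-side q1_freq
        "max_freq": max(f1, f2),          # replaces a per-side q2_freq
        "len_sum": n1 + n2,               # addition is commutative
        "len_abs_diff": abs(n1 - n2),     # absolute difference is symmetric
    }


def recalibrate(p, train_pos_rate, test_pos_rate):
    """Rescale a predicted probability for a different positive-class rate."""
    pos = p * test_pos_rate / train_pos_rate
    neg = (1.0 - p) * (1.0 - test_pos_rate) / (1.0 - train_pos_rate)
    return pos / (pos + neg)


pairs = [
    ("how do i learn python", "what is the best way to learn python"),
    ("how do i learn python", "how can i learn java"),
]
counts = question_frequencies(pairs)
forward = order_invariant_features(pairs[0][0], pairs[0][1], counts)
backward = order_invariant_features(pairs[0][1], pairs[0][0], counts)
print(forward == backward)  # swapping the questions leaves the features unchanged

# Illustrative rates only (assumed, not taken from the repository).
print(recalibrate(0.5, train_pos_rate=0.4, test_pos_rate=0.2))
```

Because every feature is symmetric in the two questions, the model cannot pick up on which side of the pair a question happened to appear. The calibration step maps a prediction equal to the training positive rate onto the assumed test positive rate and leaves predictions unchanged when the two rates match.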

137 stars. No commits in the last 6 months.

Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 22 / 25


Stars: 137
Forks: 51
Language: Python
License: MIT
Last pushed: Feb 18, 2019
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/aerdem4/kaggle-quora-dup"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
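The same request can be made from Python using only the standard library. This is a hedged sketch: the `category/owner/repo` path pattern is an assumption generalized from the single URL shown above, the function names are hypothetical, and the response is returned as raw text because its format is not documented here.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL (path pattern assumed from the curl example)."""
    return f"{BASE_URL}/{quote(category)}/{quote(owner)}/{quote(repo)}"


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the response body as text; no response schema is assumed."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Performs a real network request when run as a script.
    print(fetch_quality("nlp", "aerdem4", "kaggle-quora-dup"))
```

Keeping URL construction separate from the network call makes it easy to check the address without spending any of the daily request quota.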