aerdem4/kaggle-quora-dup
Solution to Kaggle's Quora Duplicate Question Detection Competition
Employs a lightweight LSTM architecture with shared weights and order-invariant feature engineering (min/max frequency, commutative operations) to prevent overfitting on question pairs. Combines 25 engineered features—including NLP metrics, rare word statistics, and LSTM embeddings from GloVe vectors—with 10-fold cross-validation and ensemble averaging, followed by test-set calibration to adjust for class imbalance drift. Built in Keras/TensorFlow with GPU support and includes preprocessing normalization and rare-word replacement to improve generalization.
137 stars. No commits in the last 6 months.
Stars
137
Forks
51
Language
Python
License
MIT
Category
Last pushed
Feb 18, 2019
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/aerdem4/kaggle-quora-dup"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related tools
LittletreeZou/Question-Pairs-Matching
第三届魔镜杯 智能客服问题相似性算法设计 第12名解决方案
howardyclo/Kaggle-Quora-Question-Pairs
This is our team's solution report, which achieves top 10% (305/3307) in this competition.
aspk/Quora_question_pairs_NLP_Kaggle
Quora Kaggle Competition : Natural Language Processing using word2vec embeddings, scikit-learn...
AsadiAhmad/Detect-Duplicated-Questions
Detect Duplicated StackOverFlow Questions
goodmattg/quora_kaggle
Storage for Kaggle Quora competition