rishabhmisra/News-Headlines-Dataset-For-Sarcasm-Detection
High quality dataset for the task of Sarcasm Detection
Contains 28,619 professionally written news headlines (13,635 sarcastic from *The Onion*, 14,984 non-sarcastic from *HuffPost*) with self-contained, noise-free labels; 23.35% of its words are out of vocabulary for word2vec embeddings. It addresses the limitations of Twitter-based datasets by using formal news text, which has no spelling errors and no dependence on surrounding context, so sarcasm detection models can be trained more reliably. Data is distributed as JSONL, with each record holding the headline text, a sarcasm label, and a link to the source article for collecting supplementary data.
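A minimal sketch of reading the JSONL records described above, using only the Python standard library. The field names `is_sarcastic`, `headline`, and `article_link` and the file name in the comment are assumptions not stated on this page, so check them against the downloaded file. The sample headlines are made up for illustration.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def read_headlines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one record per non-empty line of JSON Lines input."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def label_counts(records: Iterable[dict], label_key: str = "is_sarcastic") -> Counter:
    """Count records per label value (the field name is an assumption)."""
    return Counter(rec[label_key] for rec in records)


# Tiny in-memory sample in the assumed record shape (made-up headlines).
sample = [
    '{"is_sarcastic": 1, "headline": "example satirical headline", "article_link": "https://example.com/a"}',
    '{"is_sarcastic": 0, "headline": "example factual headline", "article_link": "https://example.com/b"}',
    "",
]
records = list(read_headlines(sample))
counts = label_counts(records)
print(counts[1], counts[0])

# With the real file (hypothetical file name), pass the open file handle instead:
# with open("Sarcasm_Headlines_Dataset_v2.json", encoding="utf-8") as f:
#     counts = label_counts(read_headlines(f))
```

Reading line by line keeps memory use flat and tolerates blank trailing lines, which is the usual reason to prefer this over loading the whole file at once.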
No commits in the last 6 months.
Stars: 95
Forks: 46
Language: —
License: —
Category:
Last pushed: Feb 18, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/rishabhmisra/News-Headlines-Dataset-For-Sarcasm-Detection"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
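The same request can be sketched in Python with the standard library. The URL pattern is taken from the curl command above; treating the last two path segments as a general owner/repository pair and assuming the response body is JSON are both guesses, since the page does not document the response format. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Base path copied from the curl example; "ml-frameworks" is kept as-is.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (path pattern is an assumption)."""
    owner_part = urllib.parse.quote(owner, safe="")
    repo_part = urllib.parse.quote(repo, safe="")
    return f"{BASE_URL}/{owner_part}/{repo_part}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0) -> dict:
    """GET the endpoint and decode the body as JSON (response format is assumed)."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


url = quality_url("rishabhmisra", "News-Headlines-Dataset-For-Sarcasm-Detection")
print(url)

# Network call, counted against the daily request limit; uncomment to run:
# data = fetch_quality("rishabhmisra", "News-Headlines-Dataset-For-Sarcasm-Detection")
```

How a key is supplied for the higher limit is not described here, so the sketch makes only the unauthenticated request.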
Higher-rated alternatives
Hironsan/HateSonar
Hate Speech Detection Library for Python.
t-davidson/hate-speech-and-offensive-language
Repository for the paper "Automated Hate Speech Detection and the Problem of Offensive...
franciellevargas/HateBR
HateBR is the first large-scale expert annotated dataset of Brazilian Instagram comments for...
b4k0/CBDA
Cyber Bullying Detection Application (CBDA)
raklugrin01/Disaster-Tweets-Analysis-and-Classification
Analysing Disaster related tweets dataset and build a classifier using deep learning and deploy...