rishabhmisra/News-Headlines-Dataset-For-Sarcasm-Detection

High quality dataset for the task of Sarcasm Detection

32 / 100 (Emerging)

Contains 28,619 professionally written news headlines: 13,635 sarcastic headlines from *The Onion* and 14,984 non-sarcastic headlines from *HuffPost*. The headlines are self-contained and written in formal news style, so the labels are free of the noise found in Twitter-based datasets, where spelling errors and reliance on conversational context make sarcasm harder to label and learn. The vocabulary has a 23.35% out-of-vocabulary rate against word2vec embeddings. Together these properties make the dataset a more reliable basis for training sarcasm detection models. Data is distributed as JSONL, with each record holding the headline text, a sarcasm label, and a link to the source article for collecting supplementary data.
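As a minimal sketch of reading the JSONL format described above, the following parses one JSON object per line and tallies the labels. The field names `is_sarcastic`, `headline`, and `article_link` are assumptions (the listing names the fields only descriptively), and the two sample records are made-up placeholders, not rows from the dataset.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def read_headlines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def label_counts(records: Iterable[dict]) -> Counter:
    """Count records per value of the assumed `is_sarcastic` field."""
    return Counter(int(record["is_sarcastic"]) for record in records)


# Placeholder records in the assumed schema, standing in for the real file.
sample = [
    '{"is_sarcastic": 1, "headline": "placeholder sarcastic headline", '
    '"article_link": "https://example.com/a"}',
    '{"is_sarcastic": 0, "headline": "placeholder plain headline", '
    '"article_link": "https://example.com/b"}',
]

counts = label_counts(read_headlines(sample))
print(counts[1], counts[0])
```

To run this against the downloaded file, pass an open file handle instead of `sample`, for example `label_counts(read_headlines(open(path, encoding="utf-8")))`, where `path` is wherever the file was saved. If the assumed schema holds, the two counts should match the 13,635 and 14,984 figures quoted above.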

No commits in the last 6 months.

No License, Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 9 / 25
Maturity 1 / 25
Community 22 / 25


Stars: 95
Forks: 46
Language: not listed
License: none
Last pushed: Feb 18, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/rishabhmisra/News-Headlines-Dataset-For-Sarcasm-Detection"
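The same request can be made from Python with the standard library. This is a sketch under two assumptions that the listing does not confirm: that the URL path follows a `category/owner/repo` pattern (inferred from the single example above), and that the endpoint returns a JSON body. The helper names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, assuming a category/owner/repo path layout."""
    parts = (quote(part, safe="") for part in (category, owner, repo))
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(url: str, timeout: float = 10.0):
    """GET the URL and decode the body, assuming the response is JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


url = build_quality_url(
    "ml-frameworks",
    "rishabhmisra",
    "News-Headlines-Dataset-For-Sarcasm-Detection",
)
print(url)
```

Calling `fetch_quality(url)` performs the network request. The shape of the returned data is not documented here, so inspect it before relying on any particular key.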

The API is open to everyone at 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.