aneessaheba/hadoop-news-analytics
Distributed word frequency analysis on 5,000 HuffPost news headlines using Apache Hadoop MapReduce and mrjob. Single-node cluster on Docker with HDFS and YARN configured from scratch. Top 50 keywords extracted via a 2-step MapReduce pipeline with NLTK stopword filtering.
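The two-step pipeline described above can be sketched in plain Python. This is an illustration under stated assumptions, not the repository's code. The function names (`map_words`, `reduce_counts`, `reduce_top_n`, `shuffle`, `top_keywords`) are hypothetical, the small `STOPWORDS` set is a stand-in for NLTK's English stopword list, and the in-memory `shuffle` imitates the grouping that Hadoop performs between map and reduce. In mrjob the same shape would typically be expressed as two `MRStep` entries.

```python
import heapq
import re
from collections import defaultdict

# Stand-in stopword list (assumption); the project description mentions NLTK's list.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "for", "and", "on", "is", "with", "at"}
WORD_RE = re.compile(r"[a-z']+")


def map_words(headline):
    """Step 1 mapper: emit (word, 1) for every non-stopword token."""
    for word in WORD_RE.findall(headline.lower()):
        if word not in STOPWORDS:
            yield word, 1


def reduce_counts(word, counts):
    """Step 1 reducer: sum counts and re-key under None so step 2 sees all words."""
    yield None, (sum(counts), word)


def reduce_top_n(_, count_word_pairs, n=50):
    """Step 2 reducer: keep the n most frequent words."""
    for count, word in heapq.nlargest(n, count_word_pairs):
        yield word, count


def shuffle(pairs):
    """Group values by key, imitating the shuffle phase between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def top_keywords(headlines, n=50):
    """Run both steps in memory and return [(word, count), ...]."""
    mapped = (pair for headline in headlines for pair in map_words(headline))
    step1 = [
        out
        for word, counts in shuffle(mapped).items()
        for out in reduce_counts(word, counts)
    ]
    result = []
    for key, pairs in shuffle(step1).items():
        result.extend(reduce_top_n(key, pairs, n))
    return result


if __name__ == "__main__":
    sample = ["Trump wins the election", "Election results in", "Trump speaks"]
    print(top_keywords(sample, n=2))
```

Re-keying every count under a single key in step 1 sends all words to one reducer in step 2, which is what makes a global top-N possible; ties are broken by word order because the pairs are compared as `(count, word)` tuples.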
Stars: 1
Forks: n/a
Language: Python
License: n/a
Category:
Last pushed: Mar 07, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/aneessaheba/hadoop-news-analytics"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
Higher-rated alternatives
textvec/textvec
Text vectorization tool to outperform TFIDF for classification tasks
DigitalPebble/behemoth
Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.
nasa-jpl-memex/memex-gate
General Architecture for Text Engineering
NISH1001/tag-generator
A simple tool to generate tags for the given text (document) using TF-IDF.
cooperability/BMX-bookmark-extractor
Better brain. Knowledge management tool. Stop saving things you'll never read. Work in progress.