Web Scraping NLP Pipelines: NLP Tools
End-to-end systems that combine web scraping with NLP analysis (sentiment, readability, topic modeling, entity extraction) on text extracted from websites, articles, or online sources. Does NOT include standalone scraping tools, NLP libraries, or applications that only perform analysis without web data extraction.
There are 79 web scraping NLP pipeline tools tracked. One scores above 70 (verified tier). The highest-rated is flairNLP/fundus at 73/100, with 443 stars and 3,566 monthly downloads.
Get all 79 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=web-scraping-nlp-pipelines&limit=20"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
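For readers who prefer Python to curl, the same request can be sketched with the standard library alone. This is a minimal sketch, not an official client: the helper names `build_url` and `fetch_projects` are hypothetical, and nothing is assumed about the response schema beyond it being JSON. The command above passes `limit=20`; whether a larger `limit` returns all 79 projects in one call is an assumption to check against the API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint copied from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="nlp", subcategory="web-scraping-nlp-pipelines", limit=20):
    """Build the dataset URL with the same query parameters as the curl command."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=30):
    """Fetch the URL and decode the body as JSON (hypothetical helper).

    The response structure is not documented here, so the decoded
    object is returned as-is for the caller to inspect.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (performs a network request and counts against the daily quota):
# data = fetch_projects(build_url())
# print(json.dumps(data, indent=2)[:500])
```

With the default arguments, `build_url()` reproduces the URL in the curl command exactly. Keeping URL construction separate from the network call makes it easy to inspect the request before spending any of the daily quota.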
| # | Tool | Description | Score | Tier |
|---|---|---|---|---|
| 1 | flairNLP/fundus | A very simple news crawler with a funny name | 73 | Verified |
| 2 | fhamborg/news-please | news-please - an integrated web crawler and information extractor for news... | | Established |
| 3 | affjljoo3581/canrevan | A library for collecting large volumes of Naver news articles. | | Established |
| 4 | FreeDiscovery/FreeDiscovery | Web Service for E-Discovery Analytics | | Established |
| 5 | Multiverse-of-Projects/NewsAI | A dynamic NewsAI dashboard that uses NLP to analyze news articles, visualize... | | Emerging |
| 6 | tirthajyoti/Web-Database-Analytics | Web scraping and related analytics using Python tools | | Emerging |
| 7 | smyja/blackmaria | Python package for web scraping in natural language | | Emerging |
| 8 | MasuRii/FBScrapeIdeas | Modern CLI tool for scraping & analyzing Facebook groups using Playwright &... | | Emerging |
| 9 | rajaswa/DRIFT | DRIFT is a tool for Diachronic Analysis of Scientific Literature. | | Emerging |
| 10 | FinnishCancerRegistry/gleason_extraction_py | Extract Gleason scores from texts. | | Emerging |
| 11 | kevalmorabia97/SEDTWik-Event-Detection-from-Tweets | Segmentation-based event detection from Tweets. Published at NAACL SRW 2019 | | Emerging |
| 12 | uhh-lt/newsleak | Information extraction and interactive visualization of textual datasets for... | | Emerging |
| 13 | vipul-sharma20/sharingan | Tool to extract news articles from newspapers and give context about the news | | Emerging |
| 14 | sandeep-sandhu/NewsLookout | The NewsLookout web scraping application with NLP and data pre-processing | | Emerging |
| 15 | ahmedbesbes/How-to-mine-newsfeed-data-and-extract-interactive-insights-in-Python | A practical guide to topic mining and interactive visualizations | | Emerging |
| 16 | uscensusbureau/SABLE | Scraping Assisted by Learning | | Emerging |
| 17 | Sotera/watchman | Watchman: An open-source social-media event-detection system | | Emerging |
| 18 | Jasiri-App/datagpu | DataGPU is an open-source data compiler for AI pipelines that helps you... | | Emerging |
| 19 | nawaz-kmr/Data_Extraction_and_Text_Analysis_for_Blackcoffer_company. | The objective of this assignment is to extract textual data articles from... | | Experimental |
| 20 | VIDA-NYU/domain_discovery_API | Domain Discovery Operations API formalizes the human domain discovery... | | Experimental |
| 21 | scrapegoat/scrapegoat | Scrape Data in One-shot. | | Experimental |
| 22 | nakuleshj/news-nlp-pipeline | A fully serverless, event-driven data pipeline that ingests, enriches,... | | Experimental |
| 23 | Just-Helpful/preventable-deaths-scraper | Web scraper, written for the Preventable Deaths website, with emphasis on... | | Experimental |
| 24 | networkdynamics/seldonite | A News Article Collection Library | | Experimental |
| 25 | victoria217-bottino/google-news-scraper | Google News Scraper: A Python tool to fetch, decode, and process... | | Experimental |
| 26 | GateNLP/wpextract | Create datasets from WordPress sites for research or archiving | | Experimental |
| 27 | lkstrp/newspaper-scraper | The all-in-one Python package for seamless newspaper article indexing,... | | Experimental |
| 28 | SakuraPuare/ZhiHu_Spider | Web scraper for Zhihu content extraction | | Experimental |
| 29 | gangula-karthik/KAKI-App | A web app uniting everyone for big wins and a greener Singapore! 🚀🌳 | | Experimental |
| 30 | nostoz/news_monitor | Real-time news monitor aggregating from various sources based on keywords | | Experimental |
| 31 | ZIADEA/SmartWebScraper-CV | SmartWebScraper-CV: AI-Powered Web Page Zone Detection. SmartWebScraper-CV... | | Experimental |
| 32 | BioinfoNet/Data-mining | Data mining to discover trends in Open Science in Kenya | | Experimental |
| 33 | Atharv279/Task-Extraction-NLP | NLP-based Task Extraction & Categorization: This project extracts tasks... | | Experimental |
| 34 | jasp9559/Web-Scraping-of-Indian-Judgements | Web scraping project for scraping the latest/most recent judgement taken on the day | | Experimental |
| 35 | antoninfaure/rssTrends | Finding Topics in French News using RSS Feeds | | Experimental |
| 36 | ntddk/peeling-onions | A repository to store Deep Web (onion domain) crawler, scraper, and NLP... | | Experimental |
| 37 | b-i-king/Top_News_Twitter_Bot_Template | Twitter Bot Template | | Experimental |
| 38 | aybarskerem/WebScraper | This repo contains various web scrapers for different sites and process the... | | Experimental |
| 39 | zer0Percent/OhWowBREAKINGNews | A multithreaded scraper to retrieve and parse news articles. | | Experimental |
| 40 | susannapaoli/web-scraper-nyt | New York Times Scraper | | Experimental |
| 41 | georgiarichards/preventabledeathstracker | Code for running the Preventable Deaths Tracker website | | Experimental |
| 42 | sodalabsio/event-detection-extraction | Repository for QA-based event detection and extraction from news and social media. | | Experimental |
| 43 | bhx98/NameAnalysis | Choosing a company name by analyzing the most used keywords in the field and... | | Experimental |
| 44 | dukeblue1994-glitch/chronicle | Intelligent event detection system using semantic embeddings, MinHash LSH... | | Experimental |
| 45 | Aniket-16-S/Product-Scraper | Scraping products from well-known e-commerce sites like Amazon, Flipkart and... | | Experimental |
| 46 | jpwahle/cs-insights-crawler | This repository implements the interaction with DBLP, information extraction... | | Experimental |
| 47 | dobbersc/fundus-evaluation | [ACL 2024] Evaluation of the Fundus News Scraper | | Experimental |
| 48 | Awakumori/NGAspider | Crawler for the NGA forum (National Geographic of Azeroth). Uses multithreaded collection and MongoDB storage, and integrates PaddlePaddle for NLP. Integrates Baidu Jieyu for entity recognition, updates NLP... | | Experimental |
| 49 | balaurian/fx_news_scraper | A scraper for investing.com forex news using beautifulsoup and nltk. It also... | | Experimental |
| 50 | agi-templar/MediaCloudDataDownloader | Download full-length articles from media outlets. | | Experimental |
| 51 | WISETICT-PPAM/Data-Analytics | Product information crawling and review text mining | | Experimental |
| 52 | stkisengese/news-intelligence-nlp-platform | A Python-based NLP platform for scraping, analyzing, and enriching news... | | Experimental |
| 53 | AmmarRashed/EventOrient | A web-based application for monitoring, analyzing and visualizing social... | | Experimental |
| 54 | Kamomille/WebScrapping_Supermarket | Supermarket cost analysis | | Experimental |
| 55 | someoneorlov/styx | ML News Analysis Service | | Experimental |
| 56 | samuelhatcliff/newstracker | News Tracker is an application designed to enhance and optimize the way that... | | Experimental |
| 57 | nivaangupta/news-website | A news website that provides summarised news on trending topics, popular... | | Experimental |
| 58 | MANISH007700/NewsArticleExtraction | Extraction of news articles from different news web pages using feedparser... | | Experimental |
| 59 | eyereece/nlp-text-mining-dashboard | NLP text mining dashboard to explore current trends and extract most used... | | Experimental |
| 60 | asaifuddin18/Search-Engine-Data-Collector | Summer '21 research project under Forward Data Lab group. Django website... | | Experimental |
| 61 | kshitijbhandari/Web-Scraping-and-text-analysis | NLP pipeline to scrape 114 articles using BeautifulSoup and compute 13... | | Experimental |
| 62 | moehmeni/ezweb | Easy-to-use web page analyzer | | Experimental |
| 63 | stuartemiddleton/floraguard_crawler | FloraGuard crawler for online forums and marketplaces around the illegal... | | Experimental |
| 64 | satyampandey1411/SAT-News-Analyser | SAT News Analyser is a web application offering in-depth news article... | | Experimental |
| 65 | manthank17-learn/Open-Source-Signal-Intelligence-Early-Anomaly-Detection-Platform | Early anomaly detection platform using news, prediction markets, shipping... | | Experimental |
| 66 | AnonCatalyst/OpenZenith | The OpenZenith Project is a conceptual/ongoing initiative focused on... | | Experimental |
| 67 | umutkavakli/sikayetvar-scraping | A scraping tool for customer complaints of specified brands to use in NLP tasks. | | Experimental |
| 68 | javiermascarena/footy-narratives | Automated weekly storylines and topic summaries for the “Big Six” English... | | Experimental |
| 69 | estefaniagPerez/net-analyzer-sna-nlp-analysis | This project (ReactJS and Python) combines Social Network Analysis (SNA) and... | | Experimental |
| 70 | Haimonmon/snippy | A book-scraping bot that can give you book data, but be cautious as... | | Experimental |
| 71 | DRSarcenoR/fetchNews | Streamlit application that, given a prompt (a name is expected), shows... | | Experimental |
| 72 | pranjal-pravesh/web-article-analyzer | A comprehensive text analysis system that performs web scraping, sentiment... | | Experimental |
| 73 | utkarsh512/CreateDebateScraper | Scraping debates from the CreateDebate forum | | Experimental |
| 74 | Anonym0usWork1221/python-code-docstring-scraper | A multi-threaded GitHub scraper to collect Python code with docstrings from... | | Experimental |
| 75 | LiliValGo/NLP-for-IPCC-Climate-Reports | This project combines web scraping, PDF processing, and Natural Language... | | Experimental |
| 76 | Biswas-N/Norman-PD-incidents-extractor | Python-based utility to create Norman Police Department's incident dataset... | | Experimental |
| 77 | Onaga08/scrape-and-sense | A comprehensive script for web scraping and NLP analysis, providing detailed... | | Experimental |
| 78 | doinakis/Real-Time-News-Assistant | Real-time news assistant for Greek news. | | Experimental |
| 79 | ArpitaChatterjee/Routine-Analysis-of-a-Comedian | Build a dataset using the transcripts of 10 popular comedians, using web... | | Experimental |