Web Scraping NLP Pipelines NLP Tools

End-to-end systems that combine web scraping with NLP analysis (sentiment, readability, topic modeling, entity extraction) on text extracted from websites, articles, or other online sources. This category does NOT include standalone scraping tools, NLP libraries, or applications that only perform analysis without extracting web data.
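Here is a toy example of the category definition above. It is a minimal, standard-library-only Python sketch of the scrape-then-analyze shape these systems share. It is not the method of any listed project.

- It keeps only the text inside `<p>` elements, a crude stand-in for real article extraction.
- It then computes simple stand-ins for NLP analysis: average sentence length as a readability proxy, and frequent non-stopword terms as a rough topic signal.

All names (`ParagraphExtractor`, `analyze`, `fetch_html`, `pipeline`) and the stopword list are hypothetical.

```python
import re
import urllib.request
from collections import Counter
from html.parser import HTMLParser

# Hypothetical, deliberately tiny stopword list for the toy keyword step.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
             "for", "on", "with", "it", "that", "this"}


class ParagraphExtractor(HTMLParser):
    """Collects text inside <p> elements; a crude stand-in for article extraction."""

    def __init__(self):
        super().__init__()
        self._in_p = False
        self._buf = []
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self._in_p = False
            text = " ".join("".join(self._buf).split())  # normalize whitespace
            if text:
                self.paragraphs.append(text)

    def handle_data(self, data):
        if self._in_p:  # ignore navigation, scripts, etc. outside <p>
            self._buf.append(data)


def analyze(text, top_n=3):
    """Toy NLP step: counts, average sentence length, most frequent keywords."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    keywords = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return {
        "sentences": len(sentences),
        "words": len(words),
        "avg_words_per_sentence": len(words) / len(sentences) if sentences else 0.0,
        "top_keywords": [w for w, _ in keywords.most_common(top_n)],
    }


def fetch_html(url, timeout=10):
    """Download a page (needs network access; not used in the demo below)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def pipeline(html):
    """Scrape step (paragraph extraction) followed by the analysis step."""
    extractor = ParagraphExtractor()
    extractor.feed(html)
    extractor.close()
    return analyze(" ".join(extractor.paragraphs))


sample = """<html><head><script>var x = 1;</script></head><body>
<p>Solar power is growing fast. Solar panels are cheaper than ever.</p>
<p>Analysts expect solar capacity to double!</p>
<div>Navigation links</div></body></html>"""
print(pipeline(sample))
```

To run this on a live page, call `pipeline(fetch_html(url))`. Real systems replace both steps with dedicated extractors and NLP models.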

There are 79 web scraping NLP pipeline tools tracked. One scores above 70 (Verified tier). The highest-rated is flairNLP/fundus at 73/100, with 443 stars and 3,566 monthly downloads.
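The tier labels appear to follow score bands, but only the Verified threshold ("above 70") is stated on this page. The other cut-offs in this Python sketch are inferred from the listed scores:

- The lowest Established score is 50 and the highest Emerging score is 46.
- The lowest Emerging score is 30 and the highest Experimental score is 29.

Treat the function as an assumption, not the site's documented rule. The check at the end reproduces the listed tier for a few projects from the table.

```python
def tier(score: float) -> str:
    """Map a 0-100 score to a tier label (assumed bands, see comments)."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score > 70:   # stated: "above 70" is Verified; whether exactly 70 counts is unknown
        return "Verified"
    if score >= 50:  # inferred: true cut-off lies somewhere in 47..50
        return "Established"
    if score >= 30:  # inferred: 30 is listed as Emerging, 29 as Experimental
        return "Emerging"
    return "Experimental"


# (project, score, listed tier) copied from the table, used as a sanity check.
LISTED = [
    ("flairNLP/fundus", 73, "Verified"),
    ("fhamborg/news-please", 64, "Established"),
    ("affjljoo3581/canrevan", 50, "Established"),
    ("Multiverse-of-Projects/NewsAI", 46, "Emerging"),
    ("Jasiri-App/datagpu", 30, "Emerging"),
    ("VIDA-NYU/domain_discovery_API", 28, "Experimental"),
]

for name, score, listed in LISTED:
    print(f"{name}: {score} -> {tier(score)} (listed: {listed})")
```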

Get the projects as JSON (the request below sets limit=20, so it returns at most 20 of the 79):

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=web-scraping-nlp-pipelines&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
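To call the API from Python, here is a minimal standard-library sketch that rebuilds the same request as the curl example and decodes the JSON body. The helper names are hypothetical.

Some details are not described on this page:

- the response schema,
- the maximum accepted `limit`,
- how an API key is supplied.

The sketch therefore does not handle keys or read specific response fields.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="nlp", subcategory="web-scraping-nlp-pipelines", limit=20):
    """Rebuild the query string used in the curl example (hypothetical helper)."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch(url, timeout=30):
    """GET the URL and decode the body as JSON; the schema is not documented here."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(build_url())
    # data = fetch(build_url())  # needs network access; counts toward the daily quota
```

Each uncommented `fetch(...)` call without a key is one anonymous request against the 100/day allowance.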

| # | Tool | Description | Score (/100) | Tier |
|---|---|---|---|---|
| 1 | flairNLP/fundus | A very simple news crawler with a funny name | 73 | Verified |
| 2 | fhamborg/news-please | news-please: an integrated web crawler and information extractor for news... | 64 | Established |
| 3 | affjljoo3581/canrevan | A library for collecting large volumes of Naver News articles. | 50 | Established |
| 4 | FreeDiscovery/FreeDiscovery | Web Service for E-Discovery Analytics | 50 | Established |
| 5 | Multiverse-of-Projects/NewsAI | A dynamic NewsAI dashboard that uses NLP to analyze news articles, visualize... | 46 | Emerging |
| 6 | tirthajyoti/Web-Database-Analytics | Web scraping and related analytics using Python tools | 44 | Emerging |
| 7 | smyja/blackmaria | Python package for web scraping in natural language | 39 | Emerging |
| 8 | MasuRii/FBScrapeIdeas | Modern CLI tool for scraping & analyzing Facebook groups using Playwright &... | 39 | Emerging |
| 9 | rajaswa/DRIFT | DRIFT is a tool for Diachronic Analysis of Scientific Literature. | 39 | Emerging |
| 10 | FinnishCancerRegistry/gleason_extraction_py | Extract Gleason scores from texts. | 36 | Emerging |
| 11 | kevalmorabia97/SEDTWik-Event-Detection-from-Tweets | Segmentation-based event detection from Tweets. Published at NAACL SRW 2019 | 36 | Emerging |
| 12 | uhh-lt/newsleak | Information extraction and interactive visualization of textual datasets for... | 36 | Emerging |
| 13 | vipul-sharma20/sharingan | Tool to extract news articles from newspapers and give context about the news | 35 | Emerging |
| 14 | sandeep-sandhu/NewsLookout | The NewsLookout web scraping application with NLP and data pre-processing | 34 | Emerging |
| 15 | ahmedbesbes/How-to-mine-newsfeed-data-and-extract-interactive-insights-in-Python | A practical guide to topic mining and interactive visualizations | 32 | Emerging |
| 16 | uscensusbureau/SABLE | Scraping Assisted by Learning | 32 | Emerging |
| 17 | Sotera/watchman | Watchman: An open-source social-media event-detection system | 31 | Emerging |
| 18 | Jasiri-App/datagpu | DataGPU is an open-source data compiler for AI pipelines that helps you... | 30 | Emerging |
| 19 | nawaz-kmr/Data_Extraction_and_Text_Analysis_for_Blackcoffer_company. | The objective of this assignment is to extract textual data articles from... | 29 | Experimental |
| 20 | VIDA-NYU/domain_discovery_API | Domain Discovery Operations API formalizes the human domain discovery... | 28 | Experimental |
| 21 | scrapegoat/scrapegoat | Scrape Data in One-shot. | 28 | Experimental |
| 22 | nakuleshj/news-nlp-pipeline | A fully serverless, event-driven data pipeline that ingests, enriches,... | 28 | Experimental |
| 23 | Just-Helpful/preventable-deaths-scraper | Web scraper, written for the Preventable Deaths website, with emphasis on... | 27 | Experimental |
| 24 | networkdynamics/seldonite | A News Article Collection Library | 26 | Experimental |
| 25 | victoria217-bottino/google-news-scraper | Google News Scraper: A Python tool to fetch, decode, and process... | 26 | Experimental |
| 26 | GateNLP/wpextract | Create datasets from WordPress sites for research or archiving | 26 | Experimental |
| 27 | lkstrp/newspaper-scraper | The all-in-one Python package for seamless newspaper article indexing,... | 26 | Experimental |
| 28 | SakuraPuare/ZhiHu_Spider | Web scraper for Zhihu content extraction | 24 | Experimental |
| 29 | gangula-karthik/KAKI-App | A web app uniting everyone for big wins and a greener Singapore! 🚀🌳 | 24 | Experimental |
| 30 | nostoz/news_monitor | Real-time news monitor aggregating from various sources based on keywords | 24 | Experimental |
| 31 | ZIADEA/SmartWebScraper-CV | AI-Powered Web Page Zone Detection. SmartWebScraper-CV... | 24 | Experimental |
| 32 | BioinfoNet/Data-mining | Data mining to discover trends in Open Science in Kenya | 23 | Experimental |
| 33 | Atharv279/Task-Extraction-NLP | NLP-based Task Extraction & Categorization: This project extracts tasks... | 23 | Experimental |
| 34 | jasp9559/Web-Scraping-of-Indian-Judgements | Web scraping project for scraping the most recent judgements issued on the day | 23 | Experimental |
| 35 | antoninfaure/rssTrends | Finding Topics in French News using RSS Feeds | 23 | Experimental |
| 36 | ntddk/peeling-onions | A repository to store Deep Web (onion domain) crawler, scraper, and NLP... | 23 | Experimental |
| 37 | b-i-king/Top_News_Twitter_Bot_Template | Twitter Bot Template | 22 | Experimental |
| 38 | aybarskerem/WebScraper | This repo contains various web scrapers for different sites and process the... | 22 | Experimental |
| 39 | zer0Percent/OhWowBREAKINGNews | A multithreaded scraper to retrieve and parse news articles. | 22 | Experimental |
| 40 | susannapaoli/web-scraper-nyt | New York Times Scraper | 22 | Experimental |
| 41 | georgiarichards/preventabledeathstracker | Code for running the Preventable Deaths Tracker website | 22 | Experimental |
| 42 | sodalabsio/event-detection-extraction | Repository for QA-based event detection and extraction from news and social media. | 22 | Experimental |
| 43 | bhx98/NameAnalysis | Choosing a company name by analyzing the most used keywords in the field and... | 22 | Experimental |
| 44 | dukeblue1994-glitch/chronicle | Intelligent event detection system using semantic embeddings, MinHash LSH... | 21 | Experimental |
| 45 | Aniket-16-S/Product-Scraper | Scraping products from well-known e-commerce sites like Amazon, Flipkart and... | 21 | Experimental |
| 46 | jpwahle/cs-insights-crawler | This repository implements the interaction with DBLP, information extraction... | 21 | Experimental |
| 47 | dobbersc/fundus-evaluation | [ACL 2024] Evaluation of the Fundus News Scraper | 21 | Experimental |
| 48 | Awakumori/NGAspider | Crawler for the NGA forum (艾泽拉斯国家地理, "Azeroth National Geographic"). Uses multithreaded collection and MongoDB storage, with PaddlePaddle integrated for NLP. Integrates Baidu's Jieyu (解语) for entity recognition; updated NLP... | 20 | Experimental |
| 49 | balaurian/fx_news_scraper | A scraper for investing.com forex news using BeautifulSoup and NLTK. It also... | 19 | Experimental |
| 50 | agi-templar/MediaCloudDataDownloader | Download full-length articles from media outlets. | 19 | Experimental |
| 51 | WISETICT-PPAM/Data-Analytics | Product information crawling and review text mining | 19 | Experimental |
| 52 | stkisengese/news-intelligence-nlp-platform | A Python-based NLP platform for scraping, analyzing, and enriching news... | 18 | Experimental |
| 53 | AmmarRashed/EventOrient | A web-based application for monitoring, analyzing and visualizing social... | 18 | Experimental |
| 54 | Kamomille/WebScrapping_Supermarket | Analysis of supermarket costs | 18 | Experimental |
| 55 | someoneorlov/styx | ML News Analysis Service | 18 | Experimental |
| 56 | samuelhatcliff/newstracker | News Tracker is an application designed to enhance and optimize the way that... | 17 | Experimental |
| 57 | nivaangupta/news-website | A news website that provides summarised news on trending topics, popular... | 15 | Experimental |
| 58 | MANISH007700/NewsArticleExtraction | Extraction of news articles from different news web pages using feedparser... | 15 | Experimental |
| 59 | eyereece/nlp-text-mining-dashboard | NLP text mining dashboard to explore current trends and extract most used... | 15 | Experimental |
| 60 | asaifuddin18/Search-Engine-Data-Collector | Summer '21 research project under the Forward Data Lab group. Django website... | 14 | Experimental |
| 61 | kshitijbhandari/Web-Scraping-and-text-analysis | NLP pipeline to scrape 114 articles using BeautifulSoup and compute 13... | 14 | Experimental |
| 62 | moehmeni/ezweb | Easy-to-use web page analyzer | 13 | Experimental |
| 63 | stuartemiddleton/floraguard_crawler | FloraGuard crawler for online forums and marketplaces around the illegal... | 13 | Experimental |
| 64 | satyampandey1411/SAT-News-Analyser | SAT News Analyser is a web application offering in-depth news article... | 12 | Experimental |
| 65 | manthank17-learn/Open-Source-Signal-Intelligence-Early-Anomaly-Detection-Platform | Early anomaly detection platform using news, prediction markets, shipping... | 12 | Experimental |
| 66 | AnonCatalyst/OpenZenith | The OpenZenith Project is a conceptual/ongoing initiative focused on... | 12 | Experimental |
| 67 | umutkavakli/sikayetvar-scraping | A scraping tool for customer complaints of specified brands to use in NLP tasks. | 12 | Experimental |
| 68 | javiermascarena/footy-narratives | Automated weekly storylines and topic summaries for the “Big Six” English... | 12 | Experimental |
| 69 | estefaniagPerez/net-analyzer-sna-nlp-analysis | This project (ReactJS and Python) combines Social Network Analysis (SNA) and... | 12 | Experimental |
| 70 | Haimonmon/snippy | A book-scraping bot that can give you book data, but be cautious as... | 11 | Experimental |
| 71 | DRSarcenoR/fetchNews | Streamlit application that, given a prompt (a name is expected), displays... | 11 | Experimental |
| 72 | pranjal-pravesh/web-article-analyzer | A comprehensive text analysis system that performs web scraping, sentiment... | 11 | Experimental |
| 73 | utkarsh512/CreateDebateScraper | Scraping debates from the CreateDebate forum | 11 | Experimental |
| 74 | Anonym0usWork1221/python-code-docstring-scraper | A multi-threaded GitHub scraper to collect Python code with docstrings from... | 11 | Experimental |
| 75 | LiliValGo/NLP-for-IPCC-Climate-Reports | This project combines web scraping, PDF processing, and Natural Language... | 10 | Experimental |
| 76 | Biswas-N/Norman-PD-incidents-extractor | Python-based utility to create Norman Police Department's incident dataset... | 10 | Experimental |
| 77 | Onaga08/scrape-and-sense | A comprehensive script for web scraping and NLP analysis, providing detailed... | 10 | Experimental |
| 78 | doinakis/Real-Time-News-Assistant | Real-time news assistant for Greek news. | 10 | Experimental |
| 79 | ArpitaChatterjee/Routine-Analysis-of-a-Comedian | Build a dataset using the transcripts of 10 popular comedians, using web... | 10 | Experimental |