flairNLP/fundus

A very simple news crawler with a funny name

/ 100

Verified

Supports crawling from both live publisher websites and the CommonCrawl CC-NEWS archive, with multi-process parallel fetching for large-scale corpus creation. Provides unified article parsing across 150+ international news publishers, with structured extraction of text, metadata, and images from multiple source types (live sites, sitemaps, web archives). Includes AI-training filtering to help identify publishers that have not objected to model training on their content.
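The sketch below illustrates the two crawling modes described above. It is loosely based on the quick-start pattern in the fundus README (`Crawler`, `CCNewsCrawler`, `PublisherCollection`). Class names and arguments may change between fundus versions, so check the project documentation before relying on it. The helper names `crawl_live` and `crawl_cc_news` are hypothetical.

```python
from typing import List, Optional


def crawl_live(max_articles: int = 2) -> List[Optional[str]]:
    """Fetch a few articles from live US publisher sites and return their titles."""
    # Imported lazily so this module loads even when fundus is not installed.
    from fundus import Crawler, PublisherCollection

    # Restrict the crawler to publishers registered in the US collection.
    crawler = Crawler(PublisherCollection.us)
    return [article.title for article in crawler.crawl(max_articles=max_articles)]


def crawl_cc_news(max_articles: int = 10) -> List[Optional[str]]:
    """Fetch articles from the CommonCrawl CC-NEWS archive instead of live sites."""
    from fundus import CCNewsCrawler, PublisherCollection

    # Unpack the whole collection so articles from every supported publisher match.
    crawler = CCNewsCrawler(*PublisherCollection)
    return [article.title for article in crawler.crawl(max_articles=max_articles)]


if __name__ == "__main__":
    # Requires `pip install fundus` and network access.
    for title in crawl_live():
        print(title)
```

Keeping the imports inside the functions is only a convenience for this sketch. In a normal script you would import fundus at the top of the file.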

443 stars and 3,566 monthly downloads. Available on PyPI.

Maintenance 13 / 25

Adoption 18 / 25

Maturity 25 / 25

Community 24 / 25


Stars: 443
Forks: 105
Language: Python
License: MIT
Category: web-scraping-nlp-pipelines
Last pushed: Mar 17, 2026
Monthly downloads: 3,566
Commits (30d):
Dependencies:


Web Scraping NLP Pipelines · 79 tools

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/flairNLP/fundus"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
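For scripted access, you can make the same request from Python using only the standard library. This is a minimal sketch. It assumes that the endpoint returns JSON and that other tools follow the same `/quality/nlp/<owner>/<repo>` path pattern as the curl example above. The response fields and the way an API key is passed are not documented here, so the helper names `quality_url` and `fetch_quality` are hypothetical, and the decoded response is returned as is.

```python
import json
from typing import Any
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base path, taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/nlp"


def quality_url(owner: str, repo: str) -> str:
    """Build the quality endpoint URL for a GitHub owner/repo pair (assumed pattern)."""
    # Percent-encode each path segment so unusual names cannot break the URL.
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0) -> Any:
    """Fetch and decode the JSON response; raises on HTTP or JSON parse errors."""
    with urlopen(quality_url(owner, repo), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Unauthenticated calls are limited to 100 requests per day.
    print(fetch_quality("flairNLP", "fundus"))
```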

Related tools

fhamborg/news-please

news-please - an integrated web crawler and information extractor for news that just works

FreeDiscovery/FreeDiscovery

Web Service for E-Discovery Analytics

affjljoo3581/canrevan

A library for collecting large volumes of Naver News articles.

Multiverse-of-Projects/NewsAI

A dynamic NewsAI dashboard that uses NLP to analyze news articles, visualize sentiment trends,...

tirthajyoti/Web-Database-Analytics

Web scraping and related analytics using Python tools
