flairNLP/fundus

A very simple news crawler with a funny name

Quality score: 80 / 100 (Verified)

Supports crawling from both live publisher websites and the CommonCrawl CC-NEWS archive with multi-process parallel fetching, enabling large-scale corpus creation. Provides unified article parsing across 150+ international news publishers with structured extraction of text, metadata, images, and multiple content source types (live sites, sitemaps, web archives). Includes AI training filtering to help identify publishers that haven't objected to model training on their content.
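
A minimal sketch of the live-site crawling described above, assuming Fundus is installed from PyPI (`pip install fundus`). The `Crawler`, `PublisherCollection.us`, `crawl(max_articles=...)`, `article.title`, and `article.plaintext` names follow the project's published quick-start as best recalled, so check them against the installed version. `preview` and `crawl_sample` are hypothetical helper names introduced only for this illustration.

```python
def preview(text: str, limit: int = 80) -> str:
    """Collapse whitespace and truncate long text for one-line display."""
    flat = " ".join(text.split())
    return flat if len(flat) <= limit else flat[:limit] + "..."


def crawl_sample(max_articles: int = 2) -> None:
    """Print the title and a short text preview of a few US news articles."""
    # Imported lazily so that `preview` stays usable without Fundus installed.
    from fundus import Crawler, PublisherCollection

    # Assumption: `PublisherCollection.us` selects the US-based publishers.
    crawler = Crawler(PublisherCollection.us)
    for article in crawler.crawl(max_articles=max_articles):
        print(article.title)
        # `plaintext` may be missing when extraction fails, hence the fallback.
        print(preview(article.plaintext or ""))


# crawl_sample()  # uncomment to run; requires network access
```

For archive-scale crawling, Fundus also exposes a CC-NEWS crawler; it is left out here to keep the sketch short.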

443 stars and 3,566 monthly downloads. Available on PyPI.

Maintenance 13 / 25
Adoption 18 / 25
Maturity 25 / 25
Community 24 / 25


Stars: 443
Forks: 105
Language: Python
License: MIT
Last pushed: Mar 17, 2026
Monthly downloads: 3,566
Commits (30d): 0
Dependencies: 18

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/flairNLP/fundus"

Open to everyone at 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
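
The same request can be sketched in Python using only the standard library. Two assumptions are not confirmed by this page: that the path follows a `category/owner/repo` pattern (inferred from the single URL above) and that the endpoint returns JSON. `quality_url` and `fetch_quality` are hypothetical helper names.

```python
import json
from urllib.request import urlopen

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (path layout inferred from the curl example)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0) -> dict:
    """Fetch the quality record; assumes the response body is JSON."""
    with urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return json.load(resp)


# data = fetch_quality("nlp", "flairNLP", "fundus")  # requires network access
```

The unauthenticated tier allows 100 requests/day, so results are worth caching locally if the script runs repeatedly.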