mlibre/Clean-Web-Scraper

A Node.js web scraper that extracts clean, readable content from websites, intended for building AI/LLM training datasets. It features smart crawling, Mozilla Readability integration, and organized content storage 🤖

/ 100

Experimental

No License · No Package · No Dependents

Maintenance 6 / 25

Adoption 3 / 25

Maturity 1 / 25

Community 0 / 25


Stars

Forks

—

Language: JavaScript

License: —

Category: llm-web-scraping

Last pushed: Oct 25, 2025

Commits (30d)


LLM Web Scraping · 33 tools

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/mlibre/Clean-Web-Scraper"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
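The same request can be made from a script. The following is a minimal sketch using only the Python standard library; the helper names `build_url` and `fetch_quality` are hypothetical, and it assumes the endpoint returns a JSON body, which this page does not confirm.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def build_url(owner: str, repo: str) -> str:
    """Build the per-repository URL (hypothetical helper)."""
    # Percent-encode each path segment so unusual characters stay valid.
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record for one repository.

    Assumption: the response body is JSON. The field names are not
    documented on this page, so the parsed value is returned as-is.
    """
    with urllib.request.urlopen(build_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("mlibre", "Clean-Web-Scraper")` issues the same GET request as the curl command. Since unauthenticated access is limited to 100 requests/day, it may be worth caching results rather than refetching on every run.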

Higher-rated alternatives

carlosplanchon/spidercreator

Automated web scraping spider generation using Browser Use and LLMs. Streamline the creation of...

raznem/parsera

Lightweight library for scraping web-sites with LLMs

Riddhish1/CogniScrape

Intelligent Web Scraping Library with LLMs

poodle64/supacrawl

Zero-infrastructure web scraping for the terminal

rednafi/html-to-text

Extract pure text from any webpage
