LLM Web Scraping LLM Tools
Tools for extracting and parsing structured data from websites using LLM-powered methods, including web crawlers, HTML extractors, and scraping APIs optimized for AI agent integration. Does NOT include general-purpose web scrapers without LLM integration, browser automation tools, or proxy/VPN services.
There are 33 LLM web scraping tools tracked. The highest-rated is carlosplanchon/spidercreator at 46/100 with 217 stars and 10 monthly downloads.
Get all 33 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=llm-web-scraping&limit=20"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
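The same request can be made from a script. The following is a minimal Python sketch, assuming only what the curl command above shows: the endpoint URL and the `domain`, `subcategory`, and `limit` query parameters. The helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the JSON response is not documented here, so the code returns the parsed payload without assuming any field names. Note that the example URL passes `limit=20`; whether a larger value returns all 33 projects in one call is an assumption to verify against the API.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint taken from the curl example; nothing else about the API is assumed.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="llm-tools", subcategory="llm-web-scraping", limit=20):
    """Build the dataset URL with the query parameters seen in the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(limit=20, timeout=30):
    """Fetch the dataset and return the parsed JSON as-is.

    The response schema is unknown, so no keys are accessed here.
    """
    request = Request(build_url(limit=limit), headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_projects()` performs one unauthenticated request, which counts against the 100 requests/day allowance mentioned above, so it is worth caching the result locally rather than refetching.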
| # | Tool | Score | Tier |
|---|---|---|---|
| 1 | carlosplanchon/spidercreator: Automated web scraping spider generation using Browser Use and LLMs.... |  | Emerging |
| 2 | raznem/parsera: Lightweight library for scraping web-sites with LLMs |  | Emerging |
| 3 | Riddhish1/CogniScrape: Intelligent Web Scraping Library with LLMs |  | Emerging |
| 4 | poodle64/supacrawl: Zero-infrastructure web scraping for the terminal |  | Emerging |
| 5 | rednafi/html-to-text: Extract pure text from any webpage |  | Emerging |
| 6 | yeahhe365/JustSearch: Autonomous AI search agent built on Playwright. Supports iterative task planning, deep web crawling, and multi-source knowledge synthesis with cited sources. |  | Emerging |
| 7 | supadata-ai/js: Official TypeScript/JavaScript SDK for the Supadata API. |  | Emerging |
| 8 | ElysiumOSS/enterprise-ai-recursive-web-scraper: AI assisted web scraper, w/ content summarization, screenshots, and filter 🤖🕷️ |  | Emerging |
| 9 | SiluPanda/agent-crawl: High performance, lightweight and typesafe library to crawl and scrape web,... |  | Emerging |
| 10 | cipher-rc5/fire_ctrl: Spec-compliant self-hosted Firecrawl v2 runtime in native Rust |  | Experimental |
| 11 | AndreaBozzo/Ares: Next-gen AI scraper - LLM-powered structured data extraction |  | Experimental |
| 12 | cameronking4/nextjs-firecrawl-starter: Nextjs 15 Firecrawl app to scrape doc links for an LLM. Use it as a starter... |  | Experimental |
| 13 | us/crw: ⚡Lightweight Firecrawl alternative in Rust - 91.5% coverage, 5x faster, 3MB... |  | Experimental |
| 14 | lee-lou2/distill: High-performance Rust-based web scraper & LLM analysis API server |  | Experimental |
| 15 | rowyio/LLM-Web-Crawler: Web Scraper and Crawler for LLM Apps and AI Workflows with NoCode / LowCode.... |  | Experimental |
| 16 | sammcj/firecrawler: A lightweight frontend for self-hosted Firecrawl instances |  | Experimental |
| 17 | plater7/docrawl: Web crawler for documentation sites - converts pages to Markdown... |  | Experimental |
| 18 | firecrawl/firecrawl-py: Crawl and convert any website into clean markdown |  | Experimental |
| 19 | flyrank-bih/flyscrape: The Most Powerful Open-source LLM Friendly Typescript Web Crawler & Scraper |  | Experimental |
| 20 | kubernetes-bad/metachar: Scraper for Chub.ai and JanitorAI.com |  | Experimental |
| 21 | Daedae147/flyscrape: 🕷️ Streamline web scraping and crawling with FlyScrape, the Node.js package... |  | Experimental |
| 22 | TheFishPilot/Verity-Agentic-Web-Scraper: Verity API for verified web extraction in AI pipelines (Fastify +... |  | Experimental |
| 23 | iamagirlwithtechnicalmonstermind/firecrawl-swift-sdk: 🔥 Scrape, crawl, search, extract, and map websites with the powerful... |  | Experimental |
| 24 | ruchit-p/essence: A fast, open-source web retrieval engine built in Rust. |  | Experimental |
| 25 | greysquirr3l/stygian: High-performance graph-based web scraping engine + anti-detection browser... |  | Experimental |
| 26 | Awin36/houzz-product-reviews-scraper: 🏠 Extract Houzz product reviews into structured data for easy analysis,... |  | Experimental |
| 27 | Pankaj3112/pluckr: Schema-first, self-healing HTML extraction powered by LLMs |  | Experimental |
| 28 | ChenTaHung/HTML-Text-Parser: This project is designed to extract text from documents and prepare it for... |  | Experimental |
| 29 | aglasencnik/Parsera.NET: A lightweight NuGet package for the Parsera API, designed to simplify... |  | Experimental |
| 30 | parsera-labs/parsera-ts: A Typesafe SDK for Scraping LLMs with Parsera.org and JavaScript |  | Experimental |
| 31 | davidyen1124/ai-crawler: AI web scraper using GPT to dynamically optimize CSS selectors for reliable... |  | Experimental |
| 32 | 1amageek/Scouter: A Swift library for recursive web content searching and link extraction... |  | Experimental |
| 33 | mlibre/Clean-Web-Scraper: A Node.js web scraper that extracts clean, readable content from websites -... |  | Experimental |