Intelligent Web Data Extraction AI Agents
Tools that use AI agents to automatically extract, parse, and structure data from websites through natural language instructions and intent-based scraping. Does NOT include general web crawlers, SEO audit platforms, lead database services, or non-agentic scraping libraries.
45 intelligent web data extraction agents are tracked. One scores above 50 (Established tier). The highest-rated is vakra-dev/reader at 56/100, with 474 stars and 196 monthly downloads.
Get all 45 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=agents&subcategory=intelligent-web-data-extraction&limit=20"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
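The same request can be made from Python. The sketch below is a minimal example that uses only the standard library and rebuilds the URL from the curl command above. The helper names `build_url` and `fetch_projects` are hypothetical, and the response is returned as parsed JSON without assuming any particular schema. Note that the example request uses `limit=20`; whether the endpoint accepts a larger value such as 45 is an assumption that is not confirmed here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint and query parameters are taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit=20):
    """Build the dataset URL (hypothetical helper, not part of the API)."""
    params = {
        "domain": "agents",
        "subcategory": "intelligent-web-data-extraction",
        "limit": limit,  # assumption: other values may or may not be accepted
    }
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_projects(limit=20, timeout=30):
    """Fetch the listing and return the parsed JSON body as-is."""
    with urlopen(build_url(limit), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_projects()` sends one request, which counts against the 100 requests/day keyless quota, so caching the result locally is a reasonable choice when experimenting.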
| # | Agent | Score | Tier |
|---|---|---|---|
| 1 | vakra-dev/reader: Open-source, production-grade web scraping engine built for LLMs. Scrape and... | 56/100 | Established |
| 2 | joaobenedetmachado/scrapit: A (really) easy way to web scrape | | Emerging |
| 3 | firecrawl/open-scouts: 🔥 AI-powered web monitoring platform. Create automated scouts that search... | | Emerging |
| 4 | memvid/maw: Crawl any website into a single searchable file. Query it forever, offline. | | Emerging |
| 5 | BrowserCash/teracrawl: High-performance web crawler API optimized for LLMs. Turn any search or... | | Emerging |
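| 6 | ma-pony/deepspider: Intelligent crawler engineering platform, an AI crawler agent based on DeepAgents + Patchright \| Intelligent Web... | | Emerging |
| 7 | jufeng-2022/mtywatch: Monitor web page content changes with a single sentence, AI \| crawler \| web page monitoring \| web page update alerts \| web page content subscription | | Emerging |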
| 8 | poneoneo/Alibaba-CLI-Scraper: Create your own Alibaba dataset and interact with it in plain English. | | Emerging |
| 9 | oxylabs/ai-crawler-py: Crawl a website starting from a URL, find relevant pages, and extract data –... | | Emerging |
| 10 | hmshb/scraping-agent-ai: AI-powered web scraping agent built with LangGraph, LangSmith, Firecrawl,... | | Emerging |
| 11 | tinaponting/ai-robots-scrapers: AI robots.txt, AI scrapers block ai scrapers | | Experimental |
| 12 | spider-rs/web-crawling-guides: How to guides on web-crawling or scraping | | Experimental |
| 13 | ScrapeGraphAI/just-scrape: CLI for AI-powered web scraping, data extraction, search, and crawling ... | | Experimental |
| 14 | 1nn0k3sh4/trendevourer: Trend Devourer 👗✨ AI-Powered Visual Style Analyst | | Experimental |
| 15 | kaymen99/ai-web-scraper: AI web scraper built with Crawl4AI for extracting structured leads data from... | | Experimental |
| 16 | isweerasingha/Auditeo-AI: An enterprise-grade, agentic website audit engine powered by GPT-5.4 and... | | Experimental |
| 17 | Dieans/Universal-News-Scraper: 🌍 Scrape and aggregate news effortlessly with Universal News Scraper, your... | | Experimental |
| 18 | ScrapeGraphAI/ScrapeHubAI: 🌟 AI-powered tool to analyze GitHub stargazers, identify companies, and... | | Experimental |
| 19 | sirToby99/swipenode: Lightning-fast, zero-render web extraction CLI built for AI agents. Extracts... | | Experimental |
| 20 | Chaitya44/AI-WebScraper: An intelligent, universal web scraper powered by Google Gemini AI. Features... | | Experimental |
| 21 | Kaus-code/Neuroscout-oss: An autonomous AI agent powered by Gemini 2.5 Flash that scouts GitHub for... | | Experimental |
| 22 | rbhatia1997/artist-scout: Open-source AI A&R toolkit for artist scouting, shortlist building, and... | | Experimental |
| 23 | lout33/scout-oss: Local web research agent and mission-driven intelligence scanner that writes... | | Experimental |
| 24 | phia-francis/nesta-signal-scout: An AI-powered foresight agent for Nesta's Discovery Hub. Signal Scout... | | Experimental |
| 25 | NickEinstein1/Scrapper-Enricher: Scrapping Agent - CrewAI | | Experimental |
| 26 | oxylabs/ai-scraper-py: AI Scraper is a powerful scraping tool and scrape agent built to automate... | | Experimental |
| 27 | breezy89757/AgentScraper: AgentScraper: AI-Powered Web Scraper (v1.0) with Visual Extraction | | Experimental |
| 28 | brightdata/trendscan: TrendScan is a multi-source company intelligence platform for automated... | | Experimental |
| 29 | breezy89757/SmartScraper: 🤖 AI-Powered Web Scraper Generator - Turns URLs into Python code with... | | Experimental |
| 30 | Musubi-ai/Musubi: Musubi: A convenient crawling tool for collecting web text data in Python. | | Experimental |
| 31 | smoothemerson/scout: AI-powered multi-agent system that analyzes your GitHub profile and CV to... | | Experimental |
| 32 | musadiq7860/AI_growth_auditor: AI-powered business growth audit tool — scrapes website, generates custom... | | Experimental |
| 33 | rosasbehoundja/tech-trends-monitor: Automated RSS flux monitoring system | | Experimental |
| 34 | nomiS0614/mtywatch: 📧 Monitor webpage content with AI and receive real-time updates on topics... | | Experimental |
| 35 | Tomefy5/scout-agent: Autonomous AI Agent for B2B Lead Generation & Enrichment | | Experimental |
| 36 | Ascentia-Sandbox/StartInsight: Daily automated startup intelligence: 6 scrapers (Reddit/HN/PH/Trends/X) → 8... | | Experimental |
| 37 | stell619/scraper-agent: AI-powered research agent — scrapes YouTube, Etsy, crypto, stocks & trends... | | Experimental |
| 38 | FlowExtractAPI/ai-lead-extractor: Extract any information from websites using intelligent AI - from contact... | | Experimental |
| 39 | itallstartedwithaidea/google-ai-agent-audit-engine: AI-powered Google Ads audit engine — automated account analysis, scoring,... | | Experimental |
| 40 | vinay-852/AI-Agent-for-Sheets: The primary objective of this project is to harness Google’s Generative AI... | | Experimental |
| 41 | afrexai-cto/ai-ops-audit: Free AI operations audit checklist for mid-market companies. Score your... | | Experimental |
| 42 | BraaMohammed/microwave-ai: Microwave AI is a chat-based AI agent for vibe data enrichment. Upload a... | | Experimental |
| 43 | michalboryczko/crawler-generator-agent: Autonomous agent analyzes websites and generates production-ready crawling... | | Experimental |
| 44 | Atqiyanabila01/AI-Lead-Scout: An AI-powered web research agent that crawls company data and generates... | | Experimental |
| 45 | Hirsun/Website-Crawler: An HTML web page crawling service designed for AI agents that efficiently fetches web page content and cleans it. | | Experimental |