joaobenedetmachado/scrapit
A (really) easy way to web scrape
Defines scraping targets declaratively in YAML (selectors, transforms, validation, and output formats), so adding a new source requires no Python code. Supports five fetch backends (BeautifulSoup, Playwright for JavaScript-rendered pages, async httpx, GraphQL, Bright Data) with 28+ field transforms, pagination, spider discovery, and parallel crawling. Writes to eight output targets (JSON, CSV, SQLite, MongoDB, PostgreSQL, Excel, Google Sheets, Parquet) with optional webhooks, change detection, Redis caching, and a built-in web dashboard.
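To make the declarative idea concrete, here is a minimal sketch of a spec-driven extractor. It is not scrapit's schema or API: the spec keys (`item`, `fields`, `path`, `transforms`), the transform names, and the `extract` function are all hypothetical, and a plain Python dict stands in for a parsed YAML file. It uses only the standard library, so it assumes well-formed markup; real pages would need an HTML parser such as BeautifulSoup.

```python
import xml.etree.ElementTree as ET

# Hypothetical spec, shaped like what a YAML file might parse into.
# These keys are illustrative only, not scrapit's actual schema.
SPEC = {
    "item": ".//li[@class='product']",
    "fields": {
        "title": {"path": "./h2", "transforms": ["strip"]},
        "price": {"path": "./span[@class='price']", "transforms": ["strip", "to_float"]},
    },
}

# Named transforms that a spec can reference (hypothetical names).
TRANSFORMS = {
    "strip": lambda v: v.strip(),
    "to_float": lambda v: float(v.replace("$", "").replace(",", "")),
}


def extract(markup, spec):
    """Apply a declarative spec to well-formed markup and return one dict per item."""
    root = ET.fromstring(markup)
    rows = []
    for node in root.findall(spec["item"]):
        row = {}
        for name, rule in spec["fields"].items():
            found = node.find(rule["path"])
            value = found.text if found is not None and found.text else ""
            # Run the field's transforms in the order the spec lists them.
            for transform_name in rule.get("transforms", []):
                value = TRANSFORMS[transform_name](value)
            row[name] = value
        rows.append(row)
    return rows


SAMPLE = """<ul>
  <li class="product"><h2> Widget </h2><span class="price"> $1,299.50 </span></li>
  <li class="product"><h2>Gadget</h2><span class="price">$20</span></li>
</ul>"""

rows = extract(SAMPLE, SPEC)
print(rows)
```

The point of the design is that supporting a new source means writing a new spec, while `extract` and the transform table stay unchanged.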
Stars
56
Forks
21
Language
Python
License
MIT
Category
Last pushed
Mar 12, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/agents/joaobenedetmachado/scrapit"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
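The same request can be made from Python. This is a minimal sketch using only the standard library; the helper names `quality_url` and `fetch_quality` are hypothetical, and because the response format is not documented here, the body is returned as raw text rather than parsed.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality/agents"


def quality_url(owner, repo):
    """Build the endpoint URL for one repository (hypothetical helper)."""
    return f"{BASE}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner, repo, timeout=10.0):
    """Fetch the raw response body as text; no response schema is assumed."""
    with urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


url = quality_url("joaobenedetmachado", "scrapit")
print(url)
```

Calling `fetch_quality("joaobenedetmachado", "scrapit")` performs the network request and counts against the daily limit.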
Related agents
vakra-dev/reader
Open-source, production-grade web scraping engine built for LLMs. Scrape and crawl the entire...
firecrawl/open-scouts
🔥 AI-powered web monitoring platform. Create automated scouts that search the web and send email...
BrowserCash/teracrawl
High-performance web crawler API optimized for LLMs. Turn any search or website into clean...
memvid/maw
Crawl any website into a single searchable file. Query it forever, offline.
ma-pony/deepspider
Intelligent web scraping engineering platform - an AI scraping agent built on DeepAgents + Patchright | Intelligent Web Scraping Platform -...