any4ai/AnyCrawl
AnyCrawl: A Node.js/TypeScript crawler that turns websites into LLM-ready data and extracts structured SERP results from Google, Bing, Baidu, and other search engines. Native multi-threading for bulk processing.
Supports multiple scraping engines (Cheerio for static parsing, Playwright/Puppeteer for JavaScript rendering) and integrates Redis for caching and batch task management. Features LLM-powered JSON extraction via JSON Schema, enabling structured data generation directly from crawled pages without separate post-processing. Offers self-hosted deployment with Docker Compose and Bearer token authentication, alongside a REST API for web scraping, full-site crawling with path filtering, and multi-engine SERP extraction.
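The description mentions a Bearer-authenticated REST API with a choice of engine, plus path filtering for full-site crawls. The TypeScript sketch below shows how a client might build such a request and apply include/exclude path prefixes locally. The `/v1/scrape` path, the `url`/`engine` body fields, and the helpers `buildScrapeRequest`, `matchesPrefix`, and `inScope` are assumptions for illustration. They are not confirmed AnyCrawl API, and the filter is a generic technique rather than the project's own implementation.

```typescript
// Sketch only: endpoint path and body fields are assumptions, not confirmed AnyCrawl API.
type Engine = "cheerio" | "playwright" | "puppeteer";

// Hypothetical request body for a single-page scrape.
interface ScrapeBody {
  url: string;
  engine: Engine;
}

// A plain request description that can be passed to fetch(req.url, req).
interface PostRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build (but do not send) an authenticated JSON POST to a self-hosted instance.
function buildScrapeRequest(baseUrl: string, apiKey: string, body: ScrapeBody): PostRequest {
  return {
    url: new URL("/v1/scrape", baseUrl).toString(), // assumed endpoint path
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`, // Bearer token authentication
    },
    body: JSON.stringify(body),
  };
}

// True if pathname equals the prefix or lies under it as a path segment,
// so "/docs" matches "/docs/intro" but not "/docsearch".
function matchesPrefix(pathname: string, prefix: string): boolean {
  const base = prefix.endsWith("/") ? prefix.slice(0, -1) : prefix;
  return pathname === base || pathname.startsWith(base + "/");
}

// Path filter for a crawl frontier: keep a URL if it matches any include
// prefix (or no includes are given) and matches no exclude prefix.
function inScope(pageUrl: string, include: string[] = [], exclude: string[] = []): boolean {
  const { pathname } = new URL(pageUrl);
  const included = include.length === 0 || include.some((p) => matchesPrefix(pathname, p));
  return included && !exclude.some((p) => matchesPrefix(pathname, p));
}
```

A request built this way could be sent with `await fetch(req.url, req)`. `baseUrl` should be the instance origin (for example `http://localhost:8080`), because `new URL` with an absolute path discards any path on the base. Matching on whole path segments avoids the common mistake where a plain `startsWith("/docs")` also admits `/docsearch`.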
2,763 stars. Actively maintained with 32 commits in the last 30 days.
Stars: 2,763
Forks: 289
Language: TypeScript
License: MIT
Last pushed: Mar 08, 2026
Commits (30d): 32
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/any4ai/AnyCrawl"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
Related tools
ScrapeGraphAI/Scrapegraph-ai
Python scraper based on AI
adbar/trafilatura
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping,...
kreuzberg-dev/html-to-markdown
High performance and CommonMark compliant HTML to Markdown converter. Maintained by the...
lightfeed/extractor
Using LLMs and AI browser automation to robustly extract web data
paulpierre/markdown-crawler
A multithreaded web crawler that recursively crawls a website and creates a markdown file...