any4ai/AnyCrawl

AnyCrawl 🚀: A Node.js/TypeScript crawler that turns websites into LLM-ready data and extracts structured SERP results from Google/Bing/Baidu/etc. Native multi-threading for bulk processing.

Established

Supports multiple scraping engines (Cheerio for static parsing, Playwright/Puppeteer for JavaScript rendering) and integrates Redis for caching and batch task management. Features LLM-powered JSON extraction via JSON Schema, enabling structured data generation directly from crawled pages without separate post-processing. Offers self-hosted deployment with Docker Compose and Bearer token authentication, alongside a REST API for web scraping, full-site crawling with path filtering, and multi-engine SERP extraction.
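As a rough illustration of the REST workflow described above, the sketch below assembles a Bearer-authenticated scrape request that names an engine and attaches a JSON Schema for LLM extraction. The endpoint path (`/v1/scrape`), the payload field names (`engine`, `jsonSchema`), the base URL, and the helper `buildScrapeRequest` are all assumptions made for illustration and are not taken from AnyCrawl's documentation; check the project's API reference for the real shapes.

```typescript
// Hypothetical request shape: field names are assumptions, not AnyCrawl's documented API.
type Engine = "cheerio" | "playwright" | "puppeteer";

interface ScrapeRequest {
  url: string;
  engine: Engine;
  // JSON Schema describing the structured output to extract (assumed field name).
  jsonSchema?: Record<string, unknown>;
}

interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build (but do not send) an authenticated request for a self-hosted instance.
function buildScrapeRequest(
  baseUrl: string,
  token: string,
  payload: ScrapeRequest
): HttpRequest {
  return {
    // Strip trailing slashes so the joined URL has exactly one separator.
    url: `${baseUrl.replace(/\/+$/, "")}/v1/scrape`, // "/v1/scrape" is an assumed path
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(payload),
  };
}

// Example: a static page parsed with Cheerio, extracting a title and a price.
const exampleRequest = buildScrapeRequest("http://localhost:8080/", "YOUR_TOKEN", {
  url: "https://example.com/product",
  engine: "cheerio",
  jsonSchema: {
    type: "object",
    properties: { title: { type: "string" }, price: { type: "number" } },
    required: ["title"],
  },
});
```

The resulting object could be passed to `fetch(exampleRequest.url, exampleRequest)`. Keeping request construction separate from sending makes the payload easy to inspect before anything touches the network.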

2,763 stars. Actively maintained with 32 commits in the last 30 days.

No Package. No Dependents.

Maintenance 23 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 20 / 25

Stars: 2,763

Forks: 289

Language: TypeScript

License: MIT

Related tools

ScrapeGraphAI/Scrapegraph-ai

Python scraper based on AI

adbar/trafilatura

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping,...

kreuzberg-dev/html-to-markdown

High performance and CommonMark compliant HTML to Markdown converter. Maintained by the...

lightfeed/extractor

Using LLMs and AI browser automation to robustly extract web data

paulpierre/markdown-crawler

A multithreaded 🕸️ web crawler that recursively crawls a website and creates a 🔽 markdown file...
