apify/crawlee
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.
Crawlee provides unified crawler abstractions (`PlaywrightCrawler`, `HttpCrawler`) that automatically generate human-like TLS fingerprints and browser headers, with persistent request queues supporting breadth-first and depth-first traversal and configurable routing via hooks. It is built as modular scoped packages (`@crawlee/core`, `@crawlee/utils`) with pluggable storage backends and integrated Cheerio/JSDOM parsers, covering both browser-rendered and lightweight HTTP-only scraping workflows.
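To make the breadth-first versus depth-first distinction concrete, here is a minimal in-memory sketch in TypeScript. It is not Crawlee's request queue and does not use the Crawlee API: `SketchRequestQueue`, `crawlOrder`, and the `siteLinks` graph are hypothetical names invented for illustration, and nothing here is persisted. It only assumes that a crawl queue deduplicates URLs and that the traversal mode decides which pending URL is taken next.

```typescript
// Hypothetical sketch: NOT Crawlee's RequestQueue, only an in-memory illustration.
type Traversal = "breadth-first" | "depth-first";

class SketchRequestQueue {
  private pending: string[] = [];
  private seen = new Set<string>();
  private traversal: Traversal;

  constructor(traversal: Traversal) {
    this.traversal = traversal;
  }

  // Adds a URL unless it was enqueued before; returns true when it was added.
  add(url: string): boolean {
    if (this.seen.has(url)) return false;
    this.seen.add(url);
    this.pending.push(url);
    return true;
  }

  // Breadth-first takes the oldest URL (FIFO); depth-first takes the newest (LIFO).
  next(): string | undefined {
    return this.traversal === "breadth-first" ? this.pending.shift() : this.pending.pop();
  }
}

// Walks a made-up link graph and returns the order in which pages are visited.
function crawlOrder(
  links: Record<string, string[]>,
  start: string,
  traversal: Traversal,
): string[] {
  const queue = new SketchRequestQueue(traversal);
  const visited: string[] = [];
  queue.add(start);
  for (let url = queue.next(); url !== undefined; url = queue.next()) {
    visited.push(url);
    // Pages without an entry in the graph have no outgoing links.
    for (const link of links[url] ?? []) queue.add(link);
  }
  return visited;
}

// Made-up site structure; "/a" links back to "/" to exercise deduplication.
const siteLinks: Record<string, string[]> = {
  "/": ["/a", "/b"],
  "/a": ["/a/1", "/"],
  "/b": ["/b/1"],
};
```

Calling `crawlOrder(siteLinks, "/", "breadth-first")` visits every page one link away from the start before going deeper, while `"depth-first"` follows the most recently discovered link first. A real crawler would also persist the queue and handle retries, which this sketch leaves out.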
22,542 stars and 346,203 monthly downloads. Used by 1 other package. Actively maintained with 36 commits in the last 30 days. Available on npm.
Stars: 22,542
Forks: 1,288
Language: TypeScript
License: Apache-2.0
Last pushed: Mar 28, 2026
Monthly downloads: 346,203
Commits (30d): 36
Dependencies: 14
Reverse dependents: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/perception/apify/crawlee"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
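The same request can be made from TypeScript. The sketch below assumes a runtime with a global `fetch` (for example Node.js 18 or later) and assumes the endpoint returns JSON. The response schema is not described on this page, so the result is typed as `unknown`. The helper names `qualityUrl` and `fetchQuality` are hypothetical and not part of any published client.

```typescript
// Base URL copied from the curl command above.
const QUALITY_API_BASE = "https://pt-edge.onrender.com/api/v1/quality/perception";

// Builds the endpoint URL for a repository given as owner and name.
function qualityUrl(owner: string, repo: string): string {
  return `${QUALITY_API_BASE}/${encodeURIComponent(owner)}/${encodeURIComponent(repo)}`;
}

// Fetches the record for one repository.
// Assumption: the body is JSON; its fields are not documented here, hence `unknown`.
async function fetchQuality(owner: string, repo: string): Promise<unknown> {
  const response = await fetch(qualityUrl(owner, repo));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}
```

For example, `fetchQuality("apify", "crawlee")` requests the same URL as the curl command. Inspect the returned object before relying on any particular field.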
Related tools
seleniumbase/SeleniumBase
APIs for browser automation, testing, and bypassing bot-detection.
Kaliiiiiiiiii-Vinyzu/patchright
Undetected version of the Playwright testing and automation library.
apify/crawlee-python
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers....
intoli/user-agents
A JavaScript library for generating random user agents with data that's updated daily.
microlinkhq/browserless
The headless Chrome/Chromium driver on top of Puppeteer. Take screenshots, generate PDFs,...