YoongiKim/AutoCrawler
Google, Naver multiprocess image web crawler (Selenium)
Supports full-resolution image downloads, configurable thread pools, and a face detection mode, and detects data imbalance across keyword directories. Uses Selenium with XPath-based link extraction that can be customized per search engine, plus headless mode and proxy rotation for distributed crawling. Can run on a remote server over SSH through a virtual display (Xvfb), and keeps the site-specific selectors easy to update as the Google and Naver page layouts change.
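The data imbalance check mentioned above can be illustrated with a minimal sketch. It is not taken from the AutoCrawler source: the function names, the one-sub-directory-per-keyword layout, and the 0.5 threshold ratio are all assumptions made for illustration.

```python
from pathlib import Path


def count_files_per_keyword(download_dir):
    """Map each keyword sub-directory name to the number of files it holds.

    Assumes (hypothetically) that every keyword has its own sub-directory
    directly under download_dir.
    """
    root = Path(download_dir)
    return {
        d.name: sum(1 for f in d.iterdir() if f.is_file())
        for d in sorted(root.iterdir())
        if d.is_dir()
    }


def find_imbalanced(counts, ratio=0.5):
    """Return the keywords whose file count is below ratio * mean count.

    The 0.5 default is an illustrative assumption, not a documented setting.
    """
    if not counts:
        return {}
    mean = sum(counts.values()) / len(counts)
    return {name: n for name, n in counts.items() if n < mean * ratio}
```

Calling `find_imbalanced(count_files_per_keyword("download"))` would list the keyword directories that look under-populated and may need a re-crawl; the `download` path here is only a placeholder.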
1,692 stars. No commits in the last 6 months.
Stars: 1,692
Forks: 429
Language: Python
License: Apache-2.0
Category
Last pushed: Apr 15, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/YoongiKim/AutoCrawler"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
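For scripted use, the same request can be sketched in Python with only the standard library. The URL is the one from the curl command above; reading its three path segments as category, owner, and repository is an assumption, as are the helper names `quality_url` and `fetch_quality`. The response format is not documented on this page, so the sketch returns the raw body without parsing it.

```python
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; the segment meanings are assumed.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL, percent-encoding each path segment."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the endpoint and return the raw response body as text.

    The body is returned unparsed because its format is not documented here.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Without a key, the stated limit is 100 requests/day, so a script that polls many repositories should cache responses rather than refetch them.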
Higher-rated alternatives
alirezamika/autoscraper
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
lorey/mlscraper
🤖 Scrape data from HTML websites automatically by just providing examples
machine-learning-apps/Issue-Label-Bot
Code For The Issue Label Bot, an App that automatically labels issues using machine learning,...
nuhmanpk/Webtrench
A powerful and easy-to-use web scraper for collecting data from the web. Supports scraping of...
Tuhin-thinks/instagram-unfollower-tracker-meerkit
Analyze Instagram followers, find unfollowers, automate follow/unfollow, and predict follow-backs.