NadavIs56/WebCrawler2
A simple Python web crawler that processes URLs from web pages, handles redirects, and skips non-HTML content. It supports HTTP/HTTPS, calculates same-domain link ratios, avoids duplicate URLs, and saves results in a TSV file. Designed for easy scalability and future extensions.
No commits in the last 6 months.
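The description above mentions link extraction, duplicate avoidance, same-domain link ratios, and TSV output. The repository's own code is not reproduced on this page, so the following is only a minimal sketch of how those steps might fit together using the Python standard library. All names (`LinkExtractor`, `extract_links`, `same_domain_ratio`, `write_tsv`) and the TSV column layout are hypothetical and are not taken from WebCrawler2. Fetching, redirect handling, and content-type checks are left out.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect raw href values from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(page_url, html):
    """Return unique absolute HTTP/HTTPS links from html, in page order."""
    parser = LinkExtractor()
    parser.feed(html)
    seen = set()
    links = []
    for href in parser.hrefs:
        # Resolve relative links and drop the #fragment so that
        # /about and /about#team count as the same URL.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if urlparse(absolute).scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar schemes
        if absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links


def same_domain_ratio(page_url, links):
    """Fraction of links whose hostname equals the page's hostname.

    Assumption: "same domain" means an exact hostname match; the real
    project may define it differently (for example, by registered domain).
    """
    if not links:
        return 0.0
    host = urlparse(page_url).hostname
    same = sum(1 for link in links if urlparse(link).hostname == host)
    return same / len(links)


def write_tsv(rows, out):
    """Write (url, link_count, ratio) rows as tab-separated values."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["url", "link_count", "same_domain_ratio"])
    for row in rows:
        writer.writerow(row)
```

A usage example, run on an in-memory page rather than a live site:

```python
page = "https://example.com/index.html"
html = '<a href="/about">A</a><a href="https://other.org/x">B</a>'
links = extract_links(page, html)
buffer = io.StringIO()
write_tsv([(page, len(links), same_domain_ratio(page, links))], buffer)
print(buffer.getvalue())
```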
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/perception/NadavIs56/WebCrawler2"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
Higher-rated alternatives
scrapy/scrapy
Scrapy, a fast high-level web crawling & scraping framework for Python.
Altimis/Scweet
A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers,...
lexiforest/curl_cffi
Python binding for curl-impersonate fork via cffi. An HTTP client that can impersonate browser...
soxoj/maigret
🕵️‍♂️ Collect a dossier on a person by username from 3000+ sites
0x676e67/wreq-python
An ergonomic Python HTTP Client with TLS fingerprint