repylot/GithubScrapper
GitHub Scrapper for RePylot is a web scraper for GitHub that builds datasets of code files, which are later used to fine-tune GPT-2. In its current state it extracts Python scripts from repositories, which makes it useful for preparing training data for machine learning and NLP models.
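As a rough illustration of the dataset-building step described above, the sketch below gathers Python files from a repository that has already been cloned locally and joins them into one training text. It is not the project's actual code: the function names `collect_python_files` and `build_dataset`, the size limit, and the use of GPT-2's `<|endoftext|>` token as a file separator are all assumptions made for the example.

```python
from pathlib import Path

# "<|endoftext|>" is GPT-2's end-of-text token. Using it as the separator
# between files is an assumption, not something the project page states.
SEPARATOR = "<|endoftext|>"


def collect_python_files(repo_root, max_bytes=100_000):
    """Return the text of every .py file under repo_root, in sorted path order.

    Hypothetical helper: files larger than max_bytes or not valid UTF-8
    are skipped so that one odd file does not abort the whole run.
    """
    sources = []
    for path in sorted(Path(repo_root).rglob("*.py")):
        if not path.is_file() or path.stat().st_size > max_bytes:
            continue
        try:
            sources.append(path.read_text(encoding="utf-8"))
        except UnicodeDecodeError:
            continue
    return sources


def build_dataset(repo_root):
    """Join all collected files into a single string for fine-tuning."""
    return f"\n{SEPARATOR}\n".join(collect_python_files(repo_root))
```

Calling `build_dataset("path/to/cloned/repo")` would return one string that can be written to a text file and fed to a fine-tuning script. Working from a local clone keeps the sketch independent of any particular GitHub API.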
No commits in the last 6 months.
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/perception/repylot/GithubScrapper"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
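The same request can be made from Python with only the standard library. This is a minimal sketch under stated assumptions: the helper names `build_url` and `fetch_quality` are hypothetical, the path layout is inferred from the single curl example above, and the response is assumed to be JSON, which the page does not state. No API key is sent, since the page does not say how a key would be passed.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the curl example; treated here as fixed.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/perception"


def build_url(owner, repo):
    """Build the endpoint URL for a given owner/repo pair (hypothetical helper)."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner, repo, timeout=10):
    """Fetch the record for owner/repo.

    Assumes the endpoint returns a JSON body; json.loads raises
    ValueError if that assumption is wrong.
    """
    with urlopen(build_url(owner, repo), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For this page's repository, the call would be `fetch_quality("repylot", "GithubScrapper")`. Without a key, the 100 requests/day limit mentioned above applies.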
Higher-rated alternatives
scrapy/scrapy
Scrapy, a fast high-level web crawling & scraping framework for Python.
lexiforest/curl_cffi
Python binding for curl-impersonate fork via cffi. A http client that can impersonate browser...
soxoj/maigret
🕵️‍♂️ Collect a dossier on a person by username from 3000+ sites
Altimis/Scweet
A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers,...
omkarcloud/botasaurus
The All in One Framework to Build Undefeatable Scrapers