jamesturk/scrapeghost
👻 Experimental library for scraping websites using OpenAI's GPT API.
It uses GPT's language understanding to extract structured data from HTML according to schemas defined in Python. Built-in preprocessing covers HTML cleaning, CSS/XPath filtering, and auto-splitting of large pages; postprocessing covers Pydantic validation and hallucination detection. It also tracks costs and enforces budget limits on expensive API calls, and falls back automatically between GPT-3.5-Turbo and GPT-4. Note: this 2023 project is no longer maintained.
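Two of the steps listed above, HTML cleaning before the model call and hallucination detection after it, can be sketched with the Python standard library alone. This is a minimal illustration under our own assumptions, not scrapeghost's API or implementation: `DROP_TAGS`, `clean_html`, and `unverified_values` are hypothetical names, and the choice of which tags to drop is illustrative.

```python
from html.parser import HTMLParser

# Hypothetical choice: elements whose contents rarely help extraction and only cost tokens.
DROP_TAGS = {"script", "style", "noscript"}


class _Cleaner(HTMLParser):
    """Re-emit HTML, skipping DROP_TAGS (and their contents), comments, and doctypes."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0:
            self.parts.append(self.get_starttag_text())  # keep original attributes

    def handle_startendtag(self, tag, attrs):
        if self.skip_depth == 0 and tag not in DROP_TAGS:
            self.parts.append(self.get_starttag_text())  # e.g. "<br/>"

    def handle_endtag(self, tag):
        if tag in DROP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def clean_html(html: str) -> str:
    """Strip non-content markup before sending a page to the model."""
    cleaner = _Cleaner()
    cleaner.feed(html)
    cleaner.close()
    return "".join(cleaner.parts)


def unverified_values(data: dict, html: str) -> list[str]:
    """Naive hallucination check: keys whose string values never appear in the page."""
    return [key for key, value in data.items() if isinstance(value, str) and value not in html]
```

The verbatim-substring check is deliberately strict: a value the model reformatted (a normalized date, a trimmed name) would be flagged even when correct, so a practical check would need some normalization before comparing.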
Stars: 1,444
Forks: 88
Language: Python
License: none listed
Category: none listed
Last pushed: Jan 14, 2026
Commits (30d): 0
Get this data via API:
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/jamesturk/scrapeghost"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
Related tools
- Priyanshu-hawk/ChatGPT-unofficial-api-selenium: An unofficial ChatGPT API built on Selenium, for prompt testing and flow testing.
- 3281448091/easyChatGPT: An unofficial yet elegant interface of the ChatGPT API using browser automation that bypasses...
- ryuseisan/auto-chatgpt: Automates interaction with the browser version of ChatGPT.
- djb-gt/gpt-automated-web-scraper: The GPT-based Universal Web Scraper MVP is a solution that leverages GPT models and web scraping...
- nitin-kumar101/ChatGPT-AutoChat: This repository contains code that automates chat interactions with ChatGPT using Selenium and...