jamesturk/scrapeghost

👻 Experimental library for scraping websites using OpenAI's GPT API.


Established

Uses GPT's language understanding to extract structured data from HTML according to schemas defined in Python. Built-in preprocessing covers HTML cleaning, CSS/XPath filtering, and auto-splitting of large pages; postprocessing covers Pydantic validation and hallucination detection. Cost tracking and budget controls help manage expensive API calls, and the library can fall back automatically between GPT-3.5-Turbo and GPT-4. Note: this 2023 project is no longer maintained.
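The budget control, model fallback, and hallucination check mentioned above can be illustrated with a minimal standard-library sketch. It does not use scrapeghost's actual API: `BudgetTracker`, `scrape_with_fallback`, `unsupported_values`, the model names, the prices, and the four-characters-per-token estimate are all hypothetical, and the caller supplies `call_model` in place of a real GPT request.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


class BudgetExceeded(Exception):
    """Raised when a call would push spending past the configured cap."""


@dataclass
class BudgetTracker:
    max_cost: float          # spending cap in dollars (assumed unit)
    total_cost: float = 0.0  # running total of charged cost

    def charge(self, tokens: int, price_per_1k: float) -> float:
        """Record the cost of a call, refusing it if the cap would be exceeded."""
        cost = tokens / 1000 * price_per_1k
        if self.total_cost + cost > self.max_cost:
            raise BudgetExceeded(f"call costing {cost:.4f} exceeds remaining budget")
        self.total_cost += cost
        return cost


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly four characters per token.
    return max(1, len(text) // 4)


def scrape_with_fallback(
    html: str,
    models: list[tuple[str, float]],        # (model name, price per 1k tokens)
    call_model: Callable[[str, str], dict],  # hypothetical stand-in for an API call
    tracker: BudgetTracker,
) -> dict:
    """Try each model in order and return the first successful result."""
    last_error = None
    for name, price in models:
        tracker.charge(estimate_tokens(html), price)  # may raise BudgetExceeded
        try:
            return call_model(name, html)
        except ValueError as exc:  # stand-in for an invalid or failed response
            last_error = exc
    raise RuntimeError("all models failed") from last_error


def unsupported_values(data: dict, html: str) -> list[str]:
    """Return keys whose string values never appear in the source HTML."""
    return [k for k, v in data.items() if isinstance(v, str) and v not in html]
```

In this sketch every attempt is charged before the call is made, so a failed attempt on the cheaper model still counts against the budget. The substring check is only a rough proxy for hallucination detection: it flags values the model could not have copied from the page, but it also flags legitimately normalized values.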

1,444 stars.

No Package · No Dependents

Maintenance 10 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 16 / 25


Stars: 1,444

Forks: (not shown)

Language: Python

License: not listed

Related tools

Priyanshu-hawk/ChatGPT-unofficial-api-selenium

An unofficial ChatGPT API that uses Selenium, intended for prompt testing and flow testing.

3281448091/easyChatGPT

An unofficial yet elegant interface of the ChatGPT API using browser automation that bypasses...

ryuseisan/auto-chatgpt

Automate interaction with the browser version of ChatGPT.

djb-gt/gpt-automated-web-scraper

The GPT-based Universal Web Scraper MVP is a solution that leverages GPT models and web scraping...

nitin-kumar101/ChatGPT-AutoChat

This repository contains code that automates chat interactions with ChatGPT using Selenium and...
