vdutts7/gpt4V-scraper
AI agent that can SEE 👁️, control, navigate, & do stuff for you on your browser.
Combines GPT-4V vision capabilities with Puppeteer-driven browser automation to capture full-page screenshots and extract structured data via vision-language understanding. Uses a three-part pipeline: screenshot capture with anti-bot evasion, image-to-text extraction via GPT-4V, and interactive web navigation with real-time natural language querying. Integrates OpenAI's vision API for semantic extraction and enables automated search workflows through conversational prompts against live web content.
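The first two stages of that pipeline (screenshot capture, then image-to-text extraction) can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names, the prompt handling, the viewport size, and the default model name `gpt-4-vision-preview` are all assumptions, and the anti-bot evasion and interactive navigation steps are left out. It assumes Node 18+ (for the global `fetch`), the `puppeteer` package, and an `OPENAI_API_KEY` environment variable.

```javascript
// Sketch only: names below are hypothetical and not taken from the repository.
const fs = require('node:fs/promises');

// Stage 1: capture a full-page screenshot with Puppeteer.
async function captureScreenshot(url, outPath) {
  // Required lazily so the pure helper below can be used without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.setViewport({ width: 1280, height: 800 }); // assumed size
    await page.goto(url, { waitUntil: 'networkidle2' });
    await page.screenshot({ path: outPath, fullPage: true });
  } finally {
    await browser.close();
  }
  return outPath;
}

// Stage 2a: build a Chat Completions request body that pairs a text prompt
// with the screenshot, passed inline as a base64 data URL.
// The default model name is an assumption; pass whichever vision model you use.
function buildVisionRequest(imageBase64, prompt, model = 'gpt-4-vision-preview') {
  return {
    model,
    max_tokens: 1024,
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: prompt },
          {
            type: 'image_url',
            image_url: { url: `data:image/png;base64,${imageBase64}` },
          },
        ],
      },
    ],
  };
}

// Stage 2b: send the screenshot to the API and return the model's text answer.
async function extractFromScreenshot(pngPath, prompt, apiKey = process.env.OPENAI_API_KEY) {
  const imageBase64 = (await fs.readFile(pngPath)).toString('base64');
  const res = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(buildVisionRequest(imageBase64, prompt)),
  });
  if (!res.ok) throw new Error(`OpenAI request failed with status ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}
```

A typical call would chain the two stages: `await captureScreenshot(url, 'page.png')` followed by `await extractFromScreenshot('page.png', 'List the product names and prices on this page.')`. Keeping the request builder separate from the network call makes the payload easy to inspect or test without spending API credits.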
Stars: 294
Forks: 28
Language: JavaScript
License: —
Category:
Last pushed: Mar 01, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/agents/vdutts7/gpt4V-scraper"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
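The same request can be made from JavaScript. The sketch below assumes Node 18+ for the global `fetch`; the helper names are hypothetical, and treating the `agents` path segment from the curl command as a variable category is an assumption. The response format is not stated here, so the body is returned as raw text.

```javascript
// Base path copied from the curl command above.
const BASE_URL = 'https://pt-edge.onrender.com/api/v1/quality';

// Hypothetical helper: builds the endpoint URL for one repository.
function qualityUrl(category, owner, repo) {
  const path = [category, owner, repo].map((s) => encodeURIComponent(s)).join('/');
  return `${BASE_URL}/${path}`;
}

// Hypothetical helper: fetches the record without an API key.
async function fetchQuality(category, owner, repo) {
  const res = await fetch(qualityUrl(category, owner, repo));
  if (!res.ok) throw new Error(`Request failed with status ${res.status}`);
  // Returned as text because the response format is not documented here.
  return res.text();
}
```

For example, `await fetchQuality('agents', 'vdutts7', 'gpt4V-scraper')` requests the same URL as the curl command.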
Higher-rated alternatives
alibaba/page-agent
JavaScript in-page GUI agent. Control web interfaces with natural language.
4ier/neo
Turn any web app into an API. Chrome extension captures browser traffic, auto-generates schemas,...
CloakHQ/CloakBrowser
Stealth Chromium that passes every bot detection test. Drop-in Playwright replacement with...
hanzili/hanzi-browse
let any ai agent use the local browser
nicobailon/surf-cli
The CLI for AI agents to control Chrome. Zero config, agent-agnostic, battle-tested.