vdutts7/gpt4V-scraper

AI agent that can SEE 👁️, control, navigate, & do stuff for you on your browser.

/ 100

Emerging

Combines GPT-4V with Puppeteer-driven browser automation to capture full-page screenshots and extract structured data from them. The pipeline has three parts: screenshot capture with anti-bot evasion, image-to-text extraction via GPT-4V, and interactive web navigation driven by real-time natural language queries. Extraction runs through OpenAI's vision API, and search workflows are automated through conversational prompts against live web content.
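The image-to-text step described above can be illustrated with a minimal sketch. It is not taken from the repository: the function names `buildVisionRequest` and `extractFromScreenshot` are hypothetical, and the default model name is an assumption that should be swapped for whichever vision-capable model your account offers. The sketch assumes the screenshot is already available as a base64-encoded PNG and that Node 18 or later is used, so that `fetch` is built in.

```javascript
// Hypothetical helper (not from the repository): wraps a base64 PNG
// screenshot and a natural language question into a Chat Completions
// request body that carries the image as a data URL.
// The default model name is an assumption; pass your own if it differs.
function buildVisionRequest(base64Png, question, model = "gpt-4-vision-preview") {
  return {
    model,
    max_tokens: 1024,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: question },
          {
            type: "image_url",
            image_url: { url: `data:image/png;base64,${base64Png}` },
          },
        ],
      },
    ],
  };
}

// Hypothetical helper: sends the request with the fetch built into
// Node 18+ and returns the model's text answer. Needs a real API key.
async function extractFromScreenshot(base64Png, question, apiKey) {
  const response = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(buildVisionRequest(base64Png, question)),
  });
  if (!response.ok) {
    throw new Error(`Vision request failed with HTTP ${response.status}`);
  }
  const data = await response.json();
  return data.choices[0].message.content;
}
```

With Puppeteer, a suitable input can be produced by `page.screenshot({ fullPage: true, encoding: "base64" })`, whose result can be passed directly as `base64Png`. Keeping the request builder separate from the network call makes the payload easy to inspect without spending API credits.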

No License No Package No Dependents

Maintenance 10 / 25

Adoption 10 / 25

Maturity 8 / 25

Community 15 / 25

Stars

294

Forks

Language

JavaScript

License

—

Featured in

Giving AI Agents Eyes: Browser Automation in 2026

Higher-rated alternatives

alibaba/page-agent

JavaScript in-page GUI agent. Control web interfaces with natural language.

4ier/neo

Turn any web app into an API. Chrome extension captures browser traffic, auto-generates schemas,...

CloakHQ/CloakBrowser

Stealth Chromium that passes every bot detection test. Drop-in Playwright replacement with...

hanzili/hanzi-browse

let any ai agent use the local browser

nicobailon/surf-cli

The CLI for AI agents to control Chrome. Zero config, agent-agnostic, battle-tested.
