page-agent and browserable
These tools are competitors: both let AI agents control web browsers through natural language. page-agent runs JavaScript inside the page and manipulates the DOM directly, while browserable is a separate browser automation library, so the two take different architectural approaches to the same problem.
About page-agent
alibaba/page-agent
JavaScript in-page GUI agent. Control web interfaces with natural language.
Operates entirely client-side using text-based DOM analysis rather than screenshots, eliminating the need for multi-modal LLMs or external infrastructure like browser extensions or headless browsers. Integrates with any LLM via a standard API interface, and optionally extends to multi-page automation through a Chrome extension and MCP Server for external agent control.
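As a rough illustration of what text-based DOM analysis can look like, the sketch below flattens a page into numbered text lines for its interactive elements, which a text-only LLM could then refer to by index. This is not page-agent's actual implementation: the `SimpleNode` type, the `INTERACTIVE_TAGS` list, and the `serializeInteractive` function are hypothetical names invented for this example, and a plain object tree stands in for the live DOM so the snippet runs outside a browser.

```typescript
// Hypothetical, simplified element model. A real in-page agent would walk the live DOM instead.
interface SimpleNode {
  tag: string;
  text?: string;
  attrs?: Record<string, string>;
  children?: SimpleNode[];
}

// Tags treated as interactive in this sketch (an assumption, not page-agent's actual list).
const INTERACTIVE_TAGS = new Set(["a", "button", "input", "select", "textarea"]);

// Flatten the tree into indexed text lines that a text-only LLM can reference by number.
function serializeInteractive(root: SimpleNode): string[] {
  const lines: string[] = [];
  const visit = (node: SimpleNode): void => {
    if (INTERACTIVE_TAGS.has(node.tag)) {
      const attrs = Object.entries(node.attrs ?? {})
        .map(([key, value]) => ` ${key}="${value}"`)
        .join("");
      lines.push(`[${lines.length}] <${node.tag}${attrs}>${node.text ?? ""}</${node.tag}>`);
    }
    for (const child of node.children ?? []) visit(child);
  };
  visit(root);
  return lines;
}

// Example page: a small sign-up form. The label is skipped because it is not interactive.
const page: SimpleNode = {
  tag: "form",
  children: [
    { tag: "label", text: "Email" },
    { tag: "input", attrs: { type: "email", name: "email" } },
    { tag: "button", text: "Subscribe" },
  ],
};

console.log(serializeInteractive(page).join("\n"));
// [0] <input type="email" name="email"></input>
// [1] <button>Subscribe</button>
```

Because the model only sees short text lines like these, no screenshot or multi-modal LLM is required; the agent can map an answer such as "click element 1" back to the corresponding element.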
About browserable
browserable/browserable
Open source and self-hostable browser automation library for AI agents
Provides vision-based task execution with LLM-driven navigation, form filling, and data extraction, achieving 90.4% on the WebVoyager benchmark. Integrates pluggable LLM providers (OpenAI, Claude, Gemini) and remote browser services (Hyperbrowser, Steel) via a Docker-based self-hosted architecture backed by MongoDB, Redis, and MinIO. Exposes functionality through a REST API and JavaScript SDK for programmatic agent control.
Scores updated daily from GitHub, PyPI, and npm data.