gemini-browser-agent and gemini-computer-use
These are ecosystem siblings built on the same Gemini 2.5 Computer Use capability. gemini-browser-agent provides a Chrome extension for interactive control of your own browser, while gemini-computer-use offers a programmatic, Playwright-based automation approach, so the two serve different use cases (manual vs. scriptable).
About gemini-browser-agent
pmbstyle/gemini-browser-agent
A browser agent with a Google Chrome extension that works in your own browser, based on Google's Gemini 2.5 Computer Use model.
Bridges a Chrome extension with Google's Gemini Computer Use API to observe and interact with the active tab in real time, exchanging screenshots and DOM events without requiring a sandboxed browser. A Python WebSocket server communicates bidirectionally with the extension, so the model executes browser automation tasks directly within your own browser context. Supports agentic workflows in which Gemini plans multi-step interactions and streams execution logs back to the UI.
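To make the bidirectional exchange concrete, here is a minimal sketch of how messages between the extension and the Python server might be framed as JSON. The message types ("observation", "action", "log"), the field names, and the helper functions are all hypothetical and are not taken from the gemini-browser-agent source; the sketch covers only the framing, not the WebSocket transport itself.

```python
import base64
import json


def encode_observation(png_bytes: bytes, url: str) -> str:
    """Extension -> server: a screenshot of the active tab plus its URL.

    The message shape is an assumption for illustration only.
    """
    return json.dumps({
        "type": "observation",
        "url": url,
        # Binary PNG data is base64-encoded so it fits in a JSON text frame.
        "screenshot_b64": base64.b64encode(png_bytes).decode("ascii"),
    })


def decode_observation(raw: str) -> tuple[bytes, str]:
    """Server side: recover the screenshot bytes and URL from a text frame."""
    msg = json.loads(raw)
    if msg.get("type") != "observation":
        raise ValueError(f"unexpected message type: {msg.get('type')!r}")
    return base64.b64decode(msg["screenshot_b64"]), msg["url"]


def encode_action(name: str, args: dict) -> str:
    """Server -> extension: an action the model asked for (hypothetical shape)."""
    return json.dumps({"type": "action", "name": name, "args": args})


def encode_log(text: str) -> str:
    """Server -> extension: one execution-log line to show in the UI."""
    return json.dumps({"type": "log", "text": text})
```

In a real server, each of these strings would be sent or received as one WebSocket text frame; keeping the framing separate from the transport makes it easy to test without a browser attached.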
About gemini-computer-use
pmbstyle/gemini-computer-use
A minimal browser automation agent using Google's Gemini 2.5 Computer Use Preview model and Playwright for web browser control.
Implements vision-based browser automation by feeding screenshots to Gemini 2.5 for visual understanding, enabling the model to locate and interact with page elements without DOM parsing. Includes built-in safety guardrails with human-in-the-loop confirmation for sensitive operations, and provides a comprehensive action API covering clicks, typing, scrolling, drag-and-drop, and navigation primitives executed through Playwright's browser control layer.
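The action execution and confirmation step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: the action names, the "requires_confirmation" flag, the dictionary layout, and the 1000-unit coordinate grid are assumptions made for the example. The page object is expected to behave like a Playwright sync Page (page.mouse.click, page.mouse.wheel, page.keyboard.type, and page.goto are the only calls used).

```python
from typing import Any, Callable

# Assumption: the model reports coordinates on a 0-1000 grid that is
# independent of the real viewport size.
GRID = 1000


def denormalize(value: int, size: int) -> int:
    """Map a grid coordinate to a pixel coordinate for a viewport dimension."""
    return int(value / GRID * size)


def execute_action(
    page: Any,
    action: dict,
    viewport: tuple[int, int],
    confirm: Callable[[str], bool],
) -> str:
    """Run one model-proposed action; return "executed" or "denied".

    `action` has a hypothetical shape:
    {"name": ..., "args": {...}, "requires_confirmation": bool}.
    """
    name = action["name"]
    args = action.get("args", {})

    # Human-in-the-loop guardrail: ask before running a flagged action.
    if action.get("requires_confirmation") and not confirm(name):
        return "denied"

    width, height = viewport
    if name == "click_at":
        page.mouse.click(denormalize(args["x"], width),
                         denormalize(args["y"], height))
    elif name == "type_text_at":
        # Focus the target by clicking it, then type the text.
        page.mouse.click(denormalize(args["x"], width),
                         denormalize(args["y"], height))
        page.keyboard.type(args["text"])
    elif name == "scroll":
        page.mouse.wheel(0, args["delta_y"])
    elif name == "navigate":
        page.goto(args["url"])
    else:
        raise ValueError(f"unsupported action: {name}")
    return "executed"
```

With a real Playwright page, confirm could be as simple as a function that prompts on the terminal and returns True only for an explicit "y". Passing it in as a parameter keeps the guardrail testable without a browser.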
Scores updated daily from GitHub, PyPI, and npm data.