niuzaisheng/ScreenAgent
ScreenAgent: A Computer Control Agent Driven by Visual Language Large Model (IJCAI-24)
ScreenAgent operates by observing screenshots and emitting mouse and keyboard operations, following a "planning-execution-reflection" control loop to complete multi-step tasks. The project connects to VNC servers for desktop control and provides inferencers for several VLMs, such as GPT-4V and LLaVA-1.5; custom API interfaces for model interaction are also supported. The accompanying ScreenAgent dataset supports training on diverse computer tasks.
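The "planning-execution-reflection" loop described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `run_task`, `Verdict`, and the `capture`, `plan`, `execute`, and `reflect` callables are hypothetical names, and the three reflection outcomes (continue, retry, replan) are an assumption about how such a loop might branch. The toy stand-ins at the bottom replace the VNC connection and the model so the sketch runs on its own.

```python
from enum import Enum


class Verdict(Enum):
    """Hypothetical outcomes of the reflection phase."""
    CONTINUE = "continue"  # subtask succeeded, move to the next one
    RETRY = "retry"        # subtask failed, attempt it again
    REPLAN = "replan"      # plan no longer fits the screen, plan again


def run_task(task, capture, plan, execute, reflect, max_steps=20):
    """Drive one task through a plan/execute/reflect loop.

    Returns True if every subtask completed within max_steps iterations.
    """
    # Planning: split the task into subtasks based on the current screen.
    subtasks = plan(task, capture())
    index = 0
    for _ in range(max_steps):
        if index >= len(subtasks):
            return True
        before = capture()
        # Execution: turn the current subtask into mouse/keyboard actions.
        execute(subtasks[index], before)
        after = capture()
        # Reflection: compare the screen before and after the actions.
        verdict = reflect(subtasks[index], before, after)
        if verdict is Verdict.CONTINUE:
            index += 1
        elif verdict is Verdict.REPLAN:
            subtasks = plan(task, after)
            index = 0
        # Verdict.RETRY falls through and repeats the same subtask.
    return index >= len(subtasks)


# Toy stand-ins so the sketch runs without a VNC server or a model.
log = []
done = run_task(
    "write a note",
    capture=lambda: f"screenshot-{len(log)}",
    plan=lambda task, shot: ["open editor", "type text"],
    execute=lambda subtask, shot: log.append(subtask),
    reflect=lambda subtask, before, after: Verdict.CONTINUE,
)
print(done, log)
```

In a real setup, `capture` would presumably grab a frame from the VNC session and the other three callables would each wrap a call to the chosen VLM; the `max_steps` cap is there only to keep a failing task from looping indefinitely.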
579 stars. No commits in the last 6 months.
Stars: 579
Forks: 62
Language: Python
License: not listed
Category:
Last pushed: Nov 25, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/agents/niuzaisheng/ScreenAgent"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
openai/openai-agents-python
A lightweight, powerful framework for multi-agent workflows
openagents-org/openagents
OpenAgents - AI Agent Networks for Open Collaboration
camel-ai/camel
🐫 CAMEL: The first and the best multi-agent framework. Finding the Scaling Law of Agents....
vamplabAI/sgr-agent-core
Schema-Guided Reasoning (SGR) has agentic system design created by neuraldeep community
BrainBlend-AI/atomic-agents
Building AI agents, atomically