TencentQQGYLab/AppAgent
AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.
AppAgent operates Android apps through vision-based perception and simplified action primitives (tap, swipe) without requiring backend access, which makes it applicable across diverse third-party applications. The framework employs a two-phase approach: an exploration phase, in which the agent builds a knowledge base through either autonomous navigation or human demonstrations, followed by a deployment phase that uses this documentation to execute complex tasks. It supports multiple multimodal LLM backends, including GPT-4V and Qwen-VL, and connects to real devices or Android Studio emulators via Android Debug Bridge (ADB).
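The tap and swipe primitives mentioned above map naturally onto the standard `adb shell input` commands. The following is a minimal sketch of that mapping, not AppAgent's own code: the function names (`adb_prefix`, `tap_command`, `swipe_command`, `run`) and the default swipe duration are hypothetical, chosen only for illustration.

```python
import subprocess
from typing import List, Optional


def adb_prefix(serial: Optional[str] = None) -> List[str]:
    """Return the base adb invocation, optionally targeting one device by serial."""
    return ["adb"] + (["-s", serial] if serial else [])


def tap_command(x: int, y: int, serial: Optional[str] = None) -> List[str]:
    """Build the argument list for a tap at screen coordinates (x, y)."""
    return adb_prefix(serial) + ["shell", "input", "tap", str(x), str(y)]


def swipe_command(
    x1: int,
    y1: int,
    x2: int,
    y2: int,
    duration_ms: int = 300,  # assumed default, not taken from AppAgent
    serial: Optional[str] = None,
) -> List[str]:
    """Build the argument list for a swipe from (x1, y1) to (x2, y2)."""
    coords = [str(v) for v in (x1, y1, x2, y2, duration_ms)]
    return adb_prefix(serial) + ["shell", "input", "swipe"] + coords


def run(command: List[str]) -> None:
    """Execute a prepared adb command; raises CalledProcessError on failure."""
    subprocess.run(command, check=True)
```

Building the argument lists separately from running them keeps the commands easy to inspect and log. With a device or emulator attached, `run(tap_command(540, 1200))` would send the tap; a serial such as `emulator-5554` selects a specific target when several are connected.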
6,582 stars. No commits in the last 6 months.
Stars: 6,582
Forks: 736
Language: Python
License: MIT
Category:
Last pushed: Mar 19, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/TencentQQGYLab/AppAgent"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000 requests/day.
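The same request can be made from Python with only the standard library. This is a sketch under stated assumptions: the `category/owner/repo` path layout is inferred from the single URL shown above, the response is assumed to be JSON (its fields are not documented here, so none are accessed), and the helper names `build_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_url(category: str, owner: str, repo: str) -> str:
    """Compose the endpoint URL, assuming a category/owner/repo path layout."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository and parse it as JSON (assumed format)."""
    url = build_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("llm-tools", "TencentQQGYLab", "AppAgent")` issues the request shown in the curl command. Keyless access is limited to 100 requests/day, so results are worth caching.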
Higher-rated alternatives
bubbuild/republic
Build LLM workflows like normal Python while keeping a full audit trail by default.
mitdbg/palimpzest
A System for Optimized Semantic Computation
ctrl-gaurav/effGen
effGen: Enabling Small Language Models as Capable Autonomous Agents
dlMARiA/Syzygy-of-thoughts
Syzygy-of-thoughts
lwcsrf/netflux
Minimalist framework for authoring custom agentic applications in python; emphasizes task...