r-muresan/screen.vision
Get a guided tour for anything, right on your screen.
Combines vision-language models (GPT, Gemini, Qwen-VL) to analyze the screen state and deliver one instruction at a time, detecting progress automatically by comparing frames. The frontend captures video through the MediaDevices API, while the backend orchestrates multi-model reasoning (instruction generation, step verification, and UI coordinate detection) with zero server-side data retention. Built on a Next.js/React frontend and a FastAPI backend, it supports self-hosting with configurable AI provider APIs.
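Progress detection by frame comparison can be illustrated with a minimal sketch. The listing does not say how screen.vision actually compares frames, so everything here is an assumption: the helper names `changedFraction` and `screenChanged`, the per-channel `tolerance`, and the 2% `threshold` are hypothetical and chosen only for illustration. The input is raw RGBA pixel data, such as the `data` array of a canvas `ImageData` object.

```typescript
// Hypothetical helper (not taken from the project): returns the fraction of
// pixels whose colour differs noticeably between two RGBA frames of equal size.
function changedFraction(
  prev: Uint8ClampedArray,
  next: Uint8ClampedArray,
  tolerance = 16, // assumed average per-channel difference that counts as a change
): number {
  if (prev.length !== next.length || prev.length % 4 !== 0) {
    throw new Error("Frames must be RGBA buffers of identical size");
  }
  const pixels = prev.length / 4;
  if (pixels === 0) return 0;

  let changed = 0;
  for (let i = 0; i < prev.length; i += 4) {
    // Sum the absolute differences of the R, G and B channels; alpha is ignored.
    const delta =
      Math.abs(prev[i] - next[i]) +
      Math.abs(prev[i + 1] - next[i + 1]) +
      Math.abs(prev[i + 2] - next[i + 2]);
    if (delta > tolerance * 3) changed++;
  }
  return changed / pixels;
}

// Hypothetical decision rule: treat the screen as changed once more than
// `threshold` (an assumed 2% by default) of the pixels differ.
function screenChanged(
  prev: Uint8ClampedArray,
  next: Uint8ClampedArray,
  threshold = 0.02,
): boolean {
  return changedFraction(prev, next) > threshold;
}
```

In a browser, the two buffers could come from drawing successive video frames onto a canvas and calling `getImageData` on its 2D context. A changed screen would then be the cue to ask a model whether the current step has been completed.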
Stars: 300
Forks: 40
Language: TypeScript
License: MIT
Category: (none listed)
Last pushed: Feb 20, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/agents/r-muresan/screen.vision"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
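The same request can be made from TypeScript. This is a minimal sketch that assumes a runtime with a global `fetch` (a modern browser or Node.js 18+). The helper names `qualityUrl` and `fetchQuality` are hypothetical, and the response is returned as `unknown` because its fields are not documented in this listing. How an API key is supplied is also not stated here, so the sketch only covers keyless access.

```typescript
// Base path copied from the curl example in this listing.
const API_BASE = "https://pt-edge.onrender.com/api/v1/quality/agents";

// Hypothetical helper: builds the endpoint URL for a given owner/repo pair.
function qualityUrl(owner: string, repo: string): string {
  return `${API_BASE}/${encodeURIComponent(owner)}/${encodeURIComponent(repo)}`;
}

// Hypothetical helper: fetches the data and returns the parsed JSON body.
// The payload shape is not documented here, so it is left as `unknown`.
async function fetchQuality(owner: string, repo: string): Promise<unknown> {
  const response = await fetch(qualityUrl(owner, repo));
  if (!response.ok) {
    // Any non-2xx status (for example after the daily limit is used up) ends up here.
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}
```

Calling `fetchQuality("r-muresan", "screen.vision")` requests the same URL as the curl command above.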
Higher-rated alternatives
pocketpaw/pocketpaw
Your AI agent in 30 seconds. Not 30 hours. Self-hosted, open-source personal AI with desktop...
zhayujie/chatgpt-on-wechat
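CowAgent is a super AI assistant built on large language models. It can think proactively and plan tasks, access the operating system and external resources, create and execute Skills, and keep growing through long-term memory. It also supports access through Feishu, DingTalk, WeCom apps, WeChat Official Accounts, the web, and more...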
iniwap/AIWriteX
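AIWriteX - a fully automated AI tool for WeChat Official Accounts: web-wide trending-topic and public-opinion aggregation + trend analysis + viral topic selection + article collection + one-click generation, formatting, and publishing | removes the "AI flavor" and passes Zhuque detection | supports Xiaohongshu/Baijiahao/Douyin and other platforms |...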
OpenAdaptAI/OpenAdapt
Open Source Generative Process Automation (i.e. Generative RPA). AI-First Process Automation...
MiniMax-AI/OpenRoom
A browser-based desktop where AI Agent operates every app through natural language.