r-muresan/screen.vision

Get a guided tour for anything, right on your screen.

47 / 100

Emerging

Combines vision language models (GPT, Gemini, Qwen-VL) to analyze the current screen state and deliver one instruction at a time, detecting progress automatically by comparing frames. The frontend captures video through the MediaDevices API, while the backend orchestrates multi-model reasoning (instruction generation, step verification, and UI coordinate detection) and retains no data server-side. Built on a Next.js/React frontend and a FastAPI backend; supports self-hosting with configurable AI provider APIs.
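The frame-comparison idea mentioned above can be illustrated with a small sketch. This is not taken from the project's code: the function names `changedFraction` and `screenChanged`, the per-channel `tolerance`, and the 2% `threshold` are all assumptions chosen for illustration. It treats each frame as a flat RGBA buffer and reports what share of pixels changed noticeably.

```typescript
// Hypothetical helper (not from the screen.vision codebase): returns the
// fraction of pixels whose R, G, or B channel differs by more than
// `tolerance` between two same-sized RGBA frames.
function changedFraction(
  prev: Uint8ClampedArray,
  next: Uint8ClampedArray,
  tolerance: number = 16, // assumed per-channel noise tolerance
): number {
  if (prev.length !== next.length || prev.length % 4 !== 0) {
    throw new Error("frames must be same-sized RGBA buffers");
  }
  const pixelCount = prev.length / 4;
  if (pixelCount === 0) return 0;

  let changed = 0;
  for (let i = 0; i < prev.length; i += 4) {
    // Compare the colour channels only; the alpha byte at i + 3 is ignored.
    const dr = Math.abs(prev[i] - next[i]);
    const dg = Math.abs(prev[i + 1] - next[i + 1]);
    const db = Math.abs(prev[i + 2] - next[i + 2]);
    if (Math.max(dr, dg, db) > tolerance) changed++;
  }
  return changed / pixelCount;
}

// Hypothetical decision rule: treat the screen as changed when at least
// `threshold` of its pixels differ (2% is an arbitrary illustrative value).
function screenChanged(
  prev: Uint8ClampedArray,
  next: Uint8ClampedArray,
  threshold: number = 0.02,
): boolean {
  return changedFraction(prev, next) >= threshold;
}
```

In a browser, buffers like these could plausibly come from drawing a screen-capture video frame onto a canvas and reading `getImageData(...).data`. A check of this kind would only decide when a frame is worth sending for model-based step verification; it would not replace that verification.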

300 stars.

No Package No Dependents

Maintenance 10 / 25

Adoption 10 / 25

Maturity 9 / 25

Community 18 / 25


Stars: 300

Forks

Language: TypeScript

License: MIT

Higher-rated alternatives

pocketpaw/pocketpaw

Your AI agent in 30 seconds. Not 30 hours. Self-hosted, open-source personal AI with desktop...

zhayujie/chatgpt-on-wechat

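CowAgent is a super AI assistant built on large language models that can think proactively and plan tasks, access the operating system and external resources, create and execute Skills, and keep long-term memory while continuing to grow. It also supports access via Feishu, DingTalk, WeCom apps, WeChat Official Accounts, the web, and more...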

iniwap/AIWriteX

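AIWriteX - a fully automated AI tool for WeChat Official Accounts: web-wide trending-topic and public-opinion aggregation + trend analysis + viral topic selection + article collection + one-click generation, layout, and publishing | removes the AI tone, passes Zhuque detection | supports Xiaohongshu, Baijiahao, Douyin, and other platforms |...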

OpenAdaptAI/OpenAdapt

Open Source Generative Process Automation (i.e. Generative RPA). AI-First Process Automation...

MiniMax-AI/OpenRoom

A browser-based desktop where AI Agent operates every app through natural language.
