yjyddq/RiOSWorld

[NeurIPS 2025] Official repository of RiOSWorld: Benchmarking the Risk of Multimodal Computer-Use Agents

32 / 100 (Emerging)

Provides a comprehensive benchmark for evaluating safety risks in multimodal computer-use agents through realistic desktop environment interactions, with evaluation trajectories released on HuggingFace. Uses virtualized desktop environments (VMware or Docker) as execution sandboxes and integrates with OSWorld's infrastructure for standardized task setup and metrics collection. Includes attack simulation utilities and automated risk evaluation pipelines to assess how agents respond to phishing, credential theft, and other adversarial scenarios.

117 stars.

No License · No Package · No Dependents
Maintenance 6 / 25
Adoption 10 / 25
Maturity 7 / 25
Community 9 / 25
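
The page does not state how the overall score is derived. As a quick consistency check, and assuming (this is an assumption, not something the page confirms) that the overall score is the plain sum of the four category scores, each out of 25, the numbers above add up to the 32 / 100 shown:

```python
# Category scores copied from the listing above.
sub_scores = {"Maintenance": 6, "Adoption": 10, "Maturity": 7, "Community": 9}

# Each category is shown as "x / 25" on the page.
MAX_PER_CATEGORY = 25

# Assumption: the overall score is an unweighted sum of the categories.
total = sum(sub_scores.values())
max_total = MAX_PER_CATEGORY * len(sub_scores)

print(f"{total} / {max_total}")  # 32 / 100
```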

Stars: 117
Forks: 6
Language: HTML
License: none listed
Last pushed: Dec 02, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/agents/yjyddq/RiOSWorld"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
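
The same request can be made from a script. The following is a minimal sketch using only the Python standard library. The helper names `build_quality_url` and `fetch_quality` are hypothetical, and the sketch assumes the endpoint returns a JSON body; the response fields are not documented on this page, so the code does not rely on any of them.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/agents"


def build_quality_url(owner: str, repo: str) -> str:
    """Build the quality-data URL for a repository (hypothetical helper)."""
    # Percent-encode each path segment so unusual names cannot break the URL.
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the response, assuming it is JSON (not confirmed here)."""
    url = build_quality_url(owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example usage (performs a network request and counts against the daily limit):
# data = fetch_quality("yjyddq", "RiOSWorld")
# print(json.dumps(data, indent=2))
```

With the unauthenticated limit of 100 requests/day, it is worth caching the result locally rather than calling `fetch_quality` on every run.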