ryoungj/ToolEmu
[ICLR'24 Spotlight] A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use
Uses an LM (e.g., GPT-4) to emulate tool execution in a virtual sandbox, so tools need no real API implementation. This enables rapid prototyping of agent test scenarios, including ones that involve high-stakes tools. Includes automated LM-based safety and helpfulness evaluators for scalable risk assessment, paired with a curated benchmark of 36 toolkits and 144 test cases for quantitative agent evaluation. The extensible design lets users contribute new toolkits and test cases by specifying tool schemas and scenarios.
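The core idea, an LM producing tool outputs from a schema instead of calling a real API, can be illustrated with a minimal sketch. This is not ToolEmu's actual API: `ToolSpec`, `build_emulator_prompt`, `emulate_tool_call`, and the `FileDelete` tool are hypothetical names for illustration, and `llm` is a placeholder for any function that sends a prompt to a model such as GPT-4 and returns its text reply. The sketch assumes every declared parameter is required and that the emulator LM answers with a single JSON object.

```python
import json
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToolSpec:
    """Hypothetical tool schema: what the emulator LM sees instead of a real API."""
    name: str
    summary: str
    parameters: dict[str, str]  # parameter name -> description (all assumed required)
    returns: dict[str, str]     # output field name -> description
    exceptions: list[str] = field(default_factory=list)


def build_emulator_prompt(tool: ToolSpec, tool_input: dict) -> str:
    """Pack the schema and the agent's call into a prompt for the emulator LM."""
    return (
        "You are emulating the tool below inside a virtual sandbox.\n"
        f"Tool: {tool.name}: {tool.summary}\n"
        f"Parameters: {json.dumps(tool.parameters)}\n"
        f"Returns: {json.dumps(tool.returns)}\n"
        f"Possible exceptions: {json.dumps(tool.exceptions)}\n"
        f"Call input: {json.dumps(tool_input)}\n"
        "Reply with a single JSON object containing exactly the return fields."
    )


def emulate_tool_call(tool: ToolSpec, tool_input: dict, llm: Callable[[str], str]) -> dict:
    """Return an LM-generated observation for one tool call; no real API is invoked."""
    # Reject calls that do not match the schema before spending an LM query.
    missing = sorted(set(tool.parameters) - set(tool_input))
    unknown = sorted(set(tool_input) - set(tool.parameters))
    if missing or unknown:
        return {"error": f"InvalidRequestException: missing={missing}, unknown={unknown}"}

    # The emulator LM invents a plausible output; json.loads parses its reply.
    observation = json.loads(llm(build_emulator_prompt(tool, tool_input)))
    # Keep the emulated output consistent with the declared return fields.
    if set(observation) != set(tool.returns):
        return {"error": "Emulator output does not match the tool's return schema"}
    return observation


if __name__ == "__main__":
    delete_file = ToolSpec(
        name="FileDelete",  # hypothetical tool, not part of the ToolEmu benchmark
        summary="Delete a file at the given path.",
        parameters={"path": "Absolute path of the file to delete"},
        returns={"success": "Whether the file was deleted"},
        exceptions=["NotFoundException"],
    )
    fake_llm = lambda prompt: '{"success": true}'  # stand-in for a real GPT-4 call
    print(emulate_tool_call(delete_file, {"path": "/home/user/notes.txt"}, fake_llm))
    # prints {'success': True}
```

Because the "deletion" only exists in the emulator's reply, an agent can attempt a risky action without touching a real file system; a separate evaluator (not shown) would then judge whether the agent's trajectory was safe and helpful.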
192 stars. No commits in the last 6 months.
Stars: 192
Forks: 20
Language: Python
License: Apache-2.0
Last pushed: Mar 22, 2024
Commits (30d): 0
Higher-rated alternatives
microsoft/promptbench
A unified evaluation framework for large language models
uptrain-ai/uptrain
UpTrain is an open-source unified platform to evaluate and improve Generative AI applications....
microsoftarchive/promptbench
A unified evaluation framework for large language models
gabe-mousa/Apolien
AI Safety Evaluation Library
levitation-opensource/Manipulative-Expression-Recognition
MER is a software that identifies and highlights manipulative communication in text from human...