HumanStudy-Hub/HumanStudy-Bench
HumanStudy-Bench: Community Edition — Standardized human study replays for AI agent evaluation
The benchmark combines an Execution Engine that reconstructs full experimental protocols from published studies with standardized metrics that evaluate agent alignment at the level of scientific inference, not just task completion. It decouples base model capabilities from agent design choices, enabling precise attribution of results in human-subject simulation tasks. The project is community-driven, with structured contribution workflows: studies are defined via JSON schemas and Python scripts, verified locally, and submitted as pull requests against a growing reference library. A hedged sketch of what such a study definition might look like follows.
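The actual study schema is not documented on this page, so the snippet below is a minimal sketch under stated assumptions: the field names ("study_id", "source", "protocol", "measures") and the schema itself are hypothetical, and the local verification step simply validates the JSON before a pull request would be opened.

```python
# Hypothetical sketch of a study definition and a local verification step.
# Field names and schema are illustrative assumptions, not the project's
# actual contribution format.
import json
from jsonschema import validate  # pip install jsonschema

STUDY_SCHEMA = {
    "type": "object",
    "required": ["study_id", "source", "protocol", "measures"],
    "properties": {
        "study_id": {"type": "string"},
        "source": {"type": "string"},   # citation of the published study
        "protocol": {                   # ordered experimental steps to replay
            "type": "array",
            "items": {"type": "object", "required": ["step", "prompt"]},
        },
        "measures": {                   # outcomes compared against human data
            "type": "array",
            "items": {"type": "string"},
        },
    },
}

study = {
    "study_id": "anchoring_replication",
    "source": "Tversky & Kahneman (1974)",  # illustrative classic study
    "protocol": [
        {"step": 1, "prompt": "Spin the wheel and note the anchor value shown."},
        {"step": 2, "prompt": "Estimate the percentage of African countries in the UN."},
    ],
    "measures": ["mean_estimate", "anchor_effect_size"],
}

# Local verification before submitting the study as a pull request.
validate(instance=study, schema=STUDY_SCHEMA)
print(json.dumps(study, indent=2))
```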
Stars: 3
Forks: 2
Language: Python
License: MIT
Category:
Last pushed: Mar 21, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/agents/HumanStudy-Hub/HumanStudy-Bench"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
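For programmatic use, the same call can be made from Python. This is a minimal sketch: the response fields are not documented here, so the loop simply prints whatever top-level keys the endpoint returns, and because the API-key header is not shown on this page, the request uses the unauthenticated tier.

```python
# Minimal sketch of fetching this repository's quality data from the API.
import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/agents/HumanStudy-Hub/HumanStudy-Bench"

resp = requests.get(URL, timeout=10)
resp.raise_for_status()
data = resp.json()

# Print whatever top-level fields the API returns (e.g. stars, forks, last pushed).
for key, value in data.items():
    print(f"{key}: {value}")
```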
Higher-rated alternatives
StonyBrookNLP/appworld - 🌍 AppWorld: A Controllable World of Apps and People for Benchmarking Function Calling and...
qualifire-dev/rogue - AI Agent Evaluator & Red Team Platform
future-agi/ai-evaluation - Evaluation Framework for all your AI related Workflows
microsoft/WindowsAgentArena - Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of...
agentscope-ai/OpenJudge - OpenJudge: A Unified Framework for Holistic Evaluation and Quality Rewards