AISmithLab/HumanStudy-Bench
HumanStudy-Bench: Towards AI Agent Design for Participant Simulation
Combines an Execution Engine that reconstructs full experimental protocols from published studies with standardized evaluation metrics (Probability Alignment Score, Effect Consistency Score) to measure whether LLM agents reach the same scientific conclusions as human participants. Supports modular agent design through customizable persona and prompt presets, enabling systematic comparison of configuration choices independent of base model capabilities. Includes 12 foundational studies spanning cognition and social psychology with over 6,000 trials, plus automated tooling to add new studies from research PDFs.
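The alignment idea behind metrics like the Probability Alignment Score can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the repository's implementation: the function name probability_alignment_score and the use of total variation distance between agent and human response distributions are guesses at how such a comparison might look.

    # Hypothetical sketch: compare the distribution of answers produced by
    # simulated agents with the distribution reported for human participants.
    # The actual HumanStudy-Bench formula may differ; this only shows the idea.
    from collections import Counter

    def response_distribution(responses, options):
        """Empirical probability of each answer option."""
        counts = Counter(responses)
        total = len(responses)
        return {opt: counts.get(opt, 0) / total for opt in options}

    def probability_alignment_score(agent_responses, human_probs):
        """1 minus total variation distance between agent and human choices.

        Returns 1.0 for a perfect match and 0.0 for fully disjoint choices.
        """
        options = list(human_probs)
        agent_probs = response_distribution(agent_responses, options)
        tvd = 0.5 * sum(abs(agent_probs[o] - human_probs[o]) for o in options)
        return 1.0 - tvd

    # Example: a two-option framing task where 72% of humans chose option A.
    human = {"A": 0.72, "B": 0.28}
    agents = ["A"] * 70 + ["B"] * 30
    print(probability_alignment_score(agents, human))  # ~0.98

A score near 1.0 would indicate that agents reproduce the human choice distribution closely; whether the benchmark also thresholds this per study to decide "same conclusion" is not specified here.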
Stars
12
Forks
3
Language
Python
License
MIT
Category
Last pushed
Mar 08, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/agents/AISmithLab/HumanStudy-Bench"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
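A minimal Python equivalent of the curl call above, for scripting. It assumes the endpoint returns JSON and uses only the standard library; no API key is required at the free 100 requests/day tier.

    # Fetch the quality data for this repository from the pt-edge API.
    # Assumes a JSON response body, which is not confirmed by the listing.
    import json
    from urllib.request import urlopen

    URL = "https://pt-edge.onrender.com/api/v1/quality/agents/AISmithLab/HumanStudy-Bench"

    with urlopen(URL, timeout=10) as resp:
        data = json.load(resp)

    print(json.dumps(data, indent=2))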
Higher-rated alternatives
StonyBrookNLP/appworld
🌍 AppWorld: A Controllable World of Apps and People for Benchmarking Function Calling and...
qualifire-dev/rogue
AI Agent Evaluator & Red Team Platform
future-agi/ai-evaluation
Evaluation Framework for all your AI related Workflows
microsoft/WindowsAgentArena
Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of...
agentscope-ai/OpenJudge
OpenJudge: A Unified Framework for Holistic Evaluation and Quality Rewards