alepot55/agentrial
Statistical evaluation framework for AI agents
Provides multi-trial statistical evaluation with Wilson confidence intervals, plus step-level failure attribution using Fisher exact tests to pinpoint where agent behavior diverges. Integrates natively with LangGraph, CrewAI, Pydantic AI, and other frameworks through adapters, automatically capturing trajectories and token costs across 45+ LLM models. Also supports CI/CD regression detection and production monitoring via drift detectors.
Available on PyPI.
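The two statistics named above are standard: a Wilson score interval bounds an agent's pass rate over repeated trials, and a Fisher exact test compares per-step success counts between two runs. A minimal pure-Python sketch of both (illustrative only; function names and signatures here are not agentrial's actual API):

```python
from math import comb, sqrt

def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """95% Wilson score interval for a pass rate estimated from n trials."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2))
    return center - half, center + half

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]],
    e.g. step passes/failures for baseline vs. candidate agent."""
    n = a + b + c + d
    row1, col1 = a + b, a + c

    def hyper(x: int) -> float:
        # Hypergeometric probability of a table with x in the top-left cell.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = hyper(a)
    lo = max(0, row1 - (n - col1))
    hi = min(row1, col1)
    # Sum probabilities of all tables at least as extreme as the observed one.
    return sum(hyper(x) for x in range(lo, hi + 1)
               if hyper(x) <= p_obs * (1 + 1e-9))

# Agent passed 8 of 10 trials: the Wilson interval is wide, roughly (0.49, 0.94),
# which is why single-trial evals are unreliable.
lo, hi = wilson_interval(8, 10)
```

The Wilson interval, unlike the naive normal approximation, stays inside [0, 1] and behaves sensibly at small trial counts, which is the typical regime for agent evals.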
Stars
15
Forks
2
Language
Python
License
MIT
Last pushed
Feb 06, 2026
Monthly downloads
222
Commits (30d)
0
Dependencies
7
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/agents/alepot55/agentrial"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
StonyBrookNLP/appworld
🌍 AppWorld: A Controllable World of Apps and People for Benchmarking Function Calling and...
qualifire-dev/rogue
AI Agent Evaluator & Red Team Platform
future-agi/ai-evaluation
Evaluation Framework for all your AI related Workflows
microsoft/WindowsAgentArena
Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of...
hidai25/eval-view
Regression testing for AI agents. Snapshot behavior, diff tool calls, catch regressions in CI....