hidai25/eval-view
Regression testing for AI agents. Snapshot behavior, diff tool calls, catch regressions in CI. Works with LangGraph, CrewAI, OpenAI, Anthropic.
Eval-view captures tool-call sequences and parameter changes across multi-turn agent interactions, then compares them against golden baselines to surface silent regressions that still pass health checks. It detects coordinated drift across test runs to distinguish provider rollouts from system failures, using a four-layer scoring system that ranges from free offline tool analysis to optional LLM-as-judge evaluation. It integrates with CI/CD through GitHub Actions, posting automatic PR comments and tracking cost and latency, and auto-detects agents built on LangGraph, CrewAI, and other frameworks without requiring API keys for baseline operations. A minimal sketch of the snapshot-and-diff idea follows; the ToolCall shape, file names, and diff rules are illustrative assumptions, not eval-view's actual API.
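    # Sketch of snapshot-and-diff regression testing for agent tool calls.
    # Everything here (ToolCall, save_baseline, diff_against_baseline) is a
    # hypothetical illustration of the approach, not eval-view's real interface.
    import json
    from dataclasses import dataclass, asdict
    from pathlib import Path

    @dataclass
    class ToolCall:
        name: str
        arguments: dict

    def save_baseline(calls: list[ToolCall], path: Path) -> None:
        """Persist a golden baseline of the agent's tool-call sequence."""
        path.write_text(json.dumps([asdict(c) for c in calls], indent=2))

    def diff_against_baseline(calls: list[ToolCall], path: Path) -> list[str]:
        """Compare a fresh run against the golden baseline and report drift."""
        baseline = json.loads(path.read_text())
        issues = []
        if len(calls) != len(baseline):
            issues.append(f"call count changed: {len(baseline)} -> {len(calls)}")
        for i, (old, new) in enumerate(zip(baseline, calls)):
            if new.name != old["name"]:
                issues.append(f"step {i}: tool changed {old['name']} -> {new.name}")
            elif new.arguments != old["arguments"]:
                issues.append(f"step {i}: arguments for {new.name} changed")
        return issues

    if __name__ == "__main__":
        run = [ToolCall("search", {"query": "refund policy"}),
               ToolCall("summarize", {"max_tokens": 256})]
        baseline_path = Path("golden_baseline.json")
        if not baseline_path.exists():
            save_baseline(run, baseline_path)   # first run records the baseline
        else:
            for issue in diff_against_baseline(run, baseline_path):
                print("REGRESSION:", issue)     # later runs report any drift

In CI, a non-empty issue list would typically fail the job or be posted as a PR comment.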
Stars: 63
Forks: 16
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 13, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/agents/hidai25/eval-view"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
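The same record can be fetched from Python. This is a small sketch using the endpoint shown above; the response schema and the API-key header name are assumptions for illustration.

    # Fetch the agent-quality record from the public API (sketch).
    import requests

    URL = "https://pt-edge.onrender.com/api/v1/quality/agents/hidai25/eval-view"

    def fetch_agent_record(api_key: str | None = None) -> dict:
        """Fetch the quality record; a free key (if supplied) raises the daily limit."""
        headers = {"X-API-Key": api_key} if api_key else {}  # header name is an assumption
        resp = requests.get(URL, headers=headers, timeout=10)
        resp.raise_for_status()
        return resp.json()

    if __name__ == "__main__":
        print(fetch_agent_record())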
Related agents
StonyBrookNLP/appworld
🌍 AppWorld: A Controllable World of Apps and People for Benchmarking Function Calling and...
qualifire-dev/rogue
AI Agent Evaluator & Red Team Platform
future-agi/ai-evaluation
Evaluation Framework for all your AI related Workflows
microsoft/WindowsAgentArena
Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of...
agentscope-ai/OpenJudge
OpenJudge: A Unified Framework for Holistic Evaluation and Quality Rewards