hidai25/eval-view

Regression testing for AI agents. Snapshot behavior, diff tool calls, catch regressions in CI. Works with LangGraph, CrewAI, OpenAI, Anthropic.

Score: 52 / 100 (Established)

Captures tool-call sequences and parameter changes across multi-turn agent interactions, then compares against golden baselines to surface silent regressions that pass health checks. Detects coordinated drift across test runs to distinguish provider rollouts from system failures, using a four-layer scoring system from free offline tool analysis to optional LLM-as-judge evaluation. Integrates deeply with CI/CD via GitHub Actions with automatic PR comments and cost/latency tracking, while supporting agent auto-detection across LangGraph, CrewAI, and other frameworks without requiring API keys for baseline operations.
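The baseline comparison described above can be sketched as a diff over serialized tool calls. This is an illustrative sketch only; the function and data shapes are assumptions for explanation, not eval-view's actual API:

```python
import json
from difflib import unified_diff

# Hypothetical golden baseline and a later run (structures assumed).
baseline = [
    {"tool": "search", "args": {"query": "refund policy"}},
    {"tool": "lookup_order", "args": {"order_id": "A-1"}},
]
current = [
    {"tool": "search", "args": {"query": "refund policy"}},
    {"tool": "lookup_order", "args": {"order_id": "A-1", "verbose": True}},
]

def diff_tool_calls(golden, run):
    """Return unified-diff lines between two tool-call sequences.

    Serializing each call with sorted keys makes parameter changes
    (the 'silent regressions' above) show up as diff lines even when
    the overall run still passes health checks.
    """
    a = [json.dumps(c, sort_keys=True) for c in golden]
    b = [json.dumps(c, sort_keys=True) for c in run]
    return list(unified_diff(a, b, "baseline", "current", lineterm=""))

for line in diff_tool_calls(baseline, current):
    print(line)
```

Note that this layer needs no API key: it is pure offline structural comparison, which matches the free tier of the four-layer scoring described above.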

No package · No dependents
Maintenance 13 / 25
Adoption 8 / 25
Maturity 13 / 25
Community 18 / 25


Stars: 63
Forks: 16
Language: Python
License: Apache-2.0
Last pushed: Mar 13, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/agents/hidai25/eval-view"

Open to everyone: 100 requests/day with no key. A free key raises the limit to 1,000/day.
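The curl call above is a plain GET against a path of the form `owner/repo`. A minimal Python helper for building that URL (the response schema is not documented here, so no fields are assumed):

```python
from urllib.parse import quote

# Base path taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/agents"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-API URL for a GitHub owner/repo pair,
    percent-encoding each segment."""
    return f"{API_BASE}/{quote(owner, safe='')}/{quote(repo, safe='')}"

print(quality_url("hidai25", "eval-view"))
```

With `requests` installed, fetching would then be `requests.get(quality_url("hidai25", "eval-view")).json()`.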