ag2ai/Agents_Failure_Attribution

Benchmark for automated failure attribution in agentic systems (🏆 ICML 2025 Spotlight)

Score: 48 / 100 (Emerging)

Introduces the "Who&When" benchmark with 184 annotated failure trajectories from both algorithm-generated (CaptainAgent) and hand-crafted (Magnetic-One) multi-agent systems, providing fine-grained labels for responsible agents, critical error steps, and failure explanations. Implements three attribution methods—All-at-Once, Step-by-Step, and Binary Search—that work with multiple LLM backends (GPT-4o, Llama, Qwen) to automatically pinpoint failure causes in complex agentic workflows. Evaluates performance on realistic scenarios derived from the GAIA and AssistantBench datasets, enabling rapid debugging iteration and providing reward signals for agent self-correction.
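The three attribution methods trade off judge-call cost against granularity; Binary Search is the simplest to sketch. Below is a minimal, hedged illustration of the binary-search idea only: the judge callable stands in for an LLM that decides whether the failure is already present in a prefix of the trajectory, and the function name and step format are hypothetical, not the repository's actual interface.

# Minimal sketch of binary-search failure attribution. The judge stub and
# step format are illustrative assumptions, not this repo's API.
from typing import Callable, List

def binary_search_attribution(
    steps: List[str],
    judge: Callable[[List[str]], bool],
) -> int:
    """Return the index of the first step whose prefix the judge flags as failed."""
    lo, hi = 0, len(steps) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if judge(steps[: mid + 1]):
            hi = mid       # failure already present: critical step is at or before mid
        else:
            lo = mid + 1   # prefix looks fine: critical step is after mid
    return lo

# Toy usage with a deterministic stand-in for an LLM judge:
trajectory = ["plan task", "search web", "misread result", "final answer"]
print(binary_search_attribution(trajectory, lambda p: "misread result" in p))  # -> 2

Each probe halves the candidate range, so a trajectory of n steps needs roughly log2(n) judge calls instead of one per step as in Step-by-Step.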


No package · No dependents
Maintenance 10 / 25
Adoption 10 / 25
Maturity 15 / 25
Community 13 / 25


Stars: 349
Forks: 23
Language: Python
License: MIT
Last pushed: Feb 11, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/agents/ag2ai/Agents_Failure_Attribution"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
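The same endpoint can be scripted; below is a minimal standard-library sketch. Only the URL comes from this page; the response schema is not documented here, so the example just pretty-prints whatever JSON comes back rather than assuming field names.

# Fetch the quality data for this repo and print the raw JSON response.
import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/agents/ag2ai/Agents_Failure_Attribution"

with urllib.request.urlopen(URL, timeout=10) as resp:
    print(json.dumps(json.load(resp), indent=2))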