Agent Reliability Engineering AI Agents

Standards, frameworks, and operational tooling for measuring, testing, and improving the reliability of AI agents before and after production deployment. Includes failure-mode evaluation, SRE principles applied to agents, quality metrics, and deterministic safety guarantees. Does NOT include general agent monitoring dashboards, agent security hardening, or agent infrastructure resilience (those focus on different aspects of operations).

This category tracks 32 agent reliability engineering projects. Two score above 50, which places them in the Established tier. The highest-rated is petterjuan/agentic-reliability-framework at 54/100, with 19 stars and 115 monthly downloads.

Get the projects in this category as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=agents&subcategory=agent-reliability-engineering&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
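The same request can be made from a script. The following is a minimal sketch using only the Python standard library; it reuses the URL and query parameters from the curl command above. The helper names `build_url` and `fetch_projects` are hypothetical, and because the response schema is not described on this page, the parsed JSON is returned unchanged. The curl example passes `limit=20`; whether a larger value is accepted, or whether results are paginated, is not stated here, so `limit` is left as a parameter.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="agents",
              subcategory="agent-reliability-engineering",
              limit=20):
    """Build the dataset URL with the same query parameters as the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(limit=20, timeout=30):
    """Fetch the dataset and decode the JSON body.

    Assumption: the endpoint returns JSON. Its structure is not documented
    on this page, so the decoded value is returned without interpretation.
    """
    with urlopen(build_url(limit=limit), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_projects()` performs one unauthenticated request, which counts against the 100 requests/day allowance mentioned above.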

#	Agent	Description	Score	Tier	Stars	Language
1	petterjuan/agentic-reliability-framework	ARF is an agentic reliability intelligence platform that separates decision...	54	Established	19	Python
2	sarkar-ai-taken/riva	Local-first observability and control plane for AI agents.	53	Established	3	Python
3	Nubaeon/empirica	Make AI agents and AI workflows measurably reliable. Epistemic...	47	Emerging	187	Python
4	soumendrak/ragwatch	An SDK for Python AI Agents. Under heavy development.	42	Emerging	5	Python
5	relai-ai/relai-sdk	A platform for building reliable AI agents	40	Emerging	93	Python
6	kalibr-ai/kalibr-sdk-python	Your agents silently degrade in production. Kalibr keeps them on the optimal...	35	Emerging	24	Python
7	exospherehost/ai-reliability-standards	Architectural standards and best practices for building reliable AI Agents...	34	Emerging	4	Dockerfile
8	imtt-dev/steer	The Active Reliability Layer for AI Agents. Catch failures, teach fixes, and...	34	Emerging	130	Python
9	itbench-hub/ITBench-CISO-CAA-Agent	Code repository for CISO agent as part of ITBench	33	Emerging	21	Python
10	eth-sri/ToolFuzz	ToolFuzz is a fuzzing framework designed to test your LLM Agent tools.	26	Experimental	37	Python
11	ai-2070/l0-python	L0: The Missing Reliability Substrate for AI. Streaming-first. Reliable....	25	Experimental	3	Python
12	choutos/agent-reliability-engineering	Agent Reliability Engineering: applying SRE principles to AI agent systems....	25	Experimental	3	Shell
13	ai-2070/l0	L0: The Missing Reliability Substrate for AI. Streaming-first. Reliable....	24	Experimental	2	TypeScript
14	khan5v/kalibra	Statistical regression detection and CI quality gates for AI agents	23	Experimental	1	Python
15	enkronos/agentevalops	Failure-mode evaluation harness for agent systems.	23	Experimental	1	TypeScript
16	johnnylugm-tech/agent-dashboard	A light-weight skill with quick start to monitor the latest status of each...	22	Experimental	—	Python
17	SyntheticSynaptic/agentura	CI for AI agents, no SDK. Define eval suites in agentura.yaml, run them on...	22	Experimental	—	TypeScript
18	kadubon/Oversight-Centered-Metrology-PoC	Lightweight proof-of-concept for oversight-centered metrology in coding...	22	Experimental	—	Python
19	thinkbigcd/agent-monitor	monitoring dashboard and observability tools for ai agents	22	Experimental	4	Python
20	StanislavBG/stepproof	Regression testing CLI for AI agents — define expected behaviors in YAML,...	22	Experimental	—	TypeScript
21	MyK-Exee/ai-assert	Verify AI-generated outputs against constraints with retries to ensure...	22	Experimental	—	Python
22	arabindanarayandas/invari	The repair layer for AI agents. Validates and fixes malformed API calls in...	22	Experimental	—	TypeScript
23	alyssadata/Driftmap-Public-Harness_llm-eval-harness-lite	Public Driftmap harness: public-safe CSV suites + rubrics + run logs for...	20	Experimental	1	Python
24	LuisGG72/reliability-pack-api	Operational reliability API for AI agents: normalize inputs, contract-test...	19	Experimental	—	—
25	zahere/reliability-polynomials	Generalized reliability polynomials for quality-weighted network analysis....	19	Experimental	—	Python
26	nobutakayamauchi/RTS	ai-agents llm-ai gpt-workflows ai-audit execution-logging ai-research...	16	Experimental	2	Python
27	Sutr-dev999/agent-monitoring-system	Agent Monitoring System	15	Experimental	1	—
28	conde-fc/agentic-ai-accountability	Post-deployment behavioral measurement framework for AI agents — traces...	14	Experimental	—	Python
29	feralghost/model-watchdog	Auto-rollback for AI agent config changes. Zero dependencies, single Python file.	14	Experimental	—	Python
30	tylerdh12/agent-reliability-toolkit	Open-source testing framework for AI agents. Test for the 7 failure modes...	14	Experimental	—	Python
31	NithiN-1808/agentchaos	Chaos testing for agentic AI — fault injection hooks for openai-agents-python	14	Experimental	—	Python
32	mohamedchouat/ai-verifier	AI Verifier is an Android app that lets you ask questions to multiple AI...	11	Experimental	—	Kotlin

Comparisons in this category

l0-python and l0 (25 vs 24)