Agent Reliability Engineering AI Agents

Standards, frameworks, and operational tooling for measuring, testing, and improving the reliability of AI agents before and after production deployment. Includes failure-mode evaluation, SRE principles applied to agents, quality metrics, and deterministic safety guarantees. Does NOT include general agent monitoring dashboards, agent security hardening, or agent infrastructure resilience (those focus on different aspects of operations).

There are 32 agent reliability engineering projects tracked. Two score above 50 (Established tier). The highest-rated is petterjuan/agentic-reliability-framework at 54/100, with 19 stars and 115 monthly downloads.

Get all 32 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=agents&subcategory=agent-reliability-engineering&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
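The same request can be made from a script. The following is a minimal sketch using only the Python standard library; it reuses the URL and query parameters from the curl command above. The helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the JSON response is not documented on this page, so the code only decodes the body and leaves its structure to the caller.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="agents", subcategory="agent-reliability-engineering", limit=20):
    """Build the dataset URL with the same query parameters as the curl command."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=30):
    """GET the URL and decode the body as JSON.

    Assumption: the endpoint returns a JSON body without an API key,
    as the page states. The response schema is not assumed here.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_projects(build_url())` performs the request and counts against the daily quota. Passing a larger `limit` (for example `build_url(limit=32)`) may be needed to retrieve every project in the category, though whether the API honors values above 20 is an assumption.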

| # | Agent | Description | Score | Tier |
|---|-------|-------------|-------|------|
| 1 | petterjuan/agentic-reliability-framework | ARF is an agentic reliability intelligence platform that separates decision... | 54 | Established |
| 2 | sarkar-ai-taken/riva | Local-first observability and control plane for AI agents. | 53 | Established |
| 3 | Nubaeon/empirica | Make AI agents and AI workflows measurably reliable. Epistemic... | 47 | Emerging |
| 4 | soumendrak/ragwatch | An SDK for Python AI Agents. Under heavy development. | 42 | Emerging |
| 5 | relai-ai/relai-sdk | A platform for building reliable AI agents | 40 | Emerging |
| 6 | kalibr-ai/kalibr-sdk-python | Your agents silently degrade in production. Kalibr keeps them on the optimal... | 35 | Emerging |
| 7 | exospherehost/ai-reliability-standards | Architectural standards and best practices for building reliable AI Agents... | 34 | Emerging |
| 8 | imtt-dev/steer | The Active Reliability Layer for AI Agents. Catch failures, teach fixes, and... | 34 | Emerging |
| 9 | itbench-hub/ITBench-CISO-CAA-Agent | Code repository for CISO agent as part of ITBench | 33 | Emerging |
| 10 | eth-sri/ToolFuzz | ToolFuzz is a fuzzing framework designed to test your LLM Agent tools. | 26 | Experimental |
| 11 | ai-2070/l0-python | L0: The Missing Reliability Substrate for AI. Streaming-first. Reliable.... | 25 | Experimental |
| 12 | choutos/agent-reliability-engineering | Agent Reliability Engineering: applying SRE principles to AI agent systems.... | 25 | Experimental |
| 13 | ai-2070/l0 | L0: The Missing Reliability Substrate for AI. Streaming-first. Reliable.... | 24 | Experimental |
| 14 | khan5v/kalibra | Statistical regression detection and CI quality gates for AI agents | 23 | Experimental |
| 15 | enkronos/agentevalops | Failure-mode evaluation harness for agent systems. | 23 | Experimental |
| 16 | johnnylugm-tech/agent-dashboard | A light-weight skill with quick start to monitor the latest status of each... | 22 | Experimental |
| 17 | SyntheticSynaptic/agentura | CI for AI agents, no SDK. Define eval suites in agentura.yaml, run them on... | 22 | Experimental |
| 18 | kadubon/Oversight-Centered-Metrology-PoC | Lightweight proof-of-concept for oversight-centered metrology in coding... | 22 | Experimental |
| 19 | thinkbigcd/agent-monitor | monitoring dashboard and observability tools for ai agents | 22 | Experimental |
| 20 | StanislavBG/stepproof | Regression testing CLI for AI agents — define expected behaviors in YAML,... | 22 | Experimental |
| 21 | MyK-Exee/ai-assert | Verify AI-generated outputs against constraints with retries to ensure... | 22 | Experimental |
| 22 | arabindanarayandas/invari | The repair layer for AI agents. Validates and fixes malformed API calls in... | 22 | Experimental |
| 23 | alyssadata/Driftmap-Public-Harness_llm-eval-harness-lite | Public Driftmap harness: public-safe CSV suites + rubrics + run logs for... | 20 | Experimental |
| 24 | LuisGG72/reliability-pack-api | Operational reliability API for AI agents: normalize inputs, contract-test... | 19 | Experimental |
| 25 | zahere/reliability-polynomials | Generalized reliability polynomials for quality-weighted network analysis.... | 19 | Experimental |
| 26 | nobutakayamauchi/RTS | ai-agents llm-ai gpt-workflows ai-audit execution-logging ai-research... | 16 | Experimental |
| 27 | Sutr-dev999/agent-monitoring-system | Agent Monitoring System | 15 | Experimental |
| 28 | conde-fc/agentic-ai-accountability | Post-deployment behavioral measurement framework for AI agents — traces... | 14 | Experimental |
| 29 | feralghost/model-watchdog | Auto-rollback for AI agent config changes. Zero dependencies, single Python file. | 14 | Experimental |
| 30 | tylerdh12/agent-reliability-toolkit | Open-source testing framework for AI agents. Test for the 7 failure modes... | 14 | Experimental |
| 31 | NithiN-1808/agentchaos | Chaos testing for agentic AI — fault injection hooks for openai-agents-python | 14 | Experimental |
| 32 | mohamedchouat/ai-verifier | AI Verifier is an Android app that lets you ask questions to multiple AI... | 11 | Experimental |

Comparisons in this category