Agent Reliability Engineering AI Agents
Standards, frameworks, and operational tooling for measuring, testing, and improving the reliability of AI agents before and after production deployment. Includes failure-mode evaluation, SRE principles applied to agents, quality metrics, and deterministic safety guarantees. Does NOT include general agent monitoring dashboards, agent security hardening, or agent infrastructure resilience (those focus on different aspects of operations).
32 agent reliability engineering projects are tracked. Two score above 50 (the Established tier). The highest-rated is petterjuan/agentic-reliability-framework at 54/100, with 19 stars and 115 monthly downloads.
Get the projects as JSON (note that `limit=20` in the request below is lower than the 32 tracked projects)
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=agents&subcategory=agent-reliability-engineering&limit=20"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
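The same request can be made from Python. The sketch below uses only the standard library and the endpoint and query parameters shown in the curl command. The helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the JSON response is not documented here, so the parsed payload is returned as-is rather than unpacked into assumed fields.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="agents", subcategory="agent-reliability-engineering", limit=20):
    """Assemble the query URL; the defaults mirror the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Perform the GET request and return the parsed JSON payload unchanged.

    The response schema is an unknown here, so no keys are assumed.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Example usage (performs a network call, so it is left commented out):
# data = fetch_projects(build_url())
# print(json.dumps(data, indent=2)[:500])
```

Whether the API accepts a `limit` larger than 20, or offers pagination, is not stated on this page, so retrieving all 32 entries may need a different parameter value than the one shown.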
| # | Agent | Score | Tier |
|---|---|---|---|
| 1 | petterjuan/agentic-reliability-framework<br>ARF is an agentic reliability intelligence platform that separates decision... | | Established |
| 2 | sarkar-ai-taken/riva<br>Local-first observability and control plane for AI agents. | | Established |
| 3 | Nubaeon/empirica<br>Make AI agents and AI workflows measurably reliable. Epistemic... | | Emerging |
| 4 | soumendrak/ragwatch<br>An SDK for Python AI Agents. Under heavy development. | | Emerging |
| 5 | relai-ai/relai-sdk<br>A platform for building reliable AI agents | | Emerging |
| 6 | kalibr-ai/kalibr-sdk-python<br>Your agents silently degrade in production. Kalibr keeps them on the optimal... | | Emerging |
| 7 | exospherehost/ai-reliability-standards<br>Architectural standards and best practices for building reliable AI Agents... | | Emerging |
| 8 | imtt-dev/steer<br>The Active Reliability Layer for AI Agents. Catch failures, teach fixes, and... | | Emerging |
| 9 | itbench-hub/ITBench-CISO-CAA-Agent<br>Code repository for CISO agent as part of ITBench | | Emerging |
| 10 | eth-sri/ToolFuzz<br>ToolFuzz is a fuzzing framework designed to test your LLM Agent tools. | | Experimental |
| 11 | ai-2070/l0-python<br>L0: The Missing Reliability Substrate for AI. Streaming-first. Reliable.... | | Experimental |
| 12 | choutos/agent-reliability-engineering<br>Agent Reliability Engineering: applying SRE principles to AI agent systems.... | | Experimental |
| 13 | ai-2070/l0<br>L0: The Missing Reliability Substrate for AI. Streaming-first. Reliable.... | | Experimental |
| 14 | khan5v/kalibra<br>Statistical regression detection and CI quality gates for AI agents | | Experimental |
| 15 | enkronos/agentevalops<br>Failure-mode evaluation harness for agent systems. | | Experimental |
| 16 | johnnylugm-tech/agent-dashboard<br>A light-weight skill with quick start to monitor the latest status of each... | | Experimental |
| 17 | SyntheticSynaptic/agentura<br>CI for AI agents, no SDK. Define eval suites in agentura.yaml, run them on... | | Experimental |
| 18 | kadubon/Oversight-Centered-Metrology-PoC<br>Lightweight proof-of-concept for oversight-centered metrology in coding... | | Experimental |
| 19 | thinkbigcd/agent-monitor<br>monitoring dashboard and observability tools for ai agents | | Experimental |
| 20 | StanislavBG/stepproof<br>Regression testing CLI for AI agents — define expected behaviors in YAML,... | | Experimental |
| 21 | MyK-Exee/ai-assert<br>Verify AI-generated outputs against constraints with retries to ensure... | | Experimental |
| 22 | arabindanarayandas/invari<br>The repair layer for AI agents. Validates and fixes malformed API calls in... | | Experimental |
| 23 | alyssadata/Driftmap-Public-Harness_llm-eval-harness-lite<br>Public Driftmap harness: public-safe CSV suites + rubrics + run logs for... | | Experimental |
| 24 | LuisGG72/reliability-pack-api<br>Operational reliability API for AI agents: normalize inputs, contract-test... | | Experimental |
| 25 | zahere/reliability-polynomials<br>Generalized reliability polynomials for quality-weighted network analysis.... | | Experimental |
| 26 | nobutakayamauchi/RTS<br>ai-agents llm-ai gpt-workflows ai-audit execution-logging ai-research... | | Experimental |
| 27 | Sutr-dev999/agent-monitoring-system<br>Agent Monitoring System | | Experimental |
| 28 | conde-fc/agentic-ai-accountability<br>Post-deployment behavioral measurement framework for AI agents — traces... | | Experimental |
| 29 | feralghost/model-watchdog<br>Auto-rollback for AI agent config changes. Zero dependencies, single Python file. | | Experimental |
| 30 | tylerdh12/agent-reliability-toolkit<br>Open-source testing framework for AI agents. Test for the 7 failure modes... | | Experimental |
| 31 | NithiN-1808/agentchaos<br>Chaos testing for agentic AI — fault injection hooks for openai-agents-python | | Experimental |
| 32 | mohamedchouat/ai-verifier<br>AI Verifier is an Android app that lets you ask questions to multiple AI... | | Experimental |
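As a quick consistency check, the tier column in the listing can be tallied by rank. The sketch below hard-codes the rank ranges exactly as they appear in the table (ranks 1 to 2 Established, 3 to 9 Emerging, 10 to 32 Experimental). The names `TIER_SPANS` and `tier_by_rank` are illustrative only and are not part of any API.

```python
from collections import Counter

# (first_rank, last_rank, tier) spans transcribed from the table above.
TIER_SPANS = [
    (1, 2, "Established"),
    (3, 9, "Emerging"),
    (10, 32, "Experimental"),
]


def tier_by_rank(spans):
    """Expand the rank spans into a {rank: tier} mapping."""
    return {
        rank: tier
        for first, last, tier in spans
        for rank in range(first, last + 1)
    }


tiers = tier_by_rank(TIER_SPANS)
counts = Counter(tiers.values())
print(dict(counts))  # {'Established': 2, 'Emerging': 7, 'Experimental': 23}
```

The totals add up to the 32 tracked projects, and the two Established entries match the statement that two projects score above 50.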