SRE Incident Automation AI Agents

AI agents for autonomous incident detection, root cause analysis, and remediation in production environments. Focuses on SRE-specific tools that integrate with observability platforms and cloud infrastructure. Does NOT include general monitoring dashboards, anomaly detection platforms without remediation, or incident classification frameworks.

There are 45 SRE incident automation agents tracked. Two score 50 or higher (Established tier). The highest-rated is scitix/siclaw at 53/100 with 69 stars and 550 monthly downloads.

Get all 45 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=agents&subcategory=sre-incident-automation&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
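For scripted access, the curl command above can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the function names `build_url` and `fetch_projects` are hypothetical, the response is assumed to be JSON (its exact shape is not documented here), and whether the endpoint accepts a `limit` larger than the 20 used in the curl command is an assumption.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit=20):
    """Build the dataset URL with the same query parameters as the curl command."""
    params = {
        "domain": "agents",
        "subcategory": "sre-incident-automation",
        "limit": limit,  # 20 matches the curl command; other values are untested
    }
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_projects(limit=20, timeout=30):
    """Fetch and decode the JSON payload (hypothetical helper; shape not assumed)."""
    with urlopen(build_url(limit), timeout=timeout) as resp:
        return json.load(resp)


# Example usage (performs a network request and counts against the daily quota):
#   data = fetch_projects()
#   print(json.dumps(data, indent=2)[:500])
```

Keeping URL construction separate from the network call makes the query parameters easy to inspect or adjust without spending a request.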

| # | Agent | Description | Score | Tier |
|---|-------|-------------|-------|------|
| 1 | scitix/siclaw | AI-powered SRE platform — read-only infrastructure diagnostics with deep... | 53 | Established |
| 2 | Arvo-AI/aurora | Aurora — Open source AI-powered agentic incident management & root cause... | 50 | Established |
| 3 | pavangudiwada/awesome-ai-sre | AI SRE tools for RCA, Incident Response, Cost-Saving, Infra management,... | 46 | Emerging |
| 4 | a2wio/lucas | A2W's SRE agent for Kubernetes | 46 | Emerging |
| 5 | chatwoot/faultline | An open-source AI agent for infrastructure debugging. | 42 | Emerging |
| 6 | datolabs-io/opsy | Opsy - Your AI-Powered SRE Colleague | 41 | Emerging |
| 7 | whitepaper27/Sentri | AI-powered autonomous DBA agent — detects, diagnoses, and fixes Oracle... | 40 | Emerging |
| 8 | avivl/cloud-sre-agent | An autonomous SRE agent that monitors cloud logs across multiple platforms,... | 37 | Emerging |
| 9 | codeready-toolchain/tarsy | Intelligent Site Reliability Engineering agent for automatic alert processing | 37 | Emerging |
| 10 | ismailperim/oncallmate | 🚨 Autonomous AI SRE agent that investigates Docker incidents while you... | 34 | Emerging |
| 11 | qicesun/SRE-Agent-App | An Autonomous AI SRE Agent for Kubernetes, built with Java Spring Boot &... | 30 | Emerging |
| 12 | qingwave/kubewizard | ✨Kubewizard is An AI-Agent for automated Kubernetes troubleshooting, and... | 25 | Experimental |
| 13 | Joeen-AI-Labs/Netiarius | CLI agent for Linux server network troubleshooting and repair, with built-in... | 25 | Experimental |
| 14 | vitas/evidra | Flight recorder for Infrastructure Automation. Behavioral Reliability for... | 25 | Experimental |
| 15 | hanu-tayal/ai-oncall-agent | AI agents that replace human on-call engineers — automated error analysis,... | 23 | Experimental |
| 16 | codenamev/ruby_llm-ups | ups.dev status page integration for RubyLLM — automatic agent heartbeats,... | 23 | Experimental |
| 17 | AxonLabsDev/nervmap | Infrastructure cartography CLI — discover services, map dependencies, trace... | 23 | Experimental |
| 18 | haoranc/agent-estimate | The first open-source effort estimation tool built for AI coding agents.... | 22 | Experimental |
| 19 | kiloloop/agent-estimate | The first open-source effort estimation tool built for AI coding agents.... | 22 | Experimental |
| 20 | javakishore-veleti/Claims-Processor-With-SRE | A multi-tenant healthcare claims processing platform with AI-powered... | 22 | Experimental |
| 21 | imIbAd404/sre-agent | 🚀 Automate self-healing and root cause analysis for financial services with... | 22 | Experimental |
| 22 | dbwls99706/deadends.dev | Structured failure knowledge infrastructure for AI agents — dead ends,... | 22 | Experimental |
| 23 | jayta1314/awesome-ai-sre | Curate and explore a comprehensive list of AI-driven tools and resources... | 22 | Experimental |
| 24 | GagauzSergii/anomaly_detection_platform | Distributed real-time AIOps platform for metric ingestion and anomaly... | 22 | Experimental |
| 25 | obtFusi/network-agent | CLI agent for network analysis via natural language (Venice.ai) | 22 | Experimental |
| 26 | anonymousgirl123/ai-incident-analyzer | Build a production-style AI system that ingests logs and metrics, detects... | 22 | Experimental |
| 27 | koustubh-v/AutoDevOps-AI | Autonomous SRE agent that recursively audits, traces, and self-heals... | 20 | Experimental |
| 28 | csa7mdm/AutoMender | Autonomous AI Agent that detects, analyzes, and self-heals .NET runtime... | 19 | Experimental |
| 29 | iemafzalhassan/OutagePilot | OutagePilot uses a multi-agent system to autonomously detect, diagnose, and... | 19 | Experimental |
| 30 | agamm/awesome-ai-sre | A curated list of 100+ AI-powered tools, platforms, and resources for Site... | 19 | Experimental |
| 31 | sinzin91/awesome-sre-skills | A curated list of AI agent skills for Site Reliability Engineering —... | 19 | Experimental |
| 32 | agentincident/agentincident | The open incident format for autonomous AI agents. Record, classify, and... | 19 | Experimental |
| 33 | sydasif/network-automation-agent | Run commands on network device with LLM using netmiko | 17 | Experimental |
| 34 | Suraj-kumar00/DataIncidentManager | AI-Powered Autonomous Incident Management for Data Teams | 16 | Experimental |
| 35 | bblackheart013/semantic-devops-bot | AI-powered DevOps Assistant that reads error logs, suggests fixes, and... | 16 | Experimental |
| 36 | charles-adedotun/kubepulse | Intelligent Kubernetes health monitoring with AI-powered diagnostics,... | 16 | Experimental |
| 37 | kyisaiah47/cloudwatch-genius | AI-powered DevOps agent using Amazon Bedrock & Claude 3 Sonnet for... | 16 | Experimental |
| 38 | ghantakiran/ShieldOps | AI-Powered Autonomous SRE Platform — Autonomous agents for investigation,... | 15 | Experimental |
| 39 | AdityaIndoori/Sentry | Autonomous AI service monitor multi-agent pipeline (Triage, Detective,... | 14 | Experimental |
| 40 | rubsj/ai-devops-assistant | Multi-agent DevOps AI assistant for pipeline monitoring, log analysis, root... | 14 | Experimental |
| 41 | kaiojoceli51/ShieldOps | Automate incident investigation, remediation, and security enforcement... | 14 | Experimental |
| 42 | brngg/herald | AI agent that detects, diagnoses, and remediates Kubernetes incidents with... | 14 | Experimental |
| 43 | tareksyria/SREAgents | 🤖 Build and manage AI-driven SRE agents to automate operations tasks with... | 14 | Experimental |
| 44 | kamaleshanantha/-metr-time-horizon-feb-2026 | Interactive visualization of METR AI agent time horizon benchmark with... | 11 | Experimental |
| 45 | DilshanPGN/IncidentIQ | AI-driven observability & incident-analysis agent that plugs into Java... | 11 | Experimental |
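
The tier labels in the listing above are consistent with simple score cutoffs, which can be sketched as below. The thresholds are inferred from the listed data, not taken from any published scoring rule: the lowest Established score shown is 50, the lowest Emerging score is 30, and the highest Experimental score is 25, so any Emerging cutoff from 26 to 30 would fit. The function name `tier_for` and the constants are hypothetical.

```python
# Assumed cutoffs, inferred from the listing (not an official definition).
ESTABLISHED_MIN = 50  # lowest score labeled Established in the listing
EMERGING_MIN = 30     # assumption: any value from 26 to 30 fits the listed data


def tier_for(score):
    """Map a 0-100 quality score to a tier label under the assumed cutoffs."""
    if score >= ESTABLISHED_MIN:
        return "Established"
    if score >= EMERGING_MIN:
        return "Emerging"
    return "Experimental"


# A few (agent, score) pairs copied from the listing.
samples = [("scitix/siclaw", 53), ("a2wio/lucas", 46), ("brngg/herald", 14)]
for agent, score in samples:
    print(f"{agent}: {score} -> {tier_for(score)}")
```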