Agent Platforms Are Four Problems, Not One

You'll deploy a coding agent and think you're done. You won't be told you also need sandboxing, governance, and orchestration — or that only one of these layers is actually mature.

Graham Rowe · April 02, 2026 · Updated daily with live data
Tags: agents · mcp · ai-coding

You want to deploy a coding agent at your company. You evaluate a few options, pick one, set it up. It works. Then someone asks: "what happens when the agent runs `rm -rf /`?" And you realise you have no sandboxing. Then legal asks about audit trails. Then your CISO asks about authorization policies. Then you discover you need three agents to cooperate on a task and there's no orchestration layer.

Deploying agents in production is four problems, not one. And right now, only one of those layers is genuinely mature.

Layer 1: Sandboxing — the only mature layer

This is the good news. Agent code execution sandboxing is a solved problem with real, production-grade options. If you only get one layer right, get this one — everything else is recoverable, but an unsandboxed agent with shell access is not.

| Project | Score | Stars | Approach |
| --- | --- | --- | --- |
| E2B | 92/100 | 11,263 | Cloud sandboxes via SDK, containerized, self-hostable |
| OpenSandbox | 87/100 | 7,681 | gVisor/Kata/Firecracker runtimes, Kubernetes-native |
| boxlite | 60/100 | 1,524 | Micro-VM sandboxes with OCI support, no daemon |
| zeroboot | 57/100 | 1,592 | Firecracker snapshots, ~0.8ms restore, KVM isolation |
| nono | 63/100 | 980 | Kernel-enforced Landlock/Seatbelt, credential proxy |
| agent-safehouse | 57/100 | 1,184 | macOS sandbox-exec profiles for Claude/Codex |
| sandlock | 57/100 | 16 | Landlock + seccomp-bpf, Python SDK, pipeline composition |

E2B (92/100, 11,263 stars) is the market leader — cloud sandboxes you control via Python or JavaScript SDK, self-hostable on AWS/GCP. Alibaba's OpenSandbox (87/100) is the Kubernetes-native alternative with multiple isolation runtimes. Both are production-grade.

For local development, nono (63/100) is notable — it uses kernel-level enforcement (Landlock on Linux, Seatbelt on macOS) with a clever credential proxy that keeps your API keys entirely outside the sandbox. agent-safehouse provides macOS-specific profiles for Claude Code and Codex.

The emerging contenders — boxlite, zeroboot, sandlock — are Rust-based with different isolation strategies. Boxlite provides micro-VMs without a daemon. Zeroboot achieves sub-millisecond restore times via Firecracker snapshots. Sandlock combines Landlock, seccomp-bpf, and syscall filtering with a Python SDK.

The sandboxing layer is where you'll find the most choice, the highest quality scores, and the most active development. This is the layer the community has decided matters most — and they're right.
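To make the pattern concrete: whatever tool you pick, the core move is the same — run agent-generated code in a separate, constrained process rather than in your own. The sketch below uses only the Python standard library (`resource` limits plus an isolated subprocess, POSIX-only) as a minimal illustration of that idea; it is nowhere near the VM- and kernel-level isolation E2B or gVisor provide, and the `AGENT_CODE` string is a stand-in for model output.

```python
import os
import resource
import subprocess
import sys
import tempfile

AGENT_CODE = "print(sum(range(10)))"  # stands in for model-generated code


def limit_resources():
    # Runs in the child before exec: cap CPU time and address space.
    resource.setrlimit(resource.RLIMIT_CPU, (2, 2))              # 2s of CPU
    resource.setrlimit(resource.RLIMIT_AS, (512 * 2**20,) * 2)   # 512 MiB


def run_untrusted(code: str) -> str:
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        out = subprocess.run(
            [sys.executable, "-I", path],   # -I: isolated mode, no user site dirs
            capture_output=True, text=True, timeout=5,
            preexec_fn=limit_resources,     # POSIX only
        )
    finally:
        os.unlink(path)
    return out.stdout.strip()


print(run_untrusted(AGENT_CODE))  # → 45
```

This limits blast radius for runaway loops and memory bombs, but it shares the host kernel and filesystem — which is precisely why the projects above reach for Firecracker, gVisor, and Landlock instead.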

Layer 2: Governance — the critical gap

Sandboxing prevents agents from doing damage accidentally. Governance prevents them from doing things they shouldn't, intentionally or otherwise. What actions is this agent authorized to perform? What data can it access? Who approved this workflow? Where's the audit trail?

This layer is alarmingly immature.

| Project | Score | Stars | What it does |
| --- | --- | --- | --- |
| agent-governance-toolkit | 64/100 | 47 | AI Agent Governance Toolkit — Policy enforcement, zero-trust identity,... |
| DashClaw | 61/100 | 121 | 🛡️ Decision infrastructure for AI agents. Intercept actions, enforce guard... |
| sovereign-shield | 59/100 | 15 | AI security framework: tamper-proof action auditing, prompt injection... |
| intentshield | 50/100 | 17 | Pre-execution intent verification for AI agents. Audits what your AI is... |
| ai-maestro | 64/100 | 525 | AI Agent Orchestrator with Skills System - Give AI Agents superpowers:... |

Microsoft's agent-governance-toolkit (64/100) is the most credible entry — it's Microsoft, and they have enterprise customers who need this. But at 47 stars, it's still early. sovereign-shield and intentshield, from the same author, tackle policy enforcement and intent validation respectively — the right ideas, but single-maintainer projects.

The honest assessment: if you need agent governance today, you're building it yourself. There's no "install this and you have guardrails." The sandboxing layer took a year to mature from experiments to production tools. Governance is where sandboxing was a year ago — early projects, right ideas, no clear winner.
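If you do end up building it yourself, the shape is fairly consistent across the projects above: an interceptor that sits between the agent and its tools, checks each action against a policy, and writes an append-only audit record either way. The sketch below is a toy version of that pattern — the policy schema, tool names, and `Governor` class are all hypothetical, not any real toolkit's API.

```python
import time
from dataclasses import dataclass, field

# Hypothetical policy: which tools an agent may call, and on which path prefixes.
POLICY = {
    "read_file":  {"allowed": True,  "paths": ("/workspace",)},
    "write_file": {"allowed": True,  "paths": ("/workspace",)},
    "shell":      {"allowed": False, "paths": ()},
}


@dataclass
class Governor:
    audit_log: list = field(default_factory=list)

    def authorize(self, tool: str, arg: str) -> bool:
        rule = POLICY.get(tool, {"allowed": False, "paths": ()})
        ok = rule["allowed"] and any(arg.startswith(p) for p in rule["paths"])
        # Every decision is recorded, allowed or not — that's the audit trail.
        self.audit_log.append({
            "tool": tool, "arg": arg,
            "verdict": "allow" if ok else "deny",
            "ts": time.time(),
        })
        return ok


gov = Governor()
print(gov.authorize("read_file", "/workspace/app.py"))  # allowed: in policy, in scope
print(gov.authorize("shell", "rm -rf /"))               # denied: shell is disallowed
print(len(gov.audit_log))                               # both decisions were logged
```

The hard parts the toy skips — tamper-proof logs, identity, approval workflows — are exactly what the projects in the table are racing to solve.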

Layer 3: Orchestration — mature but fragmented

If you're running a single agent, you don't need this. If you need multiple agents to cooperate — one agent researches, another writes code, a third reviews it — you need orchestration.

| Project | Score | Stars | Best for |
| --- | --- | --- | --- |
| openai-agents-python | 98/100 | 19,951 | OpenAI ecosystem, simple agent patterns |
| crewAI | 97/100 | 45,936 | Multi-agent teams with role-based collaboration |
| eliza | 94/100 | 17,778 | Autonomous agents with personality and memory |
| composio | 94/100 | 27,355 | Tool integration layer, 250+ integrations |
| trigger.dev | 89/100 | 13,997 | Background jobs and durable workflows |
| agent-framework | 93/100 | 7,882 | Lightweight, Microsoft ecosystem |

Orchestration is the most crowded layer. OpenAI's agents SDK (98/100) is the simplest entry point. CrewAI (97/100, 45,936 stars) is the most popular for multi-agent teams. Composio focuses on tool integrations — connecting agents to external services.

The quality scores here are high (89-98) because these are well-maintained, well-adopted projects with active communities. But "mature" doesn't mean "settled" — the ecosystem is still fragmenting as new frameworks launch weekly. Picking an orchestration framework today means accepting that the landscape will look different in six months.
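Underneath the framework differences, the research-write-review example above is just a sequential hand-off: each agent's output becomes the next agent's input. A framework-free sketch of that topology, with stub functions standing in for LLM-backed agents (all names here are illustrative, not any framework's API):

```python
from typing import Callable


# Each "agent" is a stub standing in for an LLM-backed worker.
def researcher(task: str) -> str:
    return f"notes on: {task}"


def coder(notes: str) -> str:
    return f"code implementing ({notes})"


def reviewer(code: str) -> str:
    return f"approved: {code}"


def pipeline(task: str, stages: list[Callable[[str], str]]) -> str:
    # Sequential hand-off: each stage consumes the previous stage's output.
    result = task
    for stage in stages:
        result = stage(result)
    return result


print(pipeline("add retry logic", [researcher, coder, reviewer]))
```

What CrewAI, the OpenAI agents SDK, and the rest actually sell is everything this sketch omits: retries, state, branching, human-in-the-loop steps, and observability around each hand-off.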

Layer 4: The agent itself — what are you actually deploying?

After sandboxing, governance, and orchestration, you still need to choose the agent that does the actual work. For coding agents specifically:

| Project | Score | Stars | What it does |
| --- | --- | --- | --- |
| aider | 76/100 | 41,898 | aider is AI pair programming in your terminal |
| aiox-core | 82/100 | 2,247 | Synkra AIOS: AI-Orchestrated System for Full Stack Development - Core Framework v4.0 |
| ruflo | 80/100 | 29,743 | 🌊 The leading agent orchestration platform for Claude. Deploy intelligent... |
| vibe-kanban | 70/100 | 23,088 | Get 10X more out of Claude Code, Codex or any coding agent |

Aider (76/100, 41,898 stars) is the established leader in terminal-based AI coding. The "vibe coding" category — agents that take high-level descriptions and produce full implementations — is emerging rapidly with projects like ruflo and vibe-kanban.

But here's the thing nobody mentions: the agent itself is the least important architectural decision. Any competent coding agent will produce reasonable code. The sandboxing, governance, and orchestration around it determine whether your deployment is safe, auditable, and scalable. An excellent agent without sandboxing is a liability. A mediocre agent with proper sandboxing and governance is enterprise-ready.

The maturity gap

Here's the reality of agent platform infrastructure in 2026:

  • Sandboxing: Production-ready. Multiple options scoring 57-92/100. Real competition driving quality up. Pick E2B or OpenSandbox and move on.
  • Orchestration: Mature but fragmenting. Scores of 89-98 across major frameworks. The choice is real but the options are all viable.
  • The agent: Also mature. Aider, Claude Code, Codex — the tools work. The differentiation is narrowing.
  • Governance: Early. Scores of 50-64, mostly from single-maintainer projects. No clear winner. If your compliance team asks about agent authorization policies, you don't have a good answer yet.

If you're building an agent platform for your company, start with sandboxing — it's the layer where getting it wrong is unrecoverable. Then choose an orchestration framework based on your stack. Then pick your agents. And for governance — watch the space, build what you need internally, and expect the tooling to mature over the next 12 months.
