QuesmaOrg/otel-bench

OpenTelemetry Benchmark - can AI trace your failed login?

30 / 100

Emerging

Evaluates how well AI agents can instrument applications with OpenTelemetry across 11 programming languages. The benchmark runs on the Harbor framework, which orchestrates interactions between CLI-based AI models (Claude, GPT, Gemini, etc.) and coding tasks. Tasks span distributed tracing, context propagation, microservices instrumentation, and gRPC scenarios; results record agent trajectories and success rates for each model-language combination. It supports comparative evaluation across multiple models, with configurable retry attempts and per-language difficulty levels ranging from simple to advanced.
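
To make "success rates per model-language combination" concrete, here is a minimal Python sketch of that kind of aggregation. The record layout, model names, and values are hypothetical placeholders for illustration; they are not the benchmark's actual output format or real results.

```python
from collections import defaultdict

# Hypothetical attempt records: (model, language, passed).
# Names and outcomes are illustrative only, not real otel-bench data.
attempts = [
    ("model-a", "go", True),
    ("model-a", "go", False),
    ("model-a", "python", True),
    ("model-b", "go", False),
    ("model-b", "go", False),
]


def success_rates(records):
    """Return {(model, language): fraction of attempts that passed}."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for model, language, passed in records:
        totals[(model, language)] += 1
        passes[(model, language)] += int(passed)
    return {key: passes[key] / totals[key] for key in totals}


rates = success_rates(attempts)
for (model, language), rate in sorted(rates.items()):
    print(f"{model:8} {language:8} {rate:.0%}")
```

Keying on the (model, language) pair means repeated attempts on the same combination fold into a single rate, which is one plausible way to account for configurable retries.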

No Package

No Dependents

Maintenance 10 / 25

Adoption 6 / 25

Maturity 9 / 25

Community 5 / 25

Language: Shell

License: Apache-2.0

Featured in

Agent Governance in 2026: Who's Building the Guardrails?

Your Agent is Hitting its Ceiling — Who's Actually Fixing It

Higher-rated alternatives

truera/trulens

Evaluation and Tracking for LLM Experiments and AI Agents

traceroot-ai/traceroot

Find the Root Cause in Your Code's Trace

future-agi/traceAI

Open Source AI Tracing Framework built on OpenTelemetry for AI Applications and Frameworks

VishApp/multiagent-debugger

Multi-Agent Debugger: An AI-powered debugging system using CrewAI to orchestrate specialized...

evilmartians/agent-prism

React components for visualizing traces from AI agents
