QuesmaOrg/otel-bench
OpenTelemetry Benchmark - can AI trace your failed login?
Evaluates how well AI agents instrument applications with OpenTelemetry across 11 programming languages. It uses the Harbor framework, which orchestrates interactions between CLI AI models (Claude, GPT, Gemini, etc.) and coding tasks. Tasks span distributed tracing, context propagation, microservices instrumentation, and gRPC scenarios; results track agent trajectories and success rates per model-language combination. Supports comparative evaluation of multiple models, with configurable retry attempts and language-specific difficulty levels ranging from simple to advanced.
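To make "success rates per model-language combination" concrete, here is a minimal sketch of how such a rate could be computed from a list of attempt records. The `Attempt` record, the model names, and the task names are hypothetical and chosen only for illustration; this is not the benchmark's actual result format, nor part of the Harbor framework.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    """One agent run on one task (hypothetical record shape)."""
    model: str
    language: str
    task: str
    passed: bool


def success_rates(attempts):
    """Return {(model, language): fraction of attempts that passed}."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for a in attempts:
        key = (a.model, a.language)
        totals[key] += 1
        passes[key] += int(a.passed)
    return {key: passes[key] / totals[key] for key in totals}


# Illustrative data only: retries appear as repeated attempts on the same task.
attempts = [
    Attempt("model-a", "go", "grpc-tracing", True),
    Attempt("model-a", "go", "grpc-tracing", False),
    Attempt("model-a", "python", "context-propagation", True),
    Attempt("model-b", "go", "grpc-tracing", False),
]

rates = success_rates(attempts)
for (model, language), rate in sorted(rates.items()):
    print(f"{model:8} {language:8} {rate:.0%}")
```

Counting every retry as a separate attempt is one possible design choice; a benchmark could equally report whether any attempt on a task passed, which would give different numbers.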
Stars: 16
Forks: 1
Language: Shell
License: Apache-2.0
Category:
Last pushed: Mar 01, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/agents/QuesmaOrg/otel-bench"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
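The same request can be sketched in Python with only the standard library. The base URL is taken from the curl command above; that the endpoint returns JSON, and that other repositories follow the same `owner/repo` path pattern, are assumptions, and the function names are hypothetical.

```python
import json
import urllib.request

# Base path copied from the curl example; everything after it is owner/repo.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/agents"


def quality_url(owner, repo):
    """Build the request URL for one repository."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner, repo, timeout=10.0):
    """Fetch the record for a repository (assumes a JSON response body)."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_quality("QuesmaOrg", "otel-bench")` would issue the same GET request as the curl command; it is left uncalled here because it needs network access and counts against the daily request limit.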
Higher-rated alternatives
truera/trulens
Evaluation and Tracking for LLM Experiments and AI Agents
traceroot-ai/traceroot
Find the Root Cause in Your Code's Trace
future-agi/traceAI
Open Source AI Tracing Framework built on OpenTelemetry for AI Applications and Frameworks
VishApp/multiagent-debugger
Multi-Agent Debugger: An AI-powered debugging system using CrewAI to orchestrate specialized...
evilmartians/agent-prism
React components for visualizing traces from AI agents