All AI Evaluation Tools

216 tools ranked by quality score · Page 2 of 3

Showing 101–200 of 216
# Tool Score Tier
101 beling/bsuccinct-rs

Rust libraries and programs focused on succinct data structures

52
Established
102 DataDog/orchestrion

Automatic compile-time instrumentation of Go code

52
Established
103 FriendsOfOpenTelemetry/opentelemetry-bundle

Traces, metrics, and logs instrumentation within your Symfony application

52
Established
104 qwerty541/dns-bench

Find the fastest DNS in your location to improve internet browsing experience.

52
Established
105 ldcsaa/hp-soa

A fully functional, easy-to-use, and highly scalable microservice framework

51
Established
106 tlog-dev/tlog

Observability events system.

51
Established
107 ecoAPM/BenchmarkMockNet

Using BenchmarkDotNet to compare .NET mocking library performance

51
Established
108 smarr/ReBenchDB

ReBenchDB records benchmark results and provides customizable reporting to...

51
Established
109 vincentfree/opentelemetry

OpenTelemetry extensions

51
Established
110 Point72/raydar

A Perspective-powered, user-editable Ray dashboard via Ray Serve

51
Established
111 quochuydev/dokploy-grafana-compose

Docker Compose stack for Grafana observability: Tempo traces, Loki logs,...

50
Established
112 ROCm/madengine

madengine is a streamlined CLI tool for running and benchmarking AI models...

50
Established
113 nfrankel/opentelemetry-tracing

Demo for end-to-end tracing via OpenTelemetry

50
Established
114 CodSpeedHQ/action

GitHub Actions for running CodSpeed in your CI

50
Established
115 kieker-monitoring/moobench

Micro-benchmarks for quantification of the performance overhead caused by...

50
Established
116 ipyflow/ipyflow

A reactive Python kernel for Jupyter notebooks.

50
Established
117 KaykCaputo/oracletrace

Lightweight Python tool to detect performance regressions and compare...

49
Emerging
118 RRZE-HPC/MachineState

This CLI tool and Python3 module collects the current system state for documentation

48
Emerging
119 dinesh-git17/claudehome

An architectural persistence experiment for large language models. Claude’s...

48
Emerging
120 ivanfioravanti/llm_context_benchmarks

📊 LLM Context Benchmarks - A comprehensive benchmarking tool for testing...

48
Emerging
121 facebookresearch/CUTracer

A dynamic binary instrumentation tool for tracing and analyzing CUDA kernel...

48
Emerging
122 nyrkio/nyrkio

Nyrkiö is an open source platform for detecting performance changes in a...

48
Emerging
123 oteldb/oteldb

OpenTelemetry signal storage

48
Emerging
124 tw4452852/zbpf

Writing eBPF in Zig

48
Emerging
125 JDiskMark/jdm-java

Cross-platform Java Disk Benchmark Utility for measuring drive IO performance.

48
Emerging
126 lucsorel/pydoctrace

Generate architecture diagrams by tracing Python code execution

48
Emerging
127 komoju/komoju-datadog

Rust Datadog instrumentation

48
Emerging
128 mesaglio/otel-front

Lightweight OpenTelemetry viewer for local development. Single binary, no...

47
Emerging
129 Helmholtz-AI-Energy/perun

Perun is a Python package that measures the energy consumption of your applications.

47
Emerging
130 containerscrew/nflux

Simple network monitoring agent tool. Powered by eBPF & Rust 🐝

47
Emerging
131 blooop/bencher

A package for benchmarking the characteristics of arbitrary functions

46
Emerging
132 GabrielTecuceanu/httpress

A fast HTTP benchmarking tool built in Rust

46
Emerging
133 DataDog/httpd-datadog

Enhance Apache HTTPD Observability with Datadog's Module

46
Emerging
134 proactive-agent/langgraphics

Visualize live LangGraph execution and see how your agent thinks as it runs.

45
Emerging
135 CodSpeedHQ/instrument-hooks

Internal core for the codspeed instruments

45
Emerging
136 kjldev/purview-telemetry-sourcegenerator

.NET Source Generator for interface-based telemetry. Supporting activities,...

45
Emerging
137 agurinov/gopl

Golang platform library

45
Emerging
138 grafana/otel-profiling-go

OpenTelemetry integration for Grafana Pyroscope and tracing solutions such...

45
Emerging
139 Spectral-Knight-Ops/local-llm-evaluator

Quickly test local LLMs with custom prompts to determine which model is best for you.

45
Emerging
140 feelpp/benchmarking

Feel++ Benchmarking

45
Emerging
141 gstinoco/mGFD

Meshless Generalized Finite Differences (mGFD) solver and reference...

44
Emerging
142 shnarazk/SAT-bench

A benchmark suite for SAT solvers

44
Emerging
143 uptrace/uptrace-ruby

OpenTelemetry Ruby distribution for Uptrace

44
Emerging
144 coralogix/coralogix-management-sdk

API clients for configuring the Coralogix platform.

44
Emerging
145 omniviser/omniray

Stop guessing! You and your AI can now see live what's happening inside your...

43
Emerging
146 HPE/torch-hammer

Torch Hammer: Strike while the GPU is hot

43
Emerging
147 typelevel/otel4s-sdk

Implementation of the otel4s SDK modules in Scala from scratch

43
Emerging
148 falcondev-oss/workflow

Simple type-safe queue worker with durable execution based on BullMQ.

42
Emerging
149 beorn/loggily

TypeScript logger with debug-style namespaces, structured JSON, and...

42
Emerging
150 givecareapp/givecare-bench

AI safety benchmark for long-term caregiving relationships. Tests crisis...

42
Emerging
151 NyanKiyoshi/pytest-django-queries

Generate performance reports from your Django database performance tests.

42
Emerging
152 pgx-contrib/pgxotel

OpenTelemetry tracing instrumentation for pgx v5 — spans for queries,...

41
Emerging
153 skerkour/go-benchmarks

Comprehensive and reproducible benchmarks for Go developers and architects.

41
Emerging
154 rsasaki0109/CloudAnalyzer

CLI-first QA toolkit for point clouds, trajectories, and 3D perception...

41
Emerging
155 MrAlias/flow

An OpenTelemetry SpanProcessor reporting tracing flow metrics

41
Emerging
156 udhos/opentelemetry-trace-sqs

opentelemetry-trace-sqs propagates OpenTelemetry tracing with SQS messages...

41
Emerging
157 jamesgober/metrics-lib

The fastest metrics library for Rust. Lock-free 0.6ns gauges, 18ns counters,...

41
Emerging
158 smyrgeorge/log4k

A Comprehensive Logging and Tracing Solution for Kotlin Multiplatform.

40
Emerging
159 KempnerInstitute/nvidia-hpc-benchmarks

NVIDIA HPC Benchmarks

40
Emerging
160 meshkovQA/Eval-ai-library

Comprehensive AI Model Evaluation Framework with advanced techniques...

39
Emerging
161 getaxonflow/axonflow

AxonFlow: Runtime control layer for production AI

39
Emerging
162 IBM/OpenDsStar

OpenDsStar is an open-source implementation of the DS-Star agent that...

38
Emerging
163 kobsio/kobs

Kubernetes Observability Platform

37
Emerging
164 hdmsantander/microservices-ops-demo

Spring Boot demo for observability, traceability and error analysis in a...

37
Emerging
165 mbzuai-oryx/Agent-X

ICLR 2026: Agent-X Evaluating Deep Multimodal Reasoning in Vision-Centric...

37
Emerging
166 evaluation-context-protocol/ecp

ECP is a standardized interface for orchestrating, auditing, and enforcing...

37
Emerging
167 verifywise-ai/plugin-marketplace

VerifyWise AI Governance Plugin Marketplace

36
Emerging
168 braintrustdata/braintrust-pi-extension

Braintrust tracing plugin for pi

36
Emerging
169 nixel2007/opentelemetry

OpenTelemetry SDK for OneScript

36
Emerging
170 everythings-gonna-be-alright/phpScope

PHP profiler that sends CPU sampling data to Pyroscope server.

36
Emerging
171 opsrobot-ai/opsrobot

Observability platform for OpenClaw agents, providing real-time tracing,...

35
Emerging
172 kolloch/reqray

Log call tree summaries after each request for rust programs instrumented...

35
Emerging
173 tracewayapp/opentelemetry-symfony-bundle

Pure-PHP OpenTelemetry instrumentation for Symfony - automatic HTTP,...

35
Emerging
174 PacificBiosciences/aardvark

A tool for sniffing out the differences in vari-Ants

35
Emerging
175 yonatan-h/express-k6-profiler

Finds bottlenecks in an Express app during load testing

34
Emerging
176 cuihairu/croupier

Croupier is a universal GM (Game Master) backend system designed for game...

34
Emerging
177 aykhans/sarin

A high-performance HTTP load testing tool. Features dynamic request...

33
Emerging
178 dolmen-go/flagx

Extensions for the Go 'flag' package: flagx, flagfile, flagnet, flagtrace

32
Emerging
179 MrAlias/collex

Use OpenTelemetry Collector Factories to Export with OpenTelemetry Go

32
Emerging
180 rodneylab/axum-graphql

GraphQL demo/test API written in Rust, using Axum for routing,...

31
Emerging
181 AmalChandru/termtrace

A terminal workflow recorder that turns debugging sessions into replayable,...

31
Emerging
182 last9/opentelemetry-examples

Production-ready OpenTelemetry instrumentation examples for Go, Python,...

31
Emerging
183 PAIR-Systems-Inc/little-dorrit-editor

Multimodal benchmark for evaluating handwritten editorial correction in printed text.

31
Emerging
184 filipsPL/optuml

Optuna-optimized ML methods, with scikit-learn like API

31
Emerging
185 BudEcosystem/bud-runtime

Bud AI Foundry - A comprehensive inference stack for compound AI deployment,...

31
Emerging
186 russfellows/sai3-bench

A multi-protocol storage performance testing tool, inspired by vdbench, fio...

30
Emerging
187 hboublal/dopGuard

Modular observability platform for .NET applications, integrating with tools...

30
Emerging
188 imadAttar/spring-boot-unified-observability-starter

All-in-one Spring Boot Starter for Observability: Metrics, Traces, Logs, and...

30
Emerging
189 nshkrdotcom/AITrace

The unified observability layer for the AI Control Plane

30
Emerging
190 qcmet/qcmet

Quantum Computing Metrics and Benchmarks

30
Emerging
191 tolitius/cupel

Discover LLMs punching above their weight

29
Experimental
192 wangyz1999/sync-video-label

A web-based annotation tool for synchronized multi-video timeline labeling...

29
Experimental
193 iRevive/fs2-grpc-otel4s

otel4s instrumentation for fs2-grpc

28
Experimental
194 mnemom/mnemom-platform

Safe House for AI agents — transparent gateway with inbound + outbound...

28
Experimental
195 rvnhq/raven

A lightweight, self-hostable cloud infrastructure monitoring and telemetry platform.

28
Experimental
196 DaSH-Lab-CSIS/blossom

BLOSSOM: Block-wise Federated Learning Over Shared and Sparse Observed...

27
Experimental
197 kyahikaru/llm-guardrail-red-teaming

Protocol-constrained red teaming of frontier LLM guardrails in high-risk...

27
Experimental
198 last9/rails-otel-context

Tells you which code fired that query. Zero config.

27
Experimental
199 thanhdaon/clean-arch-go

Clean Architecture, DDD, and CQRS with tests in Go

27
Experimental
200 LLMSystems/BehaviorRL-Hallucination

Learning When to Answer: Behavior-Oriented Reinforcement Learning for...

26
Experimental