All AI Evaluation Tools

216 tools ranked by quality score · Page 2 of 3

Showing 101–200 of 216

#	Tool	Score	Tier	Category	Stars	Language
101	beling/bsuccinct-rs Rust libraries and programs focused on succinct data structures	52	Established	—	159	Rust
102	DataDog/orchestrion Automatic compile-time instrumentation of Go code	52	Established	—	579	Go
103	FriendsOfOpenTelemetry/opentelemetry-bundle Traces, metrics, and logs instrumentation within your Symfony application	52	Established	—	64	PHP
104	qwerty541/dns-bench Find the fastest DNS in your location to improve internet browsing experience.	52	Established	—	97	Rust
105	ldcsaa/hp-soa A fully functional, easy-to-use, and highly scalable microservice framework	51	Established	—	95	Java
106	tlog-dev/tlog Observability events system.	51	Established	—	18	Go
107	ecoAPM/BenchmarkMockNet Using BenchmarkDotNet to compare .NET mocking library performance	51	Established	—	24	C#
108	smarr/ReBenchDB ReBenchDB records benchmark results and provides customizable reporting to...	51	Established	—	18	TypeScript
109	vincentfree/opentelemetry Open Telemetry extensions	51	Established	—	24	Go
110	Point72/raydar A perspective powered, user editable ray dashboard via ray serve	51	Established	—	56	Python
111	quochuydev/dokploy-grafana-compose Docker Compose stack for Grafana observability: Tempo traces, Loki logs,...	50	Established	—	18	—
112	ROCm/madengine madengine is a streamlined CLI tool for running and benchmarking AI models...	50	Established	—	6	Python
113	nfrankel/opentelemetry-tracing Demo for end-to-end tracing via OpenTelemetry	50	Established	—	77	Kotlin
114	CodSpeedHQ/action GitHub Actions for running CodSpeed in your CI	50	Established	—	52	Shell
115	kieker-monitoring/moobench Micro-benchmarks for quantification of the performance overhead caused by...	50	Established	—	6	Shell
116	ipyflow/ipyflow A reactive Python kernel for Jupyter notebooks.	50	Established	—	1,265	Python
117	KaykCaputo/oracletrace Lightweight Python tool to detect performance regressions and compare...	49	Emerging	—	15	Python
118	RRZE-HPC/MachineState This CLI tool and Python3 module collects the current system state for documentation	48	Emerging	—	24	Python
119	dinesh-git17/claudehome An architectural persistence experiment for large language models. Claude’s...	48	Emerging	—	27	TypeScript
120	ivanfioravanti/llm_context_benchmarks 📊 LLM Context Benchmarks - A comprehensive benchmarking tool for testing...	48	Emerging	—	50	Python
121	facebookresearch/CUTracer A dynamic binary instrumentation tool for tracing and analyzing CUDA kernel...	48	Emerging	—	53	Python
122	nyrkio/nyrkio Nyrkiö is an open source platform for detecting performance changes in a...	48	Emerging	—	65	Python
123	oteldb/oteldb OpenTelemetry signal storage	48	Emerging	—	68	Go
124	tw4452852/zbpf Writing eBPF in Zig	48	Emerging	—	259	Zig
125	JDiskMark/jdm-java Cross-platform Java Disk Benchmark Utility for measuring drive IO performance.	48	Emerging	—	4	Java
126	lucsorel/pydoctrace Generate architecture diagrams by tracing Python code execution	48	Emerging	—	17	Python
127	komoju/komoju-datadog Rust Datadog instrumentation	48	Emerging	—	4	Rust
128	mesaglio/otel-front Lightweight OpenTelemetry viewer for local development. Single binary, no...	47	Emerging	—	39	TypeScript
129	Helmholtz-AI-Energy/perun Perun is a Python package that measures the energy consumption of your applications.	47	Emerging	—	91	Python
130	containerscrew/nflux Simple network monitoring agent tool. Powered by eBPF & Rust 🐝	47	Emerging	—	9	Rust
131	blooop/bencher A package for benchmarking the characteristics of arbitrary functions	46	Emerging	—	4	Python
132	GabrielTecuceanu/httpress A fast HTTP benchmarking tool built in Rust	46	Emerging	—	10	Rust
133	DataDog/httpd-datadog Enhance Apache HTTPD Observability with Datadog's Module	46	Emerging	—	4	Python
134	proactive-agent/langgraphics Visualize live LangGraph execution and see how your agent thinks as it runs.	45	Emerging	—	88	TypeScript
135	CodSpeedHQ/instrument-hooks Internal core for the codspeed instruments	45	Emerging	—	2	C
136	kjldev/purview-telemetry-sourcegenerator .NET Source Generator for interface-based telemetry. Supporting activities,...	45	Emerging	—	30	C#
137	agurinov/gopl Golang platform library	45	Emerging	—	5	Go
138	grafana/otel-profiling-go Open Telemetry integration for Grafana Pyroscope and tracing solutions such...	45	Emerging	—	101	Go
139	Spectral-Knight-Ops/local-llm-evaluator Quickly test local LLMs with custom prompts to determine which model is best for you.	45	Emerging	—	8	Python
140	feelpp/benchmarking Feel++ Benchmarking	45	Emerging	—	3	Python
141	gstinoco/mGFD Meshless Generalized Finite Differences (mGFD) solver and reference...	44	Emerging	—	4	Python
142	shnarazk/SAT-bench A benchmark suite for SAT solvers	44	Emerging	—	2	Rust
143	uptrace/uptrace-ruby OpenTelemetry Ruby distribution for Uptrace	44	Emerging	—	3	Ruby
144	coralogix/coralogix-management-sdk API clients for configuring the Coralogix platform.	44	Emerging	—	4	Go
145	omniviser/omniray Stop guessing! You and your AI can now see live what's happening inside your...	43	Emerging	—	4	Python
146	HPE/torch-hammer Torch Hammer: Strike while the GPU is hot	43	Emerging	—	9	Python
147	typelevel/otel4s-sdk Implementation of the otel4s SDK modules in Scala from scratch	43	Emerging	—	5	Scala
148	falcondev-oss/workflow Simple type-safe queue worker with durable execution based on BullMQ.	42	Emerging	—	2	TypeScript
149	beorn/loggily TypeScript logger with debug-style namespaces, structured JSON, and...	42	Emerging	—	2	TypeScript
150	givecareapp/givecare-bench AI safety benchmark for long-term caregiving relationships. Tests crisis...	42	Emerging	—	2	Python
151	NyanKiyoshi/pytest-django-queries Generate performance reports from your django database performance tests.	42	Emerging	—	83	Python
152	pgx-contrib/pgxotel OpenTelemetry tracing instrumentation for pgx v5 — spans for queries,...	41	Emerging	—	8	Go
153	skerkour/go-benchmarks Comprehensive and reproducible benchmarks for Go developers and architects.	41	Emerging	—	13	Go
154	rsasaki0109/CloudAnalyzer CLI-first QA toolkit for point clouds, trajectories, and 3D perception...	41	Emerging	—	10	Python
155	MrAlias/flow An OpenTelemetry SpanProcessor reporting tracing flow metrics	41	Emerging	—	10	Go
156	udhos/opentelemetry-trace-sqs opentelemetry-trace-sqs propagates Open Telemetry tracing with SQS messages...	41	Emerging	—	8	Go
157	jamesgober/metrics-lib The fastest metrics library for Rust. Lock-free 0.6ns gauges, 18ns counters,...	41	Emerging	—	7	Rust
158	smyrgeorge/log4k A Comprehensive Logging and Tracing Solution for Kotlin Multiplatform.	40	Emerging	—	62	Kotlin
159	KempnerInstitute/nvidia-hpc-benchmarks NVIDIA HPC Benchmarks	40	Emerging	—	10	Shell
160	meshkovQA/Eval-ai-library Comprehensive AI Model Evaluation Framework with advanced techniques...	39	Emerging	—	31	Python
161	getaxonflow/axonflow AxonFlow: Runtime control layer for production AI	39	Emerging	—	43	Go
162	IBM/OpenDsStar OpenDsStar is an open-source implementation of the DS-Star agent that...	38	Emerging	—	15	Python
163	kobsio/kobs Kubernetes Observability Platform	37	Emerging	—	216	TypeScript
164	hdmsantander/microservices-ops-demo Spring Boot demo for observability, traceability and error analysis in a...	37	Emerging	—	4	Java
165	mbzuai-oryx/Agent-X ICLR 2026: Agent-X Evaluating Deep Multimodal Reasoning in Vision-Centric...	37	Emerging	—	39	Jupyter Notebook
166	evaluation-context-protocol/ecp ECP is a standardized interface for orchestrating, auditing, and enforcing...	37	Emerging	—	7	Python
167	verifywise-ai/plugin-marketplace VerifyWise AI Governance Plugin Marketplace	36	Emerging	—	3	TypeScript
168	braintrustdata/braintrust-pi-extension Braintrust tracing plugin for pi	36	Emerging	—	2	TypeScript
169	nixel2007/opentelemetry OpenTelemetry SDK for OneScript	36	Emerging	—	8	1C Enterprise
170	everythings-gonna-be-alright/phpScope PHP profiler that sends CPU sampling data to Pyroscope server.	36	Emerging	—	17	Go
171	opsrobot-ai/opsrobot Observability platform for OpenClaw agents, providing real-time tracing,...	35	Emerging	—	76	JavaScript
172	kolloch/reqray Log call tree summaries after each request for rust programs instrumented...	35	Emerging	—	45	Rust
173	tracewayapp/opentelemetry-symfony-bundle Pure-PHP OpenTelemetry instrumentation for Symfony - automatic HTTP,...	35	Emerging	—	57	PHP
174	PacificBiosciences/aardvark A tool for sniffing out the differences in vari-Ants	35	Emerging	—	40	Rust
175	yonatan-h/express-k6-profiler Finds bottlenecks in an Express app during load testing	34	Emerging	—	14	TypeScript
176	cuihairu/croupier Croupier is a universal GM (Game Master) backend system designed for game...	34	Emerging	—	13	Go
177	aykhans/sarin A high-performance HTTP load testing tool. Features dynamic request...	33	Emerging	—	7	Go
178	dolmen-go/flagx Extensions for the Go 'flag' package: flagx, flagfile, flagnet, flagtrace	32	Emerging	—	3	Go
179	MrAlias/collex Use OpenTelemetry Collector Factories to Export with OpenTelemetry Go	32	Emerging	—	3	Go
180	rodneylab/axum-graphql Rust GraphQL demo/test API written in Rust, using Axum for routing,...	31	Emerging	—	2	Rust
181	AmalChandru/termtrace A terminal workflow recorder that turns debugging sessions into replayable,...	31	Emerging	—	26	Go
182	last9/opentelemetry-examples Production-ready OpenTelemetry instrumentation examples for Go, Python,...	31	Emerging	—	3	Python
183	PAIR-Systems-Inc/little-dorrit-editor Multimodal benchmark for evaluating handwritten editorial correction in printed text.	31	Emerging	—	2	Python
184	filipsPL/optuml Optuna-optimized ML methods, with scikit-learn like API	31	Emerging	—	2	Python
185	BudEcosystem/bud-runtime Bud AI Foundry - A comprehensive inference stack for compound AI deployment,...	31	Emerging	—	2	Python
186	russfellows/sai3-bench A multi-protocol storage performance testing tool, inspired by vdbench, fio...	30	Emerging	—	2	Rust
187	hboublal/dopGuard Modular observability platform for .NET applications, integrating with tools...	30	Emerging	—	2	C#
188	imadAttar/spring-boot-unified-observability-starter All-in-one Spring Boot Starter for Observability: Metrics, Traces, Logs, and...	30	Emerging	—	6	Java
189	nshkrdotcom/AITrace The unified observability layer for the AI Control Plane	30	Emerging	—	2	Elixir
190	qcmet/qcmet Quantum Computing Metrics and Benchmarks	30	Emerging	—	5	Jupyter Notebook
191	tolitius/cupel discover LLMs punching above their weight	29	Experimental	—	28	JavaScript
192	wangyz1999/sync-video-label A web-based annotation tool for synchronized multi-video timeline labeling...	29	Experimental	—	17	TypeScript
193	iRevive/fs2-grpc-otel4s otel4s instrumentation for fs2-grpc	28	Experimental	—	2	Scala
194	mnemom/mnemom-platform Safe House for AI agents — transparent gateway with inbound + outbound...	28	Experimental	—	6	TypeScript
195	rvnhq/raven A lightweight, self-hostable cloud infrastructure monitoring and telemetry platform.	28	Experimental	—	5	Rust
196	DaSH-Lab-CSIS/blossom BLOSSOM: Block-wise Federated Learning Over Shared and Sparse Observed...	27	Experimental	—	3	Python
197	kyahikaru/llm-guardrail-red-teaming Protocol constrained red teaming of frontier LLM guardrails in high risk...	27	Experimental	—	1	—
198	last9/rails-otel-context Tells you which code fired that query. Zero config.	27	Experimental	—	3	Ruby
199	thanhdaon/clean-arch-go Clean Architecture, DDD, CQRS with testings in Go	27	Experimental	—	19	Go
200	LLMSystems/BehaviorRL-Hallucination Learning When to Answer: Behavior-Oriented Reinforcement Learning for...	26	Experimental	—	7	Python
