All AI Evaluation Tools
216 tools ranked by quality score · Page 3 of 3
| # | Tool | Score | Tier |
|---|---|---|---|
| 201 | maxi4youuu/RePRo: 🧠 Enhance raw prompts into optimized, powerful versions for AI tools like... | | Experimental |
| 202 | Anarv2104/Inflion: Observability and influence tracing infrastructure for multi-agent AI systems. | | Experimental |
| 203 | HiThink-Research/FinMTM: [ACL 2026] FinMTM: A Multi-Turn Multimodal Benchmark for Financial Reasoning... | | Experimental |
| 204 | fourdollars/cella: A terminal UI and CLI for managing and monitoring LXD + Docker containers -... | | Experimental |
| 205 | FelixBroesamle/s2mflow: Meta-generator: generating multicommodity flow instances from... | | Experimental |
| 206 | iazaran/trace-replay: High-fidelity process tracking, deterministic replay, and AI-powered... | | Experimental |
| 207 | Basaltlabs-app/Gauntlet: Community-driven behavioral reliability benchmark for LLMs. 88 probes across... | | Experimental |
| 208 | SagarMaheshwary/reqlog: Fast CLI to search and trace logs across services or single files using... | | Experimental |
| 209 | TomasVenkrbec/lazyline: Zero-config line-level Python profiler. No decorators, no code changes.... | | Experimental |
| 210 | 0xMilord/better-logger: Execution flow debugger for modern apps. Turn scattered `console.log` calls... | | Experimental |
| 211 | vikpant/strategic-coopetition: Coopetition-Gym: A research-grade mixed-motive multi-agent reinforcement... | | Experimental |
| 212 | bajajku/VAC: Develop and evaluate a trauma-informed LLM-based chatbot that is... | | Experimental |
| 213 | parsamivehchi/tps.sh: tps.sh - Tokens Per Second LLM Benchmark. 7 models, 147 tests, 21 prompts... | | Experimental |
| 214 | Zxela/claude-monitor: Real-time dashboard for monitoring Claude Code sessions - live token usage,... | | Experimental |
| 215 | pilhuhn/otel-oql: An experiment in creating an OpenTelemetry backend | | Experimental |
| 216 | MarkIvor/officeiq: Research question: can an LLM's "office intelligence" be measured? An attempt... | | Experimental |