Evaluation RAG Tools
There are 19 evaluation tools tracked. The highest-rated is TJ-Neary/AI_Eval at 24/100 with 0 stars.
Get all 19 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=rag&subcategory=evaluation&limit=20"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
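For scripted use, the same request can be made from Python. The following is a minimal sketch using only the standard library. The endpoint and query parameters are taken from the curl command above; the helper names `build_url` and `fetch_projects` are hypothetical, and since the response schema is not described on this page, the decoded JSON is returned without assuming any field names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint copied from the curl command on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain: str = "rag", subcategory: str = "evaluation", limit: int = 20) -> str:
    """Assemble the query URL from the same parameters the curl command uses."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url: str, timeout: float = 30.0):
    """Fetch the URL and decode the JSON body.

    The response shape is an unknown here, so the parsed value is
    returned as-is rather than unpacked into specific fields.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_projects(build_url())` performs one request, which counts against the daily limit mentioned above.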
| # | Tool | Score | Tier |
|---|---|---|---|
| 1 | TJ-Neary/AI_Eval: Comprehensive LLM evaluation framework comparing local and cloud models with... | | Experimental |
| 2 | masaakisakamoto/memory-os: Deterministic continuity for AI systems. Detect and repair inconsistencies... | | Experimental |
| 3 | dahlinomine/local-llm-rag-bench: Python tool for benchmarking local LLM performance on specific RAG datasets. | | Experimental |
| 4 | VectoringAI/ai-engineering: Practical tutorials to build AI Engineering skills | | Experimental |
| 5 | priyanshus/evaliphy: E2E RAG Testing Tool | | Experimental |
| 6 | moshe19909090/llm-evaluation-pipeline: End-to-end LLM evaluation pipeline with human and automated judging for... | | Experimental |
| 7 | yosuancrespo/specforge-ai: AI-augmented QA platform for spec-driven development and testing,... | | Experimental |
| 8 | hereandnowai/evaluation-of-opensource-llms-between-rag-and-finetuning-entreprise-grade: Enterprise-grade evaluation comparing RAG and Fine-Tuning for local... | | Experimental |
| 9 | thecoderr13/Corrective-RAG: CRAG - A pipeline that uses tunable thresholds to validate document... | | Experimental |
| 10 | xiaohanzhang2005/Minor-Detection: Self-evolving minor-user identification agent for anthropomorphic AI... | | Experimental |
| 11 | ShabnamAtf/ScenarioBench: Trace-grounded compliance benchmark for Text-to-SQL and RAG | | Experimental |
| 12 | dipakkr/ai-engineering-guide: A practical guide to AI engineering: LLMs, RAG, agents, evals, and... | | Experimental |
| 13 | farithadnan/KB-AnswerScorer: A tool for evaluating LLM responses against a knowledge base of expert solutions. | | Experimental |
| 14 | DennisMRitchie/go-llm-evaluator: LLM-as-a-Judge evaluation framework in Go | | Experimental |
| 15 | Martonidaz/multi-agent-rag-builder: Development of a multi-agent system to assist professionals outside... | | Experimental |
| 16 | emmeongoingammuaroi/reviewform: AI-Powered Code Review Agent built with LangGraph, FastAPI, MCP, and RAG... | | Experimental |
| 17 | tovrr/Apex_LLM: Private AI workspace platform: FastAPI LLM API, streaming, evals, usage... | | Experimental |
| 18 | songsunny00/ragas-dify-eval: Quickly evaluate Dify applications with Ragas (well suited to evaluating RAG applications) | | Experimental |
| 19 | rickytang666/epa-consultant: 🤖 RAG-powered regulatory intelligence for EPA pesticide compliance. | | Experimental |