RAG Evaluation Tools

This page tracks 19 RAG evaluation tools. The highest-rated is TJ-Neary/AI_Eval, with a score of 24/100 and 0 stars.

Get all 19 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=rag&subcategory=evaluation&limit=20"

The API is open to everyone: up to 100 requests per day with no key, or up to 1,000 requests per day with a free key.
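To query other slices of the same endpoint, the curl command can be wrapped in two small shell functions. This is a minimal sketch that assumes only the three query parameters visible in the URL (`domain`, `subcategory`, `limit`). The function names `build_url` and `fetch_projects` are hypothetical. The values are not URL-encoded, so the sketch is only safe for simple values like `rag` and `evaluation`. It also sends no API key, because this page does not say how a key is passed.

```shell
#!/usr/bin/env sh
# Hypothetical helpers around the endpoint shown in the curl example.
# Only the domain, subcategory and limit parameters from that URL are used;
# values are inserted as-is (no URL-encoding), so keep them simple.
BASE_URL="https://pt-edge.onrender.com/api/v1/datasets/quality"

build_url() {
  # $1 = domain, $2 = subcategory, $3 = limit (defaults match the listing above)
  printf '%s?domain=%s&subcategory=%s&limit=%s\n' \
    "$BASE_URL" "${1:-rag}" "${2:-evaluation}" "${3:-20}"
}

fetch_projects() {
  # -s hides the progress meter; -f makes curl exit non-zero on HTTP errors.
  # python3 -m json.tool pretty-prints the JSON without assuming its schema.
  curl -sf "$(build_url "$@")" | python3 -m json.tool
}
```

For example, `fetch_projects rag evaluation 20` requests the same data as the curl command above, and each call counts toward the daily request limit.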

| # | Tool | Description | Score (/100) | Tier |
|---|------|-------------|--------------|------|
| 1 | TJ-Neary/AI_Eval | Comprehensive LLM evaluation framework comparing local and cloud models with... | 24 | Experimental |
| 2 | masaakisakamoto/memory-os | Deterministic continuity for AI systems. Detect and repair inconsistencies... | 23 | Experimental |
| 3 | dahlinomine/local-llm-rag-bench | Python tool for benchmarking local LLM performance on specific RAG datasets. | 22 | Experimental |
| 4 | VectoringAI/ai-engineering | Practical tutorials to build AI Engineering skills | 22 | Experimental |
| 5 | priyanshus/evaliphy | E2E RAG Testing Tool | 22 | Experimental |
| 6 | moshe19909090/llm-evaluation-pipeline | End-to-end LLM evaluation pipeline with human and automated judging for... | 22 | Experimental |
| 7 | yosuancrespo/specforge-ai | AI-augmented QA platform for spec-driven development and testing,... | 22 | Experimental |
| 8 | hereandnowai/evaluation-of-opensource-llms-between-rag-and-finetuning-entreprise-grade | Enterprise-grade evaluation comparing RAG and Fine-Tuning for local... | 21 | Experimental |
| 9 | thecoderr13/Corrective-RAG | CRAG: a pipeline that uses tunable thresholds to validate document... | 21 | Experimental |
| 10 | xiaohanzhang2005/Minor-Detection | Self-evolving minor-user identification agent for anthropomorphic AI... | 19 | Experimental |
| 11 | ShabnamAtf/ScenarioBench | Trace-grounded compliance benchmark for Text-to-SQL and RAG | 17 | Experimental |
| 12 | dipakkr/ai-engineering-guide | A practical guide to AI engineering: LLMs, RAG, agents, evals, and... | 17 | Experimental |
| 13 | farithadnan/KB-AnswerScorer | A tool for evaluating LLM responses against a knowledge base of expert solutions. | 15 | Experimental |
| 14 | DennisMRitchie/go-llm-evaluator | LLM-as-a-Judge evaluation framework in Go | 14 | Experimental |
| 15 | Martonidaz/multi-agent-rag-builder | Development of a multi-agent system to assist professionals outside... | 14 | Experimental |
| 16 | emmeongoingammuaroi/reviewform | AI-Powered Code Review Agent built with LangGraph, FastAPI, MCP, and RAG... | 14 | Experimental |
| 17 | tovrr/Apex_LLM | Private AI workspace platform: FastAPI LLM API, streaming, evals, usage... | 14 | Experimental |
| 18 | songsunny00/ragas-dify-eval | Quickly evaluate Dify applications with Ragas (suitable for evaluating RAG applications) | 14 | Experimental |
| 19 | rickytang666/epa-consultant | 🤖 RAG-powered regulatory intelligence for EPA pesticide compliance. | 13 | Experimental |