Evaluation RAG Tools

There are 19 evaluation tools tracked. The highest-rated is TJ-Neary/AI_Eval at 24/100 with 0 stars.

Get all 19 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=rag&subcategory=evaluation&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
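The same request can be made from a script. The following is a minimal Python sketch using only the standard library; it assumes the endpoint returns a JSON body, as the heading above suggests. The function names `build_url` and `fetch_projects` are hypothetical helpers introduced here for illustration, and the response schema is not documented on this page, so the sketch makes no assumptions about field names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="rag", subcategory="evaluation", limit=20):
    """Build the dataset URL with the same query parameters as the curl command."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=30):
    """Fetch the URL and decode the body as JSON.

    The response shape is an unknown here, so the decoded value is
    returned as-is without assuming any particular keys.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Example usage (performs a network request that counts against the daily limit):
#   data = fetch_projects(build_url())
#   print(json.dumps(data, indent=2)[:500])
```

Each call counts against the daily request limit, so it may be worth saving the decoded result to disk rather than fetching repeatedly.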

#	Tool	Score	Tier	Stars	Language
1	TJ-Neary/AI_Eval Comprehensive LLM evaluation framework comparing local and cloud models with...	24	Experimental	—	Python
2	masaakisakamoto/memory-os Deterministic continuity for AI systems. Detect and repair inconsistencies...	23	Experimental	1	TypeScript
3	dahlinomine/local-llm-rag-bench Python tool for benchmarking local LLM performance on specific RAG datasets.	22	Experimental	—	—
4	VectoringAI/ai-engineering Practical tutorials to build AI Engineering skills	22	Experimental	—	Jupyter Notebook
5	priyanshus/evaliphy E2E RAG Testing Tool	22	Experimental	—	TypeScript
6	moshe19909090/llm-evaluation-pipeline End-to-end LLM evaluation pipeline with human and automated judging for...	22	Experimental	—	Jupyter Notebook
7	yosuancrespo/specforge-ai AI-augmented QA platform for spec-driven development and testing,...	22	Experimental	—	Python
8	hereandnowai/evaluation-of-opensource-llms-between-rag-and-finetuning-entreprise-grade Enterprise-grade evaluation comparing RAG and Fine-Tuning for local...	21	Experimental	—	Python
9	thecoderr13/Corrective-RAG CRAG - A pipeline that uses tunable thresholds to validate document...	21	Experimental	—	Python
10	xiaohanzhang2005/Minor-Detection Self-evolving minor-user identification agent for anthropomorphic AI...	19	Experimental	13	Python
11	ShabnamAtf/ScenarioBench Trace-grounded compliance benchmark for Text-to-SQL and RAG	17	Experimental	—	Python
12	dipakkr/ai-engineering-guide A practical guide to AI engineering — LLMs, RAG, agents, evals, and...	17	Experimental	4	Python
13	farithadnan/KB-AnswerScorer A tool for evaluating LLM responses against a knowledge base of expert solutions.	15	Experimental	—	Python
14	DennisMRitchie/go-llm-evaluator LLM-as-a-Judge evaluation framework in Go	14	Experimental	—	Go
15	Martonidaz/multi-agent-rag-builder Development of a multi-agent system to assist professionals outside...	14	Experimental	—	Jupyter Notebook
16	emmeongoingammuaroi/reviewform AI-Powered Code Review Agent built with LangGraph, FastAPI, MCP, and RAG...	14	Experimental	—	Python
17	tovrr/Apex_LLM Private AI workspace platform: FastAPI LLM API, streaming, evals, usage...	14	Experimental	—	Python
18	songsunny00/ragas-dify-eval Quickly evaluate Dify applications using Ragas (suitable for evaluating RAG applications)	14	Experimental	1	Python
19	rickytang666/epa-consultant 🤖 RAG-powered regulatory intelligence for EPA pesticide compliance.	13	Experimental	—	Python