amitbad/llm-evaluation

Hands-on LLM evaluation learning repo — local models via Ollama, no paid APIs, no maths. Covers deterministic eval, LLM-as-a-Judge, hallucination testing, prompt injection, RAG evaluation, and agent trajectory scoring.

/ 100

Experimental

No License · No Package · No Dependents

Maintenance 13 / 25

Adoption 3 / 25

Maturity 1 / 25

Community 12 / 25


Stars

Forks

Language: HTML

License: —

Category: rag-evaluation-benchmarking

Last pushed: Mar 10, 2026

Commits (30d)


RAG Evaluation Benchmarking · 32 tools

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/amitbad/llm-evaluation"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
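The same request can be made from a script. Here is a minimal Python sketch using only the standard library. The helper names `build_quality_url` and `fetch_quality` are hypothetical, the reading of the `rag` path segment as a category slug is a guess from the URL shape, and the response being JSON is an unverified assumption (the code falls back to raw text if it is not).

```python
import json
import urllib.request
from urllib.parse import quote

# Base URL copied from the curl example on this page.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(owner: str, repo: str, domain: str = "rag") -> str:
    """Build the endpoint URL for one repository (hypothetical helper).

    Assumption: the path is /<domain>/<owner>/<repo>, inferred only from
    the single example URL shown on the page.
    """
    parts = (quote(p, safe="") for p in (domain, owner, repo))
    return f"{API_BASE}/" + "/".join(parts)


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record for a repository.

    Assumption (unverified): the body is JSON. If parsing fails,
    the raw response text is returned instead.
    """
    url = build_quality_url(owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return body


if __name__ == "__main__":
    # Prints the URL only; call fetch_quality(...) to hit the network.
    print(build_quality_url("amitbad", "llm-evaluation"))
```

With the keyless limit of 100 requests/day, it is worth caching the result locally rather than calling `fetch_quality` in a loop.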

Higher-rated alternatives

modelscope/evalscope

A streamlined and customizable framework for efficient large model (LLM, VLM, AIGC) evaluation...

Kareem-Rashed/rubric-eval

Independent framework to test, benchmark, and evaluate LLMs & AI agents locally.

izam-mohammed/ragrank

🎯 Your free LLM evaluation toolkit helps you assess the accuracy of facts, how well it...

justplus/llm-eval

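A large language model evaluation platform that supports multiple evaluation benchmarks, custom datasets, and performance testing. Supports RAG evaluation based on custom datasets.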

relari-ai/continuous-eval

Data-Driven Evaluation for LLM-Powered Applications
