vectara/open-rag-eval
RAG evaluation without the need for "golden answers"
Implements reference-free evaluation metrics (UMBRELA, AutoNuggetizer) based on research from the University of Waterloo, eliminating the need for golden answers while still supporting optional reference-based metrics when they are available. Provides modular connectors for Vectara, LlamaIndex, and LangChain RAG pipelines, along with built-in TREC-RAG benchmark metrics and per-query scoring for detailed analysis. Uses LLM judges and the open-source HHEM hallucination detection model to assess retrieval quality and factual consistency across RAG pipelines.
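For context, UMBRELA-style evaluation asks an LLM judge to grade each retrieved passage against the query on a 0-3 relevance scale, so no golden answer is needed. The sketch below is a minimal, hypothetical illustration of that idea in plain Python; the prompt wording and the judge_llm callable are illustrative assumptions, not open-rag-eval's actual API.

import re
from typing import Callable

# Minimal sketch of UMBRELA-style reference-free relevance grading.
# NOTE: conceptual illustration only; open-rag-eval's real API differs.
# `judge_llm` is a hypothetical callable (prompt text -> completion text).

UMBRELA_PROMPT = """Given a query and a passage, judge whether the passage
answers the query. Reply with a single grade from 0 to 3:
0 = unrelated, 1 = related but does not answer,
2 = partially answers, 3 = dedicated to the query and answers it.

Query: {query}
Passage: {passage}
Grade:"""

def grade_passage(query: str, passage: str,
                  judge_llm: Callable[[str], str]) -> int:
    """Ask the judge LLM for a 0-3 relevance grade and parse it."""
    reply = judge_llm(UMBRELA_PROMPT.format(query=query, passage=passage))
    match = re.search(r"[0-3]", reply)
    return int(match.group()) if match else 0

def retrieval_score(query: str, passages: list[str],
                    judge_llm: Callable[[str], str]) -> float:
    """Average the per-passage grades, normalized to [0, 1]."""
    grades = [grade_passage(query, p, judge_llm) for p in passages]
    return sum(grades) / (3 * len(grades)) if grades else 0.0

Because the grades come from a judge model rather than human labels, the same scoring loop works for any RAG pipeline's retrieval output, which is what makes the approach reference-free.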
347 stars and 645 monthly downloads. Available on PyPI.
Stars: 347
Forks: 21
Language: Python
License: Apache-2.0
Category: RAG
Last pushed: Dec 15, 2025
Monthly downloads: 645
Commits (30d): 0
Dependencies: 28
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/vectara/open-rag-eval"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
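The same call from Python, as a minimal sketch using the requests library; the response schema and the API-key header name are not documented on this page, so both are treated as unknowns here.

import requests

# Fetch the quality data shown above from the public endpoint.
URL = "https://pt-edge.onrender.com/api/v1/quality/rag/vectara/open-rag-eval"

resp = requests.get(URL, timeout=10)  # no key needed up to 100 requests/day
resp.raise_for_status()
print(resp.json())  # schema is undocumented here, so just print the payload

# With a free key (header name is an assumption; check the API docs):
# resp = requests.get(URL, headers={"X-API-Key": "YOUR_KEY"}, timeout=10)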
Related tools
HZYAI/RagScore
⚡️ The "1-Minute RAG Audit" — Generate QA datasets & evaluate RAG systems in Colab, Jupyter, or...
DocAILab/XRAG
XRAG: eXamining the Core - Benchmarking Foundational Component Modules in Advanced...
AIAnytime/rag-evaluator
A library for evaluating Retrieval-Augmented Generation (RAG) systems (The traditional ways).
microsoft/benchmark-qed
Automated benchmarking of Retrieval-Augmented Generation (RAG) systems
2501Pr0ject/RAGnarok-AI
Local-first RAG evaluation framework for LLM applications. 100% local, no API keys required.