vectara/open-rag-eval
RAG evaluation without the need for "golden answers"
Implements reference-free evaluation metrics (UMBRELA, AutoNuggetizer) based on research from the University of Waterloo, eliminating the need for golden answers while still supporting optional reference-based metrics when they are available. Provides modular connectors for Vectara, LlamaIndex, and LangChain RAG pipelines, along with built-in TREC-RAG benchmark metrics and per-query scoring for detailed analysis. Uses LLM judges and the open-source HHEM hallucination detection model to assess retrieval quality and factual consistency across RAG pipelines.
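For context, UMBRELA-style evaluation asks an LLM judge to grade each retrieved passage against the query on a 0-3 relevance scale, so no golden answer is needed. The sketch below is a minimal, hypothetical illustration of that idea in plain Python; the prompt wording and the judge_llm callable are illustrative assumptions, not open-rag-eval's actual API.

import re
from typing import Callable

# Minimal sketch of UMBRELA-style reference-free relevance grading.
# NOTE: conceptual illustration only; open-rag-eval's real API differs.
# `judge_llm` is a hypothetical callable (prompt text -> completion text).

UMBRELA_PROMPT = """Given a query and a passage, judge whether the passage
answers the query. Reply with a single grade from 0 to 3:
0 = unrelated, 1 = related but does not answer,
2 = partially answers, 3 = dedicated to the query and answers it.

Query: {query}
Passage: {passage}
Grade:"""

def grade_passage(query: str, passage: str,
                  judge_llm: Callable[[str], str]) -> int:
    """Ask the judge LLM for a 0-3 relevance grade and parse it."""
    reply = judge_llm(UMBRELA_PROMPT.format(query=query, passage=passage))
    match = re.search(r"[0-3]", reply)
    return int(match.group()) if match else 0

def retrieval_score(query: str, passages: list[str],
                    judge_llm: Callable[[str], str]) -> float:
    """Average the per-passage grades, normalized to [0, 1]."""
    grades = [grade_passage(query, p, judge_llm) for p in passages]
    return sum(grades) / (3 * len(grades)) if grades else 0.0

Because the grades come from a judge model rather than human labels, the same scoring loop works for any RAG pipeline's retrieval output, which is what makes the approach reference-free.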
347 stars and 645 monthly downloads. Available on PyPI.
Stars: 347
Forks: 21
Language: Python
License: Apache-2.0
Category: RAG
Last pushed: Dec 15, 2025
Monthly downloads: 645
Commits (30d): 0
Dependencies: 28
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/vectara/open-rag-eval"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
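The same call from Python, as a minimal sketch using the requests library; the response schema and the API-key header name are not documented on this page, so both are treated as unknowns here.

import requests

# Fetch the quality data shown above from the public endpoint.
URL = "https://pt-edge.onrender.com/api/v1/quality/rag/vectara/open-rag-eval"

resp = requests.get(URL, timeout=10)  # no key needed up to 100 requests/day
resp.raise_for_status()
print(resp.json())  # schema is undocumented here, so just print the payload

# With a free key (header name is an assumption; check the API docs):
# resp = requests.get(URL, headers={"X-API-Key": "YOUR_KEY"}, timeout=10)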
Related tools
HZYAI/RagScore
⚡️ The "1-Minute RAG Audit" — Generate QA datasets & evaluate RAG systems in Colab, Jupyter, or...
DocAILab/XRAG
XRAG: eXamining the Core - Benchmarking Foundational Component Modules in Advanced...
AIAnytime/rag-evaluator
A library for evaluating Retrieval-Augmented Generation (RAG) systems (The traditional ways).
microsoft/benchmark-qed
Automated benchmarking of Retrieval-Augmented Generation (RAG) systems
2501Pr0ject/RAGnarok-AI
Local-first RAG evaluation framework for LLM applications. 100% local, no API keys required.