open-rag-eval and RAG-evaluation-harnesses
These are complements: open-rag-eval provides a framework for evaluating RAG systems without requiring reference answers, while RAG-evaluation-harnesses offers a comprehensive benchmark-style evaluation suite that can incorporate or run alongside other methodologies. Used together, they let practitioners combine multiple evaluation approaches for a more robust assessment.
About open-rag-eval
vectara/open-rag-eval
RAG evaluation without the need for "golden answers"
Implements reference-free evaluation metrics (UMBRELA, AutoNuggetizer) based on research from UWaterloo, eliminating the need for golden answers while supporting optional reference-based metrics when available. Provides modular connectors for Vectara, LlamaIndex, and LangChain RAG platforms, with built-in TREC-RAG benchmark metrics and per-query scoring for detailed analysis. Uses LLM judges and open-source hallucination detection models (HHEM) to assess retrieval quality and factual consistency across RAG pipelines.
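To make the reference-free idea concrete, below is a minimal, illustrative sketch of the LLM-judge pattern that UMBRELA-style metrics follow: a judge model grades each retrieved passage against the query alone, so no golden answer is needed. All names here (RAGOutput, judge_passage_relevance, evaluate) are hypothetical and are not open-rag-eval's actual API.

```python
"""Illustrative sketch of reference-free RAG evaluation.

Hypothetical interface, not open-rag-eval's real API: an LLM judge assigns
graded relevance labels to retrieved passages given only the query.
"""
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RAGOutput:
    query: str
    passages: List[str]  # retrieved contexts
    answer: str          # generated answer


def judge_passage_relevance(judge: Callable[[str], int], query: str, passage: str) -> int:
    """Ask a judge for a graded relevance label (0-3), UMBRELA-style."""
    prompt = (
        f"Query: {query}\nPassage: {passage}\n"
        "On a scale of 0 (irrelevant) to 3 (perfectly relevant), how relevant "
        "is the passage to the query? Answer with a single digit."
    )
    return judge(prompt)


def evaluate(output: RAGOutput, judge: Callable[[str], int]) -> dict:
    """Per-query retrieval score: mean graded relevance over retrieved passages."""
    grades = [judge_passage_relevance(judge, output.query, p) for p in output.passages]
    return {
        "query": output.query,
        "passage_grades": grades,
        "retrieval_score": sum(grades) / (3 * len(grades)) if grades else 0.0,
    }


if __name__ == "__main__":
    # Stub judge for demonstration; in practice this would call an LLM.
    def fake_judge(prompt: str) -> int:
        return 3 if "RAG" in prompt else 1

    sample = RAGOutput(
        query="What is retrieval-augmented generation?",
        passages=["RAG combines retrieval with generation.", "Unrelated text."],
        answer="RAG augments an LLM with retrieved context.",
    )
    print(evaluate(sample, fake_judge))
```

The same judge-based pattern extends to answer-side checks (e.g., nugget coverage or hallucination scoring with a model like HHEM), which is how a pipeline can be assessed end to end without reference answers.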
About RAG-evaluation-harnesses
RulinShao/RAG-evaluation-harnesses
An evaluation suite for Retrieval-Augmented Generation (RAG).