open-rag-eval and RAG-evaluation-harnesses
These are complements: open-rag-eval provides a framework for evaluating RAG systems without requiring reference answers, while RAG-evaluation-harnesses offers a comprehensive benchmark-style evaluation suite that can incorporate or run alongside other methodologies. Used together, they let practitioners combine multiple evaluation approaches for a more robust assessment.
About open-rag-eval
vectara/open-rag-eval
RAG evaluation without the need for "golden answers"
Implements reference-free evaluation metrics (UMBRELA, AutoNuggetizer) based on research from UWaterloo, eliminating the need for golden answers while supporting optional reference-based metrics when available. Provides modular connectors for Vectara, LlamaIndex, and LangChain RAG platforms, with built-in TREC-RAG benchmark metrics and per-query scoring for detailed analysis. Uses LLM judges and open-source hallucination detection models (HHEM) to assess retrieval quality and factual consistency across RAG pipelines.
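To make the reference-free idea concrete, below is a minimal, illustrative sketch of the LLM-judge pattern that UMBRELA-style metrics follow: a judge model grades each retrieved passage against the query alone, so no golden answer is needed. All names here (RAGOutput, judge_passage_relevance, evaluate) are hypothetical and are not open-rag-eval's actual API.

```python
"""Illustrative sketch of reference-free RAG evaluation.

Hypothetical interface, not open-rag-eval's real API: an LLM judge assigns
graded relevance labels to retrieved passages given only the query.
"""
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RAGOutput:
    query: str
    passages: List[str]  # retrieved contexts
    answer: str          # generated answer


def judge_passage_relevance(judge: Callable[[str], int], query: str, passage: str) -> int:
    """Ask a judge for a graded relevance label (0-3), UMBRELA-style."""
    prompt = (
        f"Query: {query}\nPassage: {passage}\n"
        "On a scale of 0 (irrelevant) to 3 (perfectly relevant), how relevant "
        "is the passage to the query? Answer with a single digit."
    )
    return judge(prompt)


def evaluate(output: RAGOutput, judge: Callable[[str], int]) -> dict:
    """Per-query retrieval score: mean graded relevance over retrieved passages."""
    grades = [judge_passage_relevance(judge, output.query, p) for p in output.passages]
    return {
        "query": output.query,
        "passage_grades": grades,
        "retrieval_score": sum(grades) / (3 * len(grades)) if grades else 0.0,
    }


if __name__ == "__main__":
    # Stub judge for demonstration; in practice this would call an LLM.
    def fake_judge(prompt: str) -> int:
        return 3 if "RAG" in prompt else 1

    sample = RAGOutput(
        query="What is retrieval-augmented generation?",
        passages=["RAG combines retrieval with generation.", "Unrelated text."],
        answer="RAG augments an LLM with retrieved context.",
    )
    print(evaluate(sample, fake_judge))
```

The same judge-based pattern extends to answer-side checks (e.g., nugget coverage or hallucination scoring with a model like HHEM), which is how a pipeline can be assessed end to end without reference answers.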
About RAG-evaluation-harnesses
RulinShao/RAG-evaluation-harnesses
An evaluation suite for Retrieval-Augmented Generation (RAG).