open-rag-eval and RAG-evaluation-harnesses

These tools are complementary: open-rag-eval provides a framework for evaluating RAG systems without requiring reference answers, while RAG-evaluation-harnesses offers a comprehensive evaluation suite that can incorporate, or run alongside, other evaluation methodologies, so practitioners can combine multiple approaches for a more robust assessment.
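If you did combine them, one simple pattern is to blend per-query scores from a reference-free evaluator with reference-based scores wherever golden answers exist. The sketch below is purely illustrative: the score dictionaries and blending weight are assumptions, not outputs or APIs of either project.

```python
# Hypothetical helper for blending two per-query score maps. Neither input
# comes from a real open-rag-eval or RAG-evaluation-harnesses API call;
# they stand in for whatever per-query numbers each tool reports.
def combine_scores(reference_free: dict[str, float],
                   reference_based: dict[str, float],
                   weight: float = 0.5) -> dict[str, float]:
    """Blend scores per query; queries without a golden answer keep the
    reference-free score alone."""
    combined = {}
    for query, rf_score in reference_free.items():
        if query in reference_based:
            combined[query] = weight * rf_score + (1 - weight) * reference_based[query]
        else:
            combined[query] = rf_score
    return combined

# Example: one query has a golden answer, the other does not.
print(combine_scores({"q1": 0.8, "q2": 0.6}, {"q1": 0.7}))
# {'q1': 0.75, 'q2': 0.6}
```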

open-rag-eval: overall score 52 (Established)
  Maintenance 6/25 · Adoption 16/25 · Maturity 18/25 · Community 12/25
  Stars: 347 · Forks: 21 · Downloads: 645 · Commits (30d): 0
  Language: Python · License: Apache-2.0
  Risk flags: none

RAG-evaluation-harnesses: overall score 28 (Experimental)
  Maintenance 2/25 · Adoption 6/25 · Maturity 9/25 · Community 11/25
  Stars: 23 · Forks: 3 · Downloads: n/a · Commits (30d): 0
  Language: Python · License: MIT
  Risk flags: stale for 6 months, no published package, no dependents
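In both cases the overall score matches the sum of the four category subscores: 6 + 16 + 18 + 12 = 52 for open-rag-eval and 2 + 6 + 9 + 11 = 28 for RAG-evaluation-harnesses.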

About open-rag-eval

vectara/open-rag-eval

RAG evaluation without the need for "golden answers"

Implements reference-free evaluation metrics (UMBRELA, AutoNuggetizer) based on research from the University of Waterloo, eliminating the need for golden answers while still supporting optional reference-based metrics when they are available. Provides modular connectors for the Vectara, LlamaIndex, and LangChain RAG platforms, along with built-in TREC-RAG benchmark metrics and per-query scoring for detailed analysis. Uses LLM judges and an open-source hallucination detection model (HHEM) to assess retrieval quality and factual consistency across RAG pipelines.
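The reference-free idea behind UMBRELA is to have an LLM judge grade each retrieved passage for relevance to the query on a 0-3 scale, so no golden answer is needed. The sketch below illustrates that pattern only; the prompt wording and the `judge_llm` callable are hypothetical placeholders, not the open-rag-eval API.

```python
# Illustrative UMBRELA-style reference-free relevance judging.
# `judge_llm` is a placeholder for any chat-completion function that takes a
# prompt string and returns the model's reply; it is NOT an open-rag-eval API.
from statistics import mean
from typing import Callable, List

JUDGE_PROMPT = """You are judging search results.
Query: {query}
Passage: {passage}
Rate how well the passage answers the query on a 0-3 scale:
0 = not relevant, 1 = related but does not answer,
2 = partially answers, 3 = directly and fully answers.
Reply with a single digit."""

def judge_retrieval(query: str,
                    passages: List[str],
                    judge_llm: Callable[[str], str]) -> float:
    """Average 0-3 relevance grade over the retrieved passages for one query."""
    grades = []
    for passage in passages:
        reply = judge_llm(JUDGE_PROMPT.format(query=query, passage=passage))
        digits = [ch for ch in reply if ch in "0123"]
        grades.append(int(digits[0]) if digits else 0)  # unparsable reply counts as 0
    return mean(grades) if grades else 0.0
```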

About RAG-evaluation-harnesses

RulinShao/RAG-evaluation-harnesses

An evaluation suite for Retrieval-Augmented Generation (RAG).

Scores updated daily from GitHub, PyPI, and npm data.