open-rag-eval and rag-evaluator

These are competing libraries with different evaluation methodologies: Vectara's framework performs reference-free evaluation, using LLM judges to assess RAG quality directly, while AIAnytime's library implements traditional reference-based evaluation, comparing generated answers against ground-truth golden answers.

| | open-rag-eval | rag-evaluator |
|---|---|---|
| Overall score | 59 (Established) | 56 (Established) |
| Maintenance | 6/25 | 0/25 |
| Adoption | 16/25 | 12/25 |
| Maturity | 25/25 | 25/25 |
| Community | 12/25 | 19/25 |
| Stars | 347 | 42 |
| Forks | 21 | 18 |
| Downloads | 645 | 65 |
| Commits (30d) | 0 | 0 |
| Language | Python | Python |
| License | Apache-2.0 | MIT |
| Risk flags | None | Stale 6m |

About open-rag-eval

vectara/open-rag-eval

RAG evaluation without the need for "golden answers"

Implements reference-free evaluation metrics (UMBRELA, AutoNuggetizer) based on research from UWaterloo, eliminating the need for golden answers while supporting optional reference-based metrics when available. Provides modular connectors for Vectara, LlamaIndex, and LangChain RAG platforms, with built-in TREC-RAG benchmark metrics and per-query scoring for detailed analysis. Uses LLM judges and open-source hallucination detection models (HHEM) to assess retrieval quality and factual consistency across RAG pipelines.
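To illustrate the reference-free idea, here is a minimal conceptual sketch, not open-rag-eval's actual API: an UMBRELA-style judge assigns each retrieved passage a graded relevance score from the query alone, so no golden answer is required. The `judge` callable stands in for an LLM prompt; the keyword-overlap stub below is purely hypothetical.

```python
from typing import Callable, List


def umbrela_style_score(
    query: str,
    passages: List[str],
    judge: Callable[[str, str], int],
) -> float:
    """Average graded relevance (0-3) over retrieved passages,
    normalized to [0, 1]. Only the query and retrieved text are
    needed -- no reference answer."""
    if not passages:
        return 0.0
    grades = [judge(query, p) for p in passages]
    return sum(grades) / (3 * len(grades))


# Hypothetical stub standing in for an LLM judge prompt.
def toy_judge(query: str, passage: str) -> int:
    overlap = len(set(query.lower().split()) & set(passage.lower().split()))
    return min(overlap, 3)


score = umbrela_style_score(
    "what causes rain",
    ["Rain forms when water vapor condenses.", "Cats are mammals."],
    toy_judge,
)
```

In practice the judge is an LLM given a grading rubric; the normalized average then serves as a per-query retrieval-quality signal.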

About rag-evaluator

AIAnytime/rag-evaluator

A library for evaluating Retrieval-Augmented Generation (RAG) systems (The traditional ways).

Computes eleven evaluation metrics including BLEU, ROUGE, BERT Score, METEOR, and MAUVE to assess generated responses across semantic similarity, fluency, readability, and bias dimensions. Provides both a Python API for programmatic evaluation and a Streamlit web interface for interactive analysis. Designed for end-to-end RAG pipeline assessment without requiring external model APIs.
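By contrast, reference-based metrics of the kind rag-evaluator aggregates compare the generated answer against a golden reference. A pure-Python sketch of clipped unigram precision and recall, the idea underlying BLEU-1 and ROUGE-1 (the library itself wraps standard metric implementations; this is only a conceptual illustration):

```python
from collections import Counter


def unigram_overlap(generated: str, reference: str) -> dict:
    """BLEU-1-style precision and ROUGE-1-style recall computed
    from clipped unigram counts against a single golden reference."""
    gen = Counter(generated.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((gen & ref).values())  # clipped match count
    return {
        "precision": overlap / max(sum(gen.values()), 1),
        "recall": overlap / max(sum(ref.values()), 1),
    }


scores = unigram_overlap(
    "rain forms when vapor condenses",
    "rain forms when water vapor condenses in clouds",
)
```

Precision penalizes extra generated tokens while recall penalizes missing reference content, which is why such metrics only make sense when a golden answer exists.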

Scores updated daily from GitHub, PyPI, and npm data.