open-rag-eval and rageval

These are competitors rather than complements: both evaluate RAG pipelines. open-rag-eval provides reference-free metrics suited to production systems, while rageval appears to be a lighter-weight evaluation toolkit, making them alternative choices for the same use case rather than tools designed to work together.

| | open-rag-eval | rageval |
|---|---|---|
| Overall score | 59 (Established) | 36 (Emerging) |
| Maintenance | 6/25 | 0/25 |
| Adoption | 16/25 | 10/25 |
| Maturity | 25/25 | 16/25 |
| Community | 12/25 | 10/25 |
| Stars | 347 | 170 |
| Forks | 21 | 10 |
| Downloads | 645 | — |
| Commits (30d) | 0 | 0 |
| Language | Python | Python |
| License | Apache-2.0 | Apache-2.0 |
| Risk flags | None | Stale 6m, No Package, No Dependents |

About open-rag-eval

vectara/open-rag-eval

RAG evaluation without the need for "golden answers"

Implements reference-free evaluation metrics (UMBRELA, AutoNuggetizer) based on research from UWaterloo, eliminating the need for golden answers while supporting optional reference-based metrics when available. Provides modular connectors for Vectara, LlamaIndex, and LangChain RAG platforms, with built-in TREC-RAG benchmark metrics and per-query scoring for detailed analysis. Uses LLM judges and open-source hallucination detection models (HHEM) to assess retrieval quality and factual consistency across RAG pipelines.
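To make the reference-free idea concrete, here is a minimal sketch of how an UMBRELA-style judge output could be aggregated into a per-query retrieval score. This is an illustration of the technique, not open-rag-eval's actual API: the function name, the 0–3 grade scale aggregation, and the normalization are assumptions for demonstration.

```python
# Illustrative sketch only -- NOT open-rag-eval's real API.
# UMBRELA-style evaluation has an LLM judge grade each retrieved
# passage for relevance on a 0-3 scale; no golden answer is needed.
# Here we collapse those grades into a single 0-1 per-query score.

def umbrela_style_score(grades, max_grade=3):
    """Normalize a list of 0..max_grade relevance grades to [0, 1]."""
    if not grades:
        return 0.0
    return sum(grades) / (max_grade * len(grades))

# Hypothetical judge output for two queries:
per_query = {
    "q1": umbrela_style_score([3, 2, 0, 1]),  # mixed retrieval quality
    "q2": umbrela_style_score([3, 3, 3]),     # every passage highly relevant
}
print(per_query)  # {'q1': 0.5, 'q2': 1.0}
```

The per-query breakdown mirrors the detailed analysis the library reports: a low score flags queries where retrieval, not generation, is the weak link.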

About rageval

gomate-community/rageval

Evaluation tools for Retrieval-augmented Generation (RAG) methods.

Provides modular evaluation across six RAG pipeline stages—query rewriting, retrieval, compression, evidence verification, generation, and validation—with 30+ metrics spanning answer correctness (F1, ROUGE, EM), groundedness (citation precision/recall), and context adequacy. Supports both LLM-based and string-matching evaluators, with pluggable integrations for OpenAI APIs or open-source models via vllm. Includes benchmark implementations on ASQA and other QA datasets with reproducible evaluation scripts.
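Two of the answer-correctness metrics listed above, exact match (EM) and token-level F1, can be sketched in a few lines. This is the standard SQuAD-style computation, not rageval's actual implementation; the function names are chosen for illustration.

```python
# Illustrative sketch, not rageval's real API: SQuAD-style
# exact match and token-level F1 for answer correctness.
from collections import Counter

def exact_match(pred: str, gold: str) -> float:
    """1.0 if prediction equals gold after case/whitespace folding."""
    return float(pred.strip().lower() == gold.strip().lower())

def token_f1(pred: str, gold: str) -> float:
    """Harmonic mean of token-overlap precision and recall."""
    p, g = pred.lower().split(), gold.lower().split()
    common = Counter(p) & Counter(g)   # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)

print(exact_match("Paris", "paris"))                              # 1.0
print(token_f1("the capital is Paris", "Paris is the capital"))   # 1.0
```

String-matching evaluators like these are cheap and deterministic, which is why the toolkit offers them alongside LLM-based judges for metrics (e.g. groundedness) that string overlap cannot capture.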

Scores updated daily from GitHub, PyPI, and npm data.