open-rag-eval and rageval
These tools are competitors rather than complements: open-rag-eval focuses on reference-free evaluation metrics suited to production RAG systems, while rageval appears to be a lighter-weight evaluation toolkit covering the full RAG pipeline. They are alternative choices for the same use case, not tools designed to work together.
About open-rag-eval
vectara/open-rag-eval
RAG evaluation without the need for "golden answers"
Implements reference-free evaluation metrics (UMBRELA, AutoNuggetizer) based on research from the University of Waterloo, eliminating the need for golden answers while still supporting optional reference-based metrics when they are available. Provides modular connectors for Vectara, LlamaIndex, and LangChain RAG platforms, along with built-in TREC-RAG benchmark metrics and per-query scoring for detailed analysis. Uses LLM judges and an open-source hallucination detection model (HHEM) to assess retrieval quality and factual consistency across RAG pipelines.
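To illustrate the reference-free idea, here is a minimal sketch of UMBRELA-style retrieval grading: an LLM judge assigns each retrieved passage a graded relevance label with respect to the query alone, so no golden answer is needed. The `call_llm_judge` function is a placeholder stub, not open-rag-eval's actual API.

```python
def call_llm_judge(prompt: str) -> str:
    # Placeholder: in practice this would call an LLM API.
    # A canned grade is returned here so the sketch is runnable.
    return "2"

def umbrela_style_grade(query: str, passage: str) -> int:
    # Ask the judge for a graded relevance label (0-3 scale, as in UMBRELA).
    prompt = (
        "Rate how well the passage answers the query on a 0-3 scale.\n"
        f"Query: {query}\nPassage: {passage}\nGrade:"
    )
    grade = int(call_llm_judge(prompt).strip())
    return max(0, min(3, grade))  # clamp to the valid range

def mean_retrieval_score(query: str, passages: list[str]) -> float:
    # Average the per-passage grades and normalize to [0, 1].
    grades = [umbrela_style_grade(query, p) for p in passages]
    return sum(grades) / (3 * len(grades))
```

Because the judge sees only the query and passage, the same loop works on production traffic where no reference answers exist.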
About rageval
gomate-community/rageval
Evaluation tools for Retrieval-augmented Generation (RAG) methods.
Provides modular evaluation across six RAG pipeline stages (query rewriting, retrieval, compression, evidence verification, generation, and validation), with more than 30 metrics spanning answer correctness (F1, ROUGE, exact match), groundedness (citation precision and recall), and context adequacy. Supports both LLM-based and string-matching evaluators, with pluggable integrations for OpenAI APIs or open-source models served via vLLM. Includes benchmark implementations on ASQA and other QA datasets, with reproducible evaluation scripts.
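As a concrete example of the string-matching family of answer-correctness metrics mentioned above, here are generic token-level F1 and exact-match (EM) scorers. This is an illustrative sketch in the style of standard QA evaluation, not rageval's actual implementation.

```python
import re
from collections import Counter

def normalize(text: str) -> list[str]:
    # Lowercase, strip punctuation, and split into tokens.
    return re.sub(r"[^\w\s]", "", text.lower()).split()

def exact_match(prediction: str, reference: str) -> float:
    # 1.0 only if the normalized token sequences are identical.
    return float(normalize(prediction) == normalize(reference))

def token_f1(prediction: str, reference: str) -> float:
    # Harmonic mean of token precision and recall over the overlap.
    pred, ref = normalize(prediction), normalize(reference)
    common = sum((Counter(pred) & Counter(ref)).values())
    if common == 0:
        return 0.0
    precision = common / len(pred)
    recall = common / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For example, `token_f1("Paris is the capital", "the capital is Paris")` scores 1.0 because F1 is order-insensitive, while `exact_match` on the same pair scores 0.0; this is why toolkits report both.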