rag-evaluator and rageval
These two tools compete directly: both are libraries for evaluating Retrieval-Augmented Generation (RAG) systems.
About rag-evaluator
AIAnytime/rag-evaluator
A library for evaluating Retrieval-Augmented Generation (RAG) systems (The traditional ways).
Computes eleven evaluation metrics including BLEU, ROUGE, BERT Score, METEOR, and MAUVE to assess generated responses across semantic similarity, fluency, readability, and bias dimensions. Provides both a Python API for programmatic evaluation and a Streamlit web interface for interactive analysis. Designed for end-to-end RAG pipeline assessment without requiring external model APIs.
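To give a sense of what a "traditional" overlap metric like BLEU measures, here is an illustrative, pure-Python sketch of sentence-level BLEU with add-one smoothing and a brevity penalty. This is not rag-evaluator's implementation (the library wraps established metric packages); real BLEU typically uses up to 4-grams and corpus-level statistics, while this sketch stops at bigrams for brevity.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(candidate: str, reference: str, max_n: int = 2) -> float:
    """Simplified sentence-level BLEU: smoothed n-gram precision
    (n = 1..max_n, geometric mean) times a brevity penalty."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        overlap = sum((cand_counts & ref_counts).values())  # clipped matches
        total = max(sum(cand_counts.values()), 1)
        # add-one smoothing so a zero overlap does not zero out the score
        log_prec += math.log((overlap + 1) / (total + 1)) / max_n
    # brevity penalty discourages very short candidates
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(log_prec)
```

An identical candidate and reference score 1.0; disjoint sentences score near zero. The other metric families the description lists (BERT Score, METEOR, MAUVE) go beyond surface n-gram overlap and need model embeddings or alignment, which is why the library bundles them rather than leaving users to reimplement each one.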
About rageval
gomate-community/rageval
Evaluation tools for Retrieval-augmented Generation (RAG) methods.
Provides modular evaluation across six RAG pipeline stages—query rewriting, retrieval, compression, evidence verification, generation, and validation—with 30+ metrics spanning answer correctness (F1, ROUGE, EM), groundedness (citation precision/recall), and context adequacy. Supports both LLM-based and string-matching evaluators, with pluggable integrations for OpenAI APIs or open-source models via vllm. Includes benchmark implementations on ASQA and other QA datasets with reproducible evaluation scripts.
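For the string-matching side of answer correctness, the F1 and EM metrics named above are typically computed SQuAD-style: normalize both strings, then compare token bags. The sketch below is illustrative only and is not rageval's code; the normalization rules (lowercasing, dropping punctuation and English articles) follow the common QA-evaluation convention, which rageval may refine.

```python
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """SQuAD-style normalization: lowercase, strip punctuation,
    drop English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, reference: str) -> float:
    """EM: 1.0 iff the normalized strings are identical."""
    return float(normalize(prediction) == normalize(reference))

def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 over the multiset intersection of tokens."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Note the contrast with the LLM-based evaluators the description mentions: F1/EM are cheap and deterministic but reward surface overlap, so "Paris is the capital of France" and "The capital of France is Paris" get a perfect F1 yet an EM of 0, while a correct paraphrase with different wording can score poorly on both.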