rag-evaluator and rageval

rag-evaluator and rageval are competing libraries for evaluating Retrieval-Augmented Generation (RAG) systems.

                  rag-evaluator        rageval
Overall score     56 (Established)     36 (Emerging)
Maintenance       0/25                 0/25
Adoption          12/25                10/25
Maturity          25/25                16/25
Community         19/25                10/25
Stars             42                   170
Forks             18                   10
Downloads         65                   —
Commits (30d)     0                    0
Language          Python               Python
License           MIT                  Apache-2.0
Flags             Stale 6m             Stale 6m, No Package, No Dependents

About rag-evaluator

AIAnytime/rag-evaluator

A library for evaluating Retrieval-Augmented Generation (RAG) systems (The traditional ways).

Computes eleven evaluation metrics including BLEU, ROUGE, BERT Score, METEOR, and MAUVE to assess generated responses across semantic similarity, fluency, readability, and bias dimensions. Provides both a Python API for programmatic evaluation and a Streamlit web interface for interactive analysis. Designed for end-to-end RAG pipeline assessment without requiring external model APIs.
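To illustrate the kind of reference-based metric the library computes, here is a minimal, self-contained sentence-level BLEU sketch in plain Python (uniform n-gram weights plus brevity penalty). This is an illustration of the metric itself, not rag-evaluator's actual API, whose function names are not documented here.

```python
from collections import Counter
import math

def sentence_bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Sentence-level BLEU: geometric mean of modified n-gram
    precisions (n = 1..max_n) scaled by a brevity penalty."""
    cand = candidate.split()
    ref = reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        # Clip each n-gram count by its count in the reference.
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = max(sum(cand_ngrams.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0  # geometric mean is zero if any precision is zero
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty discourages candidates shorter than the reference.
    bp = min(1.0, math.exp(1 - len(ref) / max(len(cand), 1)))
    return bp * math.exp(log_avg)
```

An identical candidate and reference score 1.0; a candidate with no 4-gram overlap scores 0.0 under this unsmoothed variant, which is why production metrics (including NLTK's) add smoothing for short texts.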

About rageval

gomate-community/rageval

Evaluation tools for Retrieval-augmented Generation (RAG) methods.

Provides modular evaluation across six RAG pipeline stages—query rewriting, retrieval, compression, evidence verification, generation, and validation—with 30+ metrics spanning answer correctness (F1, ROUGE, EM), groundedness (citation precision/recall), and context adequacy. Supports both LLM-based and string-matching evaluators, with pluggable integrations for OpenAI APIs or open-source models via vllm. Includes benchmark implementations on ASQA and other QA datasets with reproducible evaluation scripts.
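The string-matching family of answer-correctness metrics mentioned above (F1, EM) can be sketched in a few lines of plain Python. These helper names are illustrative, not rageval's actual API:

```python
from collections import Counter

def exact_match(prediction: str, gold: str) -> float:
    """EM: 1.0 iff the normalized strings are identical."""
    return float(prediction.strip().lower() == gold.strip().lower())

def token_f1(prediction: str, gold: str) -> float:
    """Bag-of-tokens F1 between prediction and gold answer."""
    pred = prediction.lower().split()
    ref = gold.lower().split()
    common = Counter(pred) & Counter(ref)  # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Token-level F1 is order-insensitive, so a paraphrase with the same words scores 1.0 while EM scores 0.0; that gap is exactly why frameworks like this pair string-matching evaluators with LLM-based ones for freer-form answers.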

Scores updated daily from GitHub, PyPI, and npm data.