AIAnytime/rag-evaluator
A library for evaluating Retrieval-Augmented Generation (RAG) systems using traditional NLP metrics.
Computes eleven evaluation metrics, including BLEU, ROUGE, BERT Score, METEOR, and MAUVE, to assess generated responses along semantic-similarity, fluency, readability, and bias dimensions. Provides both a Python API for programmatic evaluation and a Streamlit web interface for interactive analysis. Designed for end-to-end RAG pipeline assessment without requiring external model APIs.
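A minimal sketch of programmatic use, assuming the package exposes a RAGEvaluator class with an evaluate_all(question, response, reference) method; both names are assumptions here, so check the project README for the exact interface.

# pip install rag-evaluator
# Assumed interface: class and method names below may differ from the
# installed version's actual API.
from rag_evaluator import RAGEvaluator

evaluator = RAGEvaluator()

question = "What is the capital of France?"
response = "The capital of France is Paris."    # RAG system output
reference = "Paris is the capital of France."   # ground-truth answer

# Assumed to return a dict mapping metric names (BLEU, ROUGE,
# BERT Score, METEOR, MAUVE, ...) to scores.
scores = evaluator.evaluate_all(question, response, reference)
for metric, value in scores.items():
    print(f"{metric}: {value}")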
No commits in the last 6 months. Available on PyPI.
Stars: 42
Forks: 18
Language: Python
License: MIT
Category: RAG
Last pushed: Aug 10, 2024
Monthly downloads: 65
Commits (30d): 0
Dependencies: 7
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/AIAnytime/rag-evaluator"
Open to everyone: 100 requests/day with no key required; a free key raises the limit to 1,000 requests/day.
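For scripted access, a short sketch using Python's requests library against the endpoint above; the JSON field names in the example (stars, monthly_downloads) are assumptions, so print the payload once to confirm the real schema.

import requests

url = "https://pt-edge.onrender.com/api/v1/quality/rag/AIAnytime/rag-evaluator"
resp = requests.get(url, timeout=10)
resp.raise_for_status()
data = resp.json()

# Field names below are assumptions about the response schema.
print(data.get("stars"), data.get("monthly_downloads"))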
Higher-rated alternatives
HZYAI/RagScore
⚡️ The "1-Minute RAG Audit" — Generate QA datasets & evaluate RAG systems in Colab, Jupyter, or...
vectara/open-rag-eval
RAG evaluation without the need for "golden answers"
2501Pr0ject/RAGnarok-AI
Local-first RAG evaluation framework for LLM applications. 100% local, no API keys required.
DocAILab/XRAG
XRAG: eXamining the Core - Benchmarking Foundational Component Modules in Advanced...
microsoft/benchmark-qed
Automated benchmarking of Retrieval-Augmented Generation (RAG) systems