HZYAI/RagScore
⚡️ The "1-Minute RAG Audit" — Generate QA datasets & evaluate RAG systems in Colab, Jupyter, or CLI. Privacy-first, async, visual reports.
Supports both local (Ollama) and cloud LLM providers with structured, JSON-based QA generation, enabling fully private evaluation for sensitive domains. Multi-metric evaluation breaks down RAG performance across five dimensions (correctness, completeness, relevance, conciseness, and faithfulness), all computed in a single LLM call. Audience-targeted QA generation tailors assessments to specific user groups such as developers, customers, and auditors. Works as a Python API, CLI tool, or MCP server with async processing, and integrates with any RAG endpoint via HTTP.
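To make the "single LLM call" design concrete, here is a minimal sketch of that kind of multi-metric judge against a local Ollama server (one of the providers the project supports). It is a hypothetical illustration of the technique, not RagScore's actual API: the function name, prompt wording, model, and endpoint are all assumptions.

import asyncio
import json

import httpx

JUDGE_PROMPT = """You are grading a RAG answer. Given the question, the
retrieved context, and the answer, return a JSON object with integer scores
from 1 to 5 for exactly these keys: correctness, completeness, relevance,
conciseness, faithfulness.

Question: {question}
Context: {context}
Answer: {answer}
"""

async def score_answer(question: str, context: str, answer: str) -> dict:
    # One request scores all five dimensions at once; "format": "json"
    # asks Ollama to constrain the model's output to valid JSON.
    prompt = JUDGE_PROMPT.format(question=question, context=context, answer=answer)
    async with httpx.AsyncClient(timeout=120.0) as client:
        r = await client.post(
            "http://localhost:11434/api/generate",  # default Ollama endpoint
            json={"model": "llama3", "prompt": prompt,
                  "stream": False, "format": "json"},
        )
        r.raise_for_status()
        return json.loads(r.json()["response"])  # the five-score dict

scores = asyncio.run(score_answer(
    "What license does RagScore use?",
    "RagScore is released under the Apache-2.0 license.",
    "It is Apache-2.0 licensed.",
))
print(scores)  # e.g. {"correctness": 5, "completeness": 5, ...}

Keeping the judge fully local like this is what makes the privacy-first claim workable for sensitive domains: no question, context, or answer ever leaves the machine.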
30 stars and 1,052 monthly downloads. Used by 1 other package. Available on PyPI.
Stars: 30
Forks: 5
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 13, 2026
Monthly downloads: 1,052
Commits (30d): 0
Dependencies: 8
Reverse dependents: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/HZYAI/RagScore"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
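For scripted access, the same endpoint works from Python. A minimal sketch using the anonymous tier (how a key is passed is not documented here, so the example omits it), assuming the endpoint returns the stats above as JSON:

import requests

# Endpoint copied from the curl example above.
url = "https://pt-edge.onrender.com/api/v1/quality/rag/HZYAI/RagScore"
resp = requests.get(url, timeout=30)
resp.raise_for_status()
print(resp.json())  # assumed to mirror the stats listed on this page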
Related tools
vectara/open-rag-eval: RAG evaluation without the need for "golden answers"
2501Pr0ject/RAGnarok-AI: Local-first RAG evaluation framework for LLM applications. 100% local, no API keys required.
DocAILab/XRAG: eXamining the Core - Benchmarking Foundational Component Modules in Advanced...
AIAnytime/rag-evaluator: A library for evaluating Retrieval-Augmented Generation (RAG) systems (The traditional ways).
microsoft/benchmark-qed: Automated benchmarking of Retrieval-Augmented Generation (RAG) systems