nuclia/nuclia-eval
Library for evaluating RAG using Nuclia's models
Provides fine-grained RAG evaluation across three dimensions (answer relevance, context relevance, and groundedness) using REMi-v0, a LoRA adapter built on Mistral-7B that returns both scalar scores (0-5) and reasoning explanations. Metrics can be evaluated together or individually, with strict scoring that detects factual inconsistencies and relevance mismatches. Requires HuggingFace authentication and a GPU with 24GB+ of memory, with configurable model caching.
No commits in the last 6 months. Available on PyPI.
Stars: 18
Forks: 3
Language: Python
License: MIT
Last pushed: Jul 31, 2024
Monthly downloads: 15
Commits (30d): 0
Dependencies: 5
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/nuclia/nuclia-eval"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
HZYAI/RagScore
⚡️ The "1-Minute RAG Audit" — Generate QA datasets & evaluate RAG systems in Colab, Jupyter, or...
vectara/open-rag-eval
RAG evaluation without the need for "golden answers"
DocAILab/XRAG
XRAG: eXamining the Core - Benchmarking Foundational Component Modules in Advanced...
AIAnytime/rag-evaluator
A library for evaluating Retrieval-Augmented Generation (RAG) systems (The traditional ways).
microsoft/benchmark-qed
Automated benchmarking of Retrieval-Augmented Generation (RAG) systems