nuclia/nuclia-eval
Library for evaluating RAG using Nuclia's models
Provides fine-grained RAG evaluation across three dimensions (answer relevance, context relevance, and groundedness) using REMi-v0, a LoRA adapter built on Mistral-7B that returns both scalar scores (0-5) and reasoning explanations. Metrics can be evaluated together or individually, with strict scoring that detects factual inconsistencies and relevance mismatches. Requires HuggingFace authentication and a GPU with 24GB+ of memory, with configurable model caching.
No commits in the last 6 months. Available on PyPI.
Stars: 18
Forks: 3
Language: Python
License: MIT
Last pushed: Jul 31, 2024
Monthly downloads: 15
Commits (30d): 0
Dependencies: 5
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/nuclia/nuclia-eval"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
HZYAI/RagScore
⚡️ The "1-Minute RAG Audit" — Generate QA datasets & evaluate RAG systems in Colab, Jupyter, or...
vectara/open-rag-eval
RAG evaluation without the need for "golden answers"
DocAILab/XRAG
XRAG: eXamining the Core - Benchmarking Foundational Component Modules in Advanced...
AIAnytime/rag-evaluator
A library for evaluating Retrieval-Augmented Generation (RAG) systems (The traditional ways).
microsoft/benchmark-qed
Automated benchmarking of Retrieval-Augmented Generation (RAG) systems