2501Pr0ject/RAGnarok-AI
Local-first RAG evaluation framework for LLM applications: runs 100% locally, no API keys required.
Supports LangChain, LangGraph, LlamaIndex, and custom RAG pipelines, with built-in checkpointing so long evaluations can be resumed. Scoring uses a local LLM-as-Judge (faithfulness, relevance, hallucination) via Ollama integration, exposes Prometheus metrics, and sustains ~24K retrieval queries/sec with only a handful of dependencies. Ships with a CLI-first design, GitHub Actions CI/CD workflows, and Kubernetes deployment support for production monitoring and cost tracking.
Available on PyPI.
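The local LLM-as-Judge pattern mentioned above can be illustrated with a short sketch. This is not RAGnarok-AI's actual code: the model name, judge prompt, and score parsing below are illustrative assumptions, using only the official ollama Python client's `chat` call against a locally running Ollama server.

```python
import re
import ollama  # pip install ollama; assumes a local Ollama server is running

# Hypothetical judge prompt, not RAGnarok-AI's actual prompt.
JUDGE_PROMPT = """You are a strict evaluator. Given a retrieved context and an
answer, rate how faithful the answer is to the context on a scale of 1-5,
where 5 means fully supported and 1 means contradicted or unsupported.
Reply with the number only.

Context:
{context}

Answer:
{answer}
"""

def judge_faithfulness(context: str, answer: str, model: str = "llama3") -> int:
    """Score answer faithfulness with a local LLM-as-Judge via Ollama."""
    response = ollama.chat(
        model=model,
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(context=context, answer=answer),
        }],
    )
    reply = response["message"]["content"]
    match = re.search(r"[1-5]", reply)  # pull the first 1-5 digit from the reply
    if match is None:
        raise ValueError(f"Judge returned no parsable score: {reply!r}")
    return int(match.group())

if __name__ == "__main__":
    score = judge_faithfulness(
        context="The Eiffel Tower is 330 metres tall and located in Paris.",
        answer="The Eiffel Tower is in Paris.",
    )
    print(f"faithfulness: {score}/5")
```

Because everything runs against a local model, this style of judging incurs no API cost and keeps evaluation data on-machine, which is the trade-off the "local-first" framing refers to.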
Stars: 13
Forks: 2
Language: Python
License: AGPL-3.0
Category:
Last pushed: Feb 28, 2026
Monthly downloads: 308
Commits (30d): 0
Dependencies: 3
Get this data via API:
curl "https://pt-edge.onrender.com/api/v1/quality/rag/2501Pr0ject/RAGnarok-AI"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
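A minimal Python equivalent of the curl call above follows. The endpoint URL is taken from the example; the `X-API-Key` header name and the response field names are assumptions, since the API's schema and auth scheme aren't documented here.

```python
import requests  # pip install requests

# Endpoint from the curl example above; no key is needed at the
# default 100 requests/day tier.
URL = "https://pt-edge.onrender.com/api/v1/quality/rag/2501Pr0ject/RAGnarok-AI"

def fetch_quality_data(api_key: str | None = None) -> dict:
    """Fetch this repo's quality data; an optional key raises the rate limit."""
    # NOTE: the header name is an assumption; check the API's docs
    # for the actual authentication scheme.
    headers = {"X-API-Key": api_key} if api_key else {}
    resp = requests.get(URL, headers=headers, timeout=10)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    data = fetch_quality_data()
    # Field names are not documented here; print the payload to inspect
    # what the endpoint actually returns.
    print(data)
```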
Related tools
- HZYAI/RagScore: ⚡️ The "1-Minute RAG Audit" — Generate QA datasets & evaluate RAG systems in Colab, Jupyter, or...
- vectara/open-rag-eval: RAG evaluation without the need for "golden answers"
- DocAILab/XRAG: XRAG: eXamining the Core - Benchmarking Foundational Component Modules in Advanced...
- AIAnytime/rag-evaluator: A library for evaluating Retrieval-Augmented Generation (RAG) systems (The traditional ways).
- microsoft/benchmark-qed: Automated benchmarking of Retrieval-Augmented Generation (RAG) systems