2501Pr0ject/RAGnarok-AI
Local-first RAG evaluation framework for LLM applications: runs 100% locally, no API keys required.
Supports LangChain, LangGraph, LlamaIndex, and custom RAG pipelines, with built-in checkpointing so long evaluations can be resumed. Scoring uses a local LLM-as-Judge (faithfulness, relevance, hallucination) via Ollama integration, exposes Prometheus metrics, and sustains ~24K retrieval queries/sec with only a handful of dependencies. Ships with a CLI-first design, GitHub Actions CI/CD workflows, and Kubernetes deployment support for production monitoring and cost tracking.
Available on PyPI.
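The local LLM-as-Judge pattern mentioned above can be illustrated with a short sketch. This is not RAGnarok-AI's actual code: the model name, judge prompt, and score parsing below are illustrative assumptions, using only the official ollama Python client's `chat` call against a locally running Ollama server.

```python
import re
import ollama  # pip install ollama; assumes a local Ollama server is running

# Hypothetical judge prompt, not RAGnarok-AI's actual prompt.
JUDGE_PROMPT = """You are a strict evaluator. Given a retrieved context and an
answer, rate how faithful the answer is to the context on a scale of 1-5,
where 5 means fully supported and 1 means contradicted or unsupported.
Reply with the number only.

Context:
{context}

Answer:
{answer}
"""

def judge_faithfulness(context: str, answer: str, model: str = "llama3") -> int:
    """Score answer faithfulness with a local LLM-as-Judge via Ollama."""
    response = ollama.chat(
        model=model,
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(context=context, answer=answer),
        }],
    )
    reply = response["message"]["content"]
    match = re.search(r"[1-5]", reply)  # pull the first 1-5 digit from the reply
    if match is None:
        raise ValueError(f"Judge returned no parsable score: {reply!r}")
    return int(match.group())

if __name__ == "__main__":
    score = judge_faithfulness(
        context="The Eiffel Tower is 330 metres tall and located in Paris.",
        answer="The Eiffel Tower is in Paris.",
    )
    print(f"faithfulness: {score}/5")
```

Because everything runs against a local model, this style of judging incurs no API cost and keeps evaluation data on-machine, which is the trade-off the "local-first" framing refers to.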
Stars: 13
Forks: 2
Language: Python
License: AGPL-3.0
Category:
Last pushed: Feb 28, 2026
Monthly downloads: 308
Commits (30d): 0
Dependencies: 3
Get this data via API:
curl "https://pt-edge.onrender.com/api/v1/quality/rag/2501Pr0ject/RAGnarok-AI"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
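A minimal Python equivalent of the curl call above follows. The endpoint URL is taken from the example; the `X-API-Key` header name and the response field names are assumptions, since the API's schema and auth scheme aren't documented here.

```python
import requests  # pip install requests

# Endpoint from the curl example above; no key is needed at the
# default 100 requests/day tier.
URL = "https://pt-edge.onrender.com/api/v1/quality/rag/2501Pr0ject/RAGnarok-AI"

def fetch_quality_data(api_key: str | None = None) -> dict:
    """Fetch this repo's quality data; an optional key raises the rate limit."""
    # NOTE: the header name is an assumption; check the API's docs
    # for the actual authentication scheme.
    headers = {"X-API-Key": api_key} if api_key else {}
    resp = requests.get(URL, headers=headers, timeout=10)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    data = fetch_quality_data()
    # Field names are not documented here; print the payload to inspect
    # what the endpoint actually returns.
    print(data)
```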
Related tools
- HZYAI/RagScore: ⚡️ The "1-Minute RAG Audit" — Generate QA datasets & evaluate RAG systems in Colab, Jupyter, or...
- vectara/open-rag-eval: RAG evaluation without the need for "golden answers"
- DocAILab/XRAG: XRAG: eXamining the Core - Benchmarking Foundational Component Modules in Advanced...
- AIAnytime/rag-evaluator: A library for evaluating Retrieval-Augmented Generation (RAG) systems (The traditional ways).
- microsoft/benchmark-qed: Automated benchmarking of Retrieval-Augmented Generation (RAG) systems