TonicAI/tonic_validate
Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications.
Provides 15+ built-in metrics (answer relevance, hallucination detection, context precision) with configurable LLM backends (OpenAI, Anthropic, etc.) and supports custom metric implementations. Operates as a Python framework that processes question-answer-context triplets through chainable metric evaluators, with optional cloud integration for result visualization and tracking. Integrates with CI/CD pipelines via GitHub Actions and pairs with Tonic Textual for preprocessing unstructured data into RAG-optimized formats.
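The triplet-through-chainable-evaluators flow described above can be sketched generically. This is an illustrative pattern only, not tonic_validate's actual API: the `Triplet`, `context_overlap`, and `evaluate` names below are hypothetical, and the toy metric stands in for the library's LLM-backed scorers.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Triplet:
    """A question-answer-context triplet, the unit RAG evaluators score."""
    question: str
    answer: str
    context: List[str]

# A "metric" is a callable mapping a triplet to a score in [0, 1].
Metric = Callable[[Triplet], float]

def context_overlap(t: Triplet) -> float:
    """Toy retrieval metric: fraction of answer words found in the context.
    Real frameworks replace this with an LLM-judged metric."""
    words = t.answer.lower().split()
    ctx = " ".join(t.context).lower()
    if not words:
        return 0.0
    return sum(w in ctx for w in words) / len(words)

def evaluate(triplets: List[Triplet], metrics: List[Metric]) -> List[dict]:
    """Chain every metric over every triplet, like a scorer pipeline."""
    return [{m.__name__: m(t) for m in metrics} for t in triplets]

t = Triplet(
    question="What license does the project use?",
    answer="MIT",
    context=["The project is released under the MIT license."],
)
scores = evaluate([t], [context_overlap])
```

Custom metrics in this pattern are just additional callables appended to the metric list, which is what makes the evaluators composable.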
324 stars. No commits in the last 6 months.
Stars: 324
Forks: 31
Language: Python
License: MIT
Category:
Last pushed: Jul 10, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/TonicAI/tonic_validate"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000/day.
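The curl call above can also be made from Python. A minimal stdlib-only sketch, assuming the endpoint returns JSON (the response schema is not documented here, so `fetch_quality` simply returns the parsed body):

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/rag"

def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for a given GitHub owner/repo pair."""
    return f"{BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch the repo's quality data (requires network access)."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

# Example: fetch_quality("TonicAI", "tonic_validate")
```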
Higher-rated alternatives
HZYAI/RagScore
⚡️ The "1-Minute RAG Audit" — Generate QA datasets & evaluate RAG systems in Colab, Jupyter, or...
vectara/open-rag-eval
RAG evaluation without the need for "golden answers"
DocAILab/XRAG
XRAG: eXamining the Core - Benchmarking Foundational Component Modules in Advanced...
AIAnytime/rag-evaluator
A library for evaluating Retrieval-Augmented Generation (RAG) systems (The traditional ways).
microsoft/benchmark-qed
Automated benchmarking of Retrieval-Augmented Generation (RAG) systems