vibrantlabsai/ragas
Supercharge Your LLM Application Evaluations 🚀
Ragas helps AI engineers and product managers objectively assess the quality of their Large Language Model (LLM) applications. It takes your application's outputs and evaluates them against pre-defined metrics or custom criteria, producing clear, data-driven scores and feedback so you can pinpoint weaknesses and improve your AI's performance.
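As a rough illustration, a metric-based evaluation looks like the sketch below. It assumes the 0.1-era ragas API; metric names and dataset fields have shifted across releases, so check the docs for your installed version:

# Score one application output against pre-defined metrics.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

# One record: the question, the retrieved context, and the answer
# your LLM application produced.
data = {
    "question": ["Where is the Eiffel Tower?"],
    "contexts": [["The Eiffel Tower is in Paris, France."]],
    "answer": ["The Eiffel Tower is located in Paris."],
}

# evaluate() uses an LLM judge under the hood, so a model API key
# (e.g. OPENAI_API_KEY) must be set in the environment.
result = evaluate(Dataset.from_dict(data), metrics=[faithfulness, answer_relevancy])
print(result)  # e.g. {'faithfulness': 1.00, 'answer_relevancy': 0.98}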
12,927 stars. Used by 6 other packages. Available on PyPI.
Use this if you are building or managing an LLM application and need to systematically measure its effectiveness and generate comprehensive test data without subjective manual reviews.
Not ideal if you are looking for a general-purpose analytics tool for traditional software or only need qualitative, human-in-the-loop feedback for your AI outputs.
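The test-data generation mentioned above can be sketched in the same hedged spirit; TestsetGenerator and its LangChain integration below follow the 0.1-era interface, which has also changed between releases:

# Generate a synthetic test set from your own document corpus.
from langchain_community.document_loaders import DirectoryLoader
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context

documents = DirectoryLoader("docs/").load()  # your source documents

# with_openai() wires up default generator and critic LLMs,
# so OPENAI_API_KEY must be set.
generator = TestsetGenerator.with_openai()
testset = generator.generate_with_langchain_docs(
    documents,
    test_size=10,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
)
print(testset.to_pandas().head())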
Stars: 12,927
Forks: 1,294
Language: Python
License: Apache-2.0
Category: LLM tools
Last pushed: Feb 24, 2026
Commits (30d): 0
Dependencies: 19
Reverse dependents: 6
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/vibrantlabsai/ragas"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
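The same endpoint can also be queried programmatically; the X-API-Key header name below is an assumption, so confirm the actual authentication scheme in the API docs:

import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/vibrantlabsai/ragas"

# Anonymous access: 100 requests/day.
resp = requests.get(URL, timeout=10)
resp.raise_for_status()
print(resp.json())

# With a free key (1,000 requests/day); the header name is a guess.
# resp = requests.get(URL, headers={"X-API-Key": "YOUR_KEY"}, timeout=10)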
Related tools
EvolvingLMMs-Lab/lmms-eval
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
open-compass/VLMEvalKit
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
Giskard-AI/giskard-oss
🐢 Open-Source Evaluation & Testing library for LLM Agents
EuroEval/EuroEval
The robust European language model benchmark.
evalplus/evalplus
Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024