Giskard-AI/giskard-oss
🐢 Open-Source Evaluation & Testing library for LLM Agents
Giskard helps AI application developers test and evaluate Large Language Model (LLM) agents and applications. You define scenarios and checks that assert your AI behaves correctly, even when its outputs are varied and non-deterministic. Data scientists, machine learning engineers, and AI product managers can use it to validate and improve the reliability of their LLM-powered systems.
5,158 stars. Actively maintained with 57 commits in the last 30 days.
Use this if you need to systematically test the responses of your LLM applications or AI agents, validate their quality, and ensure they adhere to safety guidelines.
Not ideal if you are looking for a general-purpose testing framework for traditional software, or if you need to evaluate non-LLM machine learning models.
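For a concrete sense of how a test is set up, here is a minimal sketch based on the scan-style API shown in Giskard's documentation. The `my_llm` function, the `question` column, and the model name are placeholders for your own application, and exact names may differ between the original giskard package and this repository's current releases:

```python
import giskard
import pandas as pd

def my_llm(question: str) -> str:
    # Placeholder: replace with a call to your own LLM application.
    return f"Echo: {question}"

def predict(df: pd.DataFrame) -> list[str]:
    # Giskard calls this with a DataFrame holding one row per test input.
    return [my_llm(q) for q in df["question"]]

# Wrap the application so Giskard can probe it with adversarial
# and edge-case prompts during the scan.
model = giskard.Model(
    model=predict,
    model_type="text_generation",
    name="Support assistant",
    description="Answers customer support questions.",
    feature_names=["question"],
)

# Run the automated scan (hallucination, prompt injection, harmful
# content, and other issue categories) and save a browsable report.
report = giskard.scan(model)
report.to_html("scan_report.html")
```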
Stars: 5,158
Forks: 406
Language: Python
License: Apache-2.0
Last pushed: Mar 13, 2026
Commits (30d): 57
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/Giskard-AI/giskard-oss"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related tools
EvolvingLMMs-Lab/lmms-eval
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
open-compass/VLMEvalKit
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
vibrantlabsai/ragas
Supercharge Your LLM Application Evaluations 🚀
EuroEval/EuroEval
The robust European language model benchmark.
evalplus/evalplus
Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024