Giskard-AI/giskard-oss
🐢 Open-Source Evaluation & Testing library for LLM Agents
Giskard helps AI application developers test and evaluate Large Language Model (LLM) agents and applications. You define scenarios and checks that assert your AI behaves correctly, even when its outputs are varied and non-deterministic. Data scientists, machine learning engineers, and AI product managers can use it to validate and improve the reliability of their LLM-powered systems.
5,158 stars. Actively maintained with 57 commits in the last 30 days.
Use this if you need to systematically test the responses of your LLM applications or AI agents, validate their quality, and ensure they adhere to safety guidelines.
Not ideal if you are looking for a general-purpose testing framework for traditional software, or if you need to evaluate non-LLM machine learning models.
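For a concrete sense of how a test is set up, here is a minimal sketch based on the scan-style API shown in Giskard's documentation. The `my_llm` function, the `question` column, and the model name are placeholders for your own application, and exact names may differ between the original giskard package and this repository's current releases:

```python
import giskard
import pandas as pd

def my_llm(question: str) -> str:
    # Placeholder: replace with a call to your own LLM application.
    return f"Echo: {question}"

def predict(df: pd.DataFrame) -> list[str]:
    # Giskard calls this with a DataFrame holding one row per test input.
    return [my_llm(q) for q in df["question"]]

# Wrap the application so Giskard can probe it with adversarial
# and edge-case prompts during the scan.
model = giskard.Model(
    model=predict,
    model_type="text_generation",
    name="Support assistant",
    description="Answers customer support questions.",
    feature_names=["question"],
)

# Run the automated scan (hallucination, prompt injection, harmful
# content, and other issue categories) and save a browsable report.
report = giskard.scan(model)
report.to_html("scan_report.html")
```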
Stars: 5,158
Forks: 406
Language: Python
License: Apache-2.0
Last pushed: Mar 13, 2026
Commits (30d): 57
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/Giskard-AI/giskard-oss"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related tools
EvolvingLMMs-Lab/lmms-eval
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
open-compass/VLMEvalKit
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
vibrantlabsai/ragas
Supercharge Your LLM Application Evaluations 🚀
EuroEval/EuroEval
The robust European language model benchmark.
evalplus/evalplus
Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024