Giskard-AI/giskard-oss

🐢 Open-Source Evaluation & Testing library for LLM Agents

Score: 70 / 100 (Verified)

This tool helps AI application developers test and evaluate their Large Language Model (LLM) agents and applications. It allows you to define specific scenarios and checks to ensure your AI behaves correctly, even with varied, non-deterministic outputs. Data scientists, machine learning engineers, and AI product managers can use this to validate and improve the reliability of their LLM-powered systems.

5,158 stars. Actively maintained with 57 commits in the last 30 days.

Use this if you need to systematically test the responses of your LLM applications or AI agents, validate their quality, and ensure they adhere to safety guidelines.

Not ideal if you are looking for a general-purpose testing framework for traditional software, or if you need to evaluate non-LLM machine learning models.

Tags: LLM evaluation · AI agent testing · prompt engineering · AI safety · RAG systems

No package · No dependents
Maintenance: 25 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 19 / 25


Stars: 5,158
Forks: 406
Language: Python
License: Apache-2.0
Last pushed: Mar 13, 2026
Commits (30d): 57

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/Giskard-AI/giskard-oss"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.