JudgmentLabs/judgeval

The open source post-building layer for agents. Our environment data and evals power agent post-training (RL, SFT) and monitoring.

Quality score: 68 / 100 (Established)

Provides OpenTelemetry-based tracing with `@Tracer.observe()` decorators that automatically capture LLM token usage and function I/O, plus asynchronous online evaluation via `Tracer.async_evaluate()` that scores production traffic server-side without latency impact. Supports hosted evaluators (faithfulness, relevancy) and custom `Judge` classes deployable to Firecracker microVMs, alongside dataset management and prompt versioning. Auto-instruments OpenAI, Anthropic, Google GenAI, and Together AI clients, with framework support for LangGraph, OpenLit, and Claude Agent SDK.
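As a rough sketch of the pattern an observe-style tracing decorator implies — this is illustrative only, not judgeval's implementation; the `observe` decorator and `SPANS` store here are hypothetical stand-ins for the library's span recording:

```python
import functools
import time

# Hypothetical in-memory span store; a real tracer would export
# spans via OpenTelemetry instead.
SPANS = []

def observe(span_type="function"):
    """Record a span with the wrapped function's inputs, output, and duration."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            SPANS.append({
                "name": fn.__name__,
                "span_type": span_type,
                "inputs": {"args": args, "kwargs": kwargs},
                "output": result,
                "duration_s": time.perf_counter() - start,
            })
            return result
        return wrapper
    return decorator

@observe(span_type="tool")
def add(a, b):
    return a + b

add(2, 3)
print(SPANS[0]["name"], SPANS[0]["output"])  # add 5
```

The decorator leaves the wrapped function's behavior untouched and captures I/O as a side effect, which is why instrumentation of this style can be added to an existing agent without code changes beyond the annotation.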

1,017 stars. Actively maintained with 27 commits in the last 30 days. Available on PyPI.

Maintenance: 23 / 25
Adoption: 10 / 25
Maturity: 18 / 25
Community: 17 / 25


Stars: 1,017
Forks: 87
Language: Python
License: Apache-2.0
Last pushed: Mar 13, 2026
Commits (30d): 27
Dependencies: 6

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/agents/JudgmentLabs/judgeval"

Open to everyone: 100 requests/day with no API key required. A free key raises the limit to 1,000 requests/day.