JudgmentLabs/judgeval

The open source post-building layer for agents. Our environment data and evals power agent post-training (RL, SFT) and monitoring.

Quality score: 68 / 100 (Established)

Provides OpenTelemetry-based tracing with `@Tracer.observe()` decorators that automatically capture LLM token usage and function I/O, plus asynchronous online evaluation via `Tracer.async_evaluate()` that scores production traffic server-side without latency impact. Supports hosted evaluators (faithfulness, relevancy) and custom `Judge` classes deployable to Firecracker microVMs, alongside dataset management and prompt versioning. Auto-instruments OpenAI, Anthropic, Google GenAI, and Together AI clients, with framework support for LangGraph, OpenLit, and Claude Agent SDK.
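As a rough sketch of the pattern an observe-style tracing decorator implies — this is illustrative only, not judgeval's implementation; the `observe` decorator and `SPANS` store here are hypothetical stand-ins for the library's span recording:

```python
import functools
import time

# Hypothetical in-memory span store; a real tracer would export
# spans via OpenTelemetry instead.
SPANS = []

def observe(span_type="function"):
    """Record a span with the wrapped function's inputs, output, and duration."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            SPANS.append({
                "name": fn.__name__,
                "span_type": span_type,
                "inputs": {"args": args, "kwargs": kwargs},
                "output": result,
                "duration_s": time.perf_counter() - start,
            })
            return result
        return wrapper
    return decorator

@observe(span_type="tool")
def add(a, b):
    return a + b

add(2, 3)
print(SPANS[0]["name"], SPANS[0]["output"])  # add 5
```

The decorator leaves the wrapped function's behavior untouched and captures I/O as a side effect, which is why instrumentation of this style can be added to an existing agent without code changes beyond the annotation.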

1,017 stars. Actively maintained with 27 commits in the last 30 days. Available on PyPI.

Maintenance: 23 / 25
Adoption: 10 / 25
Maturity: 18 / 25
Community: 17 / 25


Stars: 1,017
Forks: 87
Language: Python
License: Apache-2.0
Last pushed: Mar 13, 2026
Commits (30d): 27
Dependencies: 6

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/agents/JudgmentLabs/judgeval"

Open to everyone: 100 requests/day with no API key required. A free key raises the limit to 1,000 requests/day.