JudgmentLabs/judgeval
The open-source post-building layer for agents. Our environment data and evals power agent post-training (RL, SFT) and monitoring.
Provides OpenTelemetry-based tracing with `@Tracer.observe()` decorators that automatically capture LLM token usage and function I/O, plus asynchronous online evaluation via `Tracer.async_evaluate()` that scores production traffic server-side without latency impact. Supports hosted evaluators (faithfulness, relevancy) and custom `Judge` classes deployable to Firecracker microVMs, alongside dataset management and prompt versioning. Auto-instruments OpenAI, Anthropic, Google GenAI, and Together AI clients, with framework support for LangGraph, OpenLit, and Claude Agent SDK.
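To illustrate the decorator pattern described above, here is a minimal, self-contained sketch of what an observe-style tracing decorator does: wrap a function and record its inputs, output, and duration as a span. This is an assumption-laden illustration, not judgeval's actual implementation; the real `@Tracer.observe()` builds OpenTelemetry spans and exports them to the Judgment backend, and the `observe`, `SPANS`, and `lookup` names here are hypothetical.

```python
import functools
import time

# Hypothetical in-memory span store; judgeval exports real
# OpenTelemetry spans instead.
SPANS = []

def observe(span_type="function"):
    """Sketch of an observe-style decorator: captures function I/O."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            # Record the call as a span-like dict: name, type, I/O, timing.
            SPANS.append({
                "name": fn.__name__,
                "span_type": span_type,
                "inputs": {"args": args, "kwargs": kwargs},
                "output": result,
                "duration_s": time.perf_counter() - start,
            })
            return result
        return wrapper
    return decorator

@observe(span_type="tool")
def lookup(city: str) -> str:
    # Stand-in for an agent tool call whose I/O gets traced.
    return f"Weather in {city}: sunny"

print(lookup("Paris"))        # Weather in Paris: sunny
print(SPANS[0]["span_type"])  # tool
```

Because the decorator is applied at definition time, every call to `lookup` is captured without touching the call sites, which is why auto-instrumentation of LLM clients can be layered on the same way.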
1,017 stars. Actively maintained with 27 commits in the last 30 days. Available on PyPI.
Stars: 1,017
Forks: 87
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 13, 2026
Commits (30d): 27
Dependencies: 6
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/agents/JudgmentLabs/judgeval"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.