Arize-ai/phoenix
AI Observability & Evaluation
Provides OpenTelemetry-based tracing, LLM-powered evaluation, versioned datasets, and experiment tracking across LLM frameworks (LangGraph, LlamaIndex, the Claude and OpenAI agent SDKs) and model providers. Features a web UI with a prompt-optimization playground, dataset management, and call replay. Runs locally, in notebooks, or containerized (with Helm support), and integrates via auto-instrumentation through the OpenInference standard.
8,847 stars and 1,013,605 monthly downloads. Used by 7 other packages. Actively maintained with 330 commits in the last 30 days. Available on PyPI.
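For orientation, a minimal tracing setup looks roughly like the sketch below. It assumes the arize-phoenix and openinference-instrumentation-openai packages are installed; the project name is illustrative.

import phoenix as px
from phoenix.otel import register
from openinference.instrumentation.openai import OpenAIInstrumentor

# Launch a local Phoenix instance (web UI plus an OTLP trace collector).
session = px.launch_app()

# Point an OpenTelemetry tracer provider at the local collector.
# "my-llm-app" is an illustrative project name.
tracer_provider = register(project_name="my-llm-app")

# Auto-instrument OpenAI client calls via the OpenInference instrumentor,
# so each completion appears as a trace in the Phoenix UI.
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)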
Stars: 8,847
Forks: 753
Language: Jupyter Notebook
License: —
Category: —
Last pushed: Mar 13, 2026
Monthly downloads: 1,013,605
Commits (30d): 330
Dependencies: 46
Reverse dependents: 7
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/prompt-engineering/Arize-ai/phoenix"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
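The same lookup can be done programmatically. A minimal sketch using only the Python standard library, assuming the endpoint returns JSON (the response schema is not documented on this page):

import json
from urllib.request import urlopen

# Anonymous access allows 100 requests/day; a free key raises that to 1,000/day.
url = "https://pt-edge.onrender.com/api/v1/quality/prompt-engineering/Arize-ai/phoenix"
with urlopen(url) as resp:
    data = json.load(resp)

# The schema is undocumented here, so pretty-print the payload for inspection.
print(json.dumps(data, indent=2))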
Related tools
langfuse/langfuse
🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management,...
Mirascope/mirascope
The LLM Anti-Framework
Helicone/helicone
🧊 Open source LLM observability platform. One line of code to monitor, evaluate, and experiment. YC W23 🍓
Agenta-AI/agenta
The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM...
algorithmicsuperintelligence/optillm
Optimizing inference proxy for LLMs