Langfuse and Phoenix
Langfuse and Phoenix are competitors with overlapping LLM observability and evaluation capabilities. Langfuse additionally offers prompt management and a playground, while Phoenix focuses more narrowly on observability and evals.
About langfuse
langfuse/langfuse
🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
Provides distributed tracing via SDKs (Python, JavaScript/TypeScript) that capture full LLM call chains with automatic context propagation, backed by ClickHouse for scalable analytics. Features a unified API surface for programmatic access to traces, evaluations, and datasets, enabling custom workflows and integration into existing MLOps pipelines alongside LangChain, LlamaIndex, and other frameworks.
About phoenix
Arize-ai/phoenix
AI Observability & Evaluation
Provides OpenTelemetry-based tracing, LLM-powered evaluation, versioned datasets, and experiment tracking across LLM frameworks (LangGraph, LlamaIndex, Claude/OpenAI agent SDKs) and providers. Features a web UI with prompt optimization playground, dataset management, and call replay capabilities. Runs locally, in notebooks, or containerized with Helm support, and integrates via auto-instrumentation through the OpenInference standard.
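A minimal setup sketch for the OpenTelemetry integration described above, assuming the `arize-phoenix-otel` helper package and an OpenInference instrumentor for OpenAI are installed, and that a Phoenix server is reachable at its default address; the `project_name` value is an illustrative assumption.

```python
# Hedged sketch: wiring an app to a running Phoenix server via OpenTelemetry,
# then auto-instrumenting OpenAI SDK calls through OpenInference.
from phoenix.otel import register
from openinference.instrumentation.openai import OpenAIInstrumentor

# register() configures an OpenTelemetry TracerProvider that exports spans
# to Phoenix (by default the local server at http://localhost:6006).
tracer_provider = register(project_name="demo")  # project name is assumed

# After this call, OpenAI SDK requests emit OpenInference-formatted spans
# without further code changes.
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)
```

Because the instrumentation follows the OpenInference standard, swapping in a different framework (LangGraph, LlamaIndex, etc.) is typically a matter of installing and calling that framework's instrumentor instead.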