phoenix and helicone

These are **competitors** offering overlapping core functionality: both provide end-to-end LLM observability with logging, monitoring, and evaluation capabilities. Phoenix, however, has significantly broader adoption (over 1M monthly downloads vs. under 300) and a more mature feature set.

| Metric | phoenix | helicone |
| --- | --- | --- |
| Score | 94 (Verified) | 81 (Verified) |
| Maintenance | 25/25 | 20/25 |
| Adoption | 25/25 | 16/25 |
| Maturity | 25/25 | 25/25 |
| Community | 19/25 | 20/25 |
| Stars | 8,847 | 5,237 |
| Forks | 753 | 494 |
| Downloads | 1,013,605 | 292 |
| Commits (30d) | 330 | 7 |
| Language | Jupyter Notebook | TypeScript |
| License | (not listed) | Apache-2.0 |
| Risk flags | None | None |

About phoenix

Arize-ai/phoenix

AI Observability & Evaluation

Provides OpenTelemetry-based tracing, LLM-powered evaluation, versioned datasets, and experiment tracking across LLM frameworks (LangGraph, LlamaIndex, Claude/OpenAI agent SDKs) and providers. Features a web UI with prompt optimization playground, dataset management, and call replay capabilities. Runs locally, in notebooks, or containerized with Helm support, and integrates via auto-instrumentation through the OpenInference standard.

About helicone

Helicone/helicone

🧊 Open source LLM observability platform. One line of code to monitor, evaluate, and experiment. YC W23 🍓

Operates as a reverse proxy AI gateway that intercepts requests to 100+ LLM providers through a unified OpenAI-compatible API, enabling intelligent routing and automatic fallbacks. Built on a microservices architecture with a Cloudflare Workers proxy layer for request interception, Express-based collection server (Jawn), ClickHouse for analytics, and Supabase for application data. Integrates with OpenAI, Anthropic, Gemini, LangChain, Vercel AI SDK, and supports self-hosting via Docker or Helm with optional async logging through OpenLLMetry.

Scores updated daily from GitHub, PyPI, and npm data.