LLM Observability Platforms (Prompt Engineering Tools)

Tools for monitoring, tracing, evaluating, and debugging LLM applications in production. Includes end-to-end observability, real-time metrics, automated evals, and prompt management dashboards. Does NOT include general application monitoring, synthetic data generation, or agent training frameworks.

There are 27 LLM observability platform tools tracked. Five score above 70 (Verified tier). The highest-rated is langfuse/langfuse at 95/100, with 23,106 stars and 3,912,905 monthly downloads. Six of the top 10 are actively maintained.

Get all 27 projects as JSON:

```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=prompt-engineering&subcategory=llm-observability-platforms&limit=27"
```

Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
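The same query can be issued programmatically. The sketch below builds the request URL from the documented parameters and then filters a response for Verified-tier tools (score above 70). The field names in the sample payload (`name`, `score`, `tier`) are assumptions, not documented schema; adjust them to match the actual JSON the endpoint returns.

```python
from urllib.parse import urlencode

# Build the dataset query URL from the parameters shown in the curl example.
BASE = "https://pt-edge.onrender.com/api/v1/datasets/quality"
params = {
    "domain": "prompt-engineering",
    "subcategory": "llm-observability-platforms",
    "limit": 27,
}
url = f"{BASE}?{urlencode(params)}"
print(url)

# Filter a payload for Verified-tier tools (score above 70).
# NOTE: the keys "name", "score", and "tier" are assumed field names;
# check them against the real response before relying on this.
sample = [
    {"name": "langfuse/langfuse", "score": 95, "tier": "Verified"},
    {"name": "Scale3-Labs/langtrace", "score": 44, "tier": "Emerging"},
]
verified = [t["name"] for t in sample if t["score"] > 70]
print(verified)  # ['langfuse/langfuse']
```

In a real script, replace `sample` with the parsed JSON body of a GET request to `url`.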

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | langfuse/langfuse | 🪢 Open source LLM engineering platform: LLM Observability, metrics, evals,... | 95 | Verified |
| 2 | Arize-ai/phoenix | AI Observability & Evaluation | 94 | Verified |
| 3 | Mirascope/mirascope | The LLM Anti-Framework | 87 | Verified |
| 4 | Helicone/helicone | 🧊 Open source LLM observability platform. One line of code to monitor,... | 81 | Verified |
| 5 | Agenta-AI/agenta | The open-source LLMOps platform: prompt playground, prompt management, LLM... | 72 | Verified |
| 6 | algorithmicsuperintelligence/optillm | Optimizing inference proxy for LLMs | 62 | Established |
| 7 | TensorOpsAI/LLMstudio | Framework to bring LLM applications to production | 60 | Established |
| 8 | Scale3-Labs/langtrace | Langtrace 🔍 is an open-source, OpenTelemetry-based end-to-end... | 44 | Emerging |
| 9 | langfuse/langfuse-java | 🪢 Auto-generated Java client for the Langfuse API | 42 | Emerging |
| 10 | AnchoringAI/anchoring-ai | An open-source no-code tool for teams to collaborate on building,... | 39 | Emerging |
| 11 | tenemos/langwatch | The open LLM Ops platform - Traces, Analytics, Evaluations, Datasets and... | 36 | Emerging |
| 12 | whylabs/langkit | 🔍 LangKit: An open-source toolkit for monitoring Large Language Models... | 36 | Emerging |
| 13 | TrentPierce/PolyCouncil | PolyCouncil is an open-source multi-model deliberation engine for LM Studio.... | 36 | Emerging |
| 14 | brokle-ai/brokle | The AI engineering platform for AI teams. Observability, evaluation, and... | 35 | Emerging |
| 15 | alpha-one-index/ai-llmops-index | Comprehensive LLMOps reference index: observability platforms, inference... | 23 | Experimental |
| 16 | as32608/openinspector | A lightweight, local-first observability proxy and dashboard designed to... | 22 | Experimental |
| 17 | alebgn1/ai-llmops-index | Provide a comprehensive, regularly updated index of AI LLM providers,... | 22 | Experimental |
| 18 | chirindaopensource/multi_agent_system_architecture_for_federal_funds_target_rate_prediction | End-to-End Python implementation of "FedSight AI" multi-agent system for... | 18 | Experimental |
| 19 | ksm26/Evaluating-AI-Agents | A hands-on course repository for Evaluating AI Agents, created with Arize... | 16 | Experimental |
| 20 | Uplay111/Loki-s-Insight- | A lightweight visual dashboard to inspect and edit OpenClaw AI agent memory... | 15 | Experimental |
| 21 | vshwsh/prod-evals-cookbook | 🎯 Build effective AI evaluations through a hands-on tutorial, using a... | 14 | Experimental |
| 22 | Tarunjit45/ModelPulse | ModelPulse helps maintain model reliability and performance by providing... | 12 | Experimental |
| 23 | MagicTeaMC/dnsLM | dnsLM: Where AI meets DNS—because even domains deserve a little intelligence! | 12 | Experimental |
| 24 | VicRejkia/LLM-Sherpa | A Python GUI tool to package a codebase into a single, context-rich Markdown... | 11 | Experimental |
| 25 | alhemdrew/self-hosted-llm-infrastructure | Deployment of a self-hosted LLM infrastructure using Ollama and Open WebUI... | 11 | Experimental |
| 26 | marco-ruiz/llm-repo | Framework that translates LLM responses to structured data models | 11 | Experimental |
| 27 | rahatmoktadir03/llm-evaluation-platform | A full-stack web application for comparing and analyzing the performance of... | 10 | Experimental |
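The tier labels track the scores. Only the Verified cutoff (score above 70) is stated explicitly; the other boundaries in the sketch below are inferred from the listed rows and may not match the dataset's actual thresholds.

```python
def tier(score: int) -> str:
    """Map a quality score to its tier label.

    Only the Verified cutoff (above 70) is stated in the text; the
    Established and Emerging boundaries are guesses that happen to be
    consistent with every row in the table above.
    """
    if score > 70:
        return "Verified"
    if score >= 45:        # assumed: listed Established scores are 60-62
        return "Established"
    if score >= 24:        # assumed: listed Emerging scores are 35-44
        return "Emerging"
    return "Experimental"  # listed Experimental scores are 10-23

print(tier(95))  # Verified
print(tier(44))  # Emerging
```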