LLM Observability Platforms: Prompt Engineering Tools
Tools for monitoring, tracing, evaluating, and debugging LLM applications in production. Includes end-to-end observability, real-time metrics, automated evals, and prompt management dashboards. Does NOT include general application monitoring, synthetic data generation, or agent training frameworks.
27 LLM observability platform tools are tracked. Five score above 70 (Verified tier). The highest-rated is langfuse/langfuse at 95/100, with 23,106 stars and 3,912,905 monthly downloads. Six of the top 10 are actively maintained.
Get all 27 projects as JSON (note that the example request below sets `limit=20`):
```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=prompt-engineering&subcategory=llm-observability-platforms&limit=20"
```
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
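The same request can be made from Python with only the standard library. This is a minimal sketch under stated assumptions: the endpoint is assumed to return a JSON document whose field layout is not documented on this page, so the decoded value is returned as-is; `limit=27` is assumed to be enough to retrieve all 27 projects; and `build_url` / `fetch_dataset` are hypothetical helper names. Because this page does not say how an API key is passed, the sketch uses the anonymous tier (100 requests/day) only.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base endpoint taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain: str, subcategory: str, limit: int) -> str:
    """Build the dataset query URL (hypothetical helper).

    Parameter names mirror the curl example; urlencode percent-encodes
    values and joins them with '&' in insertion order.
    """
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_dataset(domain: str, subcategory: str, limit: int = 27, timeout: float = 10.0):
    """Fetch and decode the JSON response (hypothetical helper).

    Assumption: the endpoint returns JSON. Its schema is not documented
    here, so the decoded value is returned unmodified.
    """
    with urlopen(build_url(domain, subcategory, limit), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `fetch_dataset("prompt-engineering", "llm-observability-platforms")` performs one anonymous request and counts against the 100 requests/day quota, so caching the result locally is worthwhile if you query it repeatedly.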
| # | Tool | Description | Score | Tier |
|---|---|---|---|---|
| 1 | langfuse/langfuse | 🪢 Open source LLM engineering platform: LLM Observability, metrics, evals,... | 95 | Verified |
| 2 | Arize-ai/phoenix | AI Observability & Evaluation | | Verified |
| 3 | Mirascope/mirascope | The LLM Anti-Framework | | Verified |
| 4 | Helicone/helicone | 🧊 Open source LLM observability platform. One line of code to monitor,... | | Verified |
| 5 | Agenta-AI/agenta | The open-source LLMOps platform: prompt playground, prompt management, LLM... | | Verified |
| 6 | algorithmicsuperintelligence/optillm | Optimizing inference proxy for LLMs | | Established |
| 7 | TensorOpsAI/LLMstudio | Framework to bring LLM applications to production | | Established |
| 8 | Scale3-Labs/langtrace | Langtrace 🔍 is an open-source, Open Telemetry based end-to-end... | | Emerging |
| 9 | langfuse/langfuse-java | 🪢 Auto-generated Java Client for Langfuse API | | Emerging |
| 10 | AnchoringAI/anchoring-ai | An open-source no-code tool for teams to collaborate on building,... | | Emerging |
| 11 | tenemos/langwatch | The open LLM Ops platform - Traces, Analytics, Evaluations, Datasets and... | | Emerging |
| 12 | whylabs/langkit | 🔍 LangKit: An open-source toolkit for monitoring Large Language Models... | | Emerging |
| 13 | TrentPierce/PolyCouncil | PolyCouncil is an open-source multi-model deliberation engine for LM Studio.... | | Emerging |
| 14 | brokle-ai/brokle | The AI engineering platform for AI teams. Observability, evaluation, and... | | Emerging |
| 15 | alpha-one-index/ai-llmops-index | Comprehensive LLMOps reference index: observability platforms, inference... | | Experimental |
| 16 | as32608/openinspector | A lightweight, local-first observability proxy and dashboard designed to... | | Experimental |
| 17 | alebgn1/ai-llmops-index | Provide a comprehensive, regularly updated index of AI LLM providers,... | | Experimental |
| 18 | chirindaopensource/multi_agent_system_architecture_for_federal_funds_target_rate_prediction | End-to-End Python implementation of "FedSight AI" multi-agent system for... | | Experimental |
| 19 | ksm26/Evaluating-AI-Agents | A hands-on course repository for Evaluating AI Agents, created with Arize... | | Experimental |
| 20 | Uplay111/Loki-s-Insight- | A lightweight visual dashboard to inspect and edit OpenClaw AI agent memory... | | Experimental |
| 21 | vshwsh/prod-evals-cookbook | 🎯 Build effective AI evaluations through a hands-on tutorial, using a... | | Experimental |
| 22 | Tarunjit45/ModelPulse | ModelPulse helps maintain model reliability and performance by providing... | | Experimental |
| 23 | MagicTeaMC/dnsLM | dnsLM: Where AI meets DNS—because even domains deserve a little intelligence! | | Experimental |
| 24 | VicRejkia/LLM-Sherpa | A Python GUI tool to package a codebase into a single, context-rich Markdown... | | Experimental |
| 25 | alhemdrew/self-hosted-llm-infrastructure | Deployment of a self-hosted LLM infrastructure using Ollama and Open WebUI... | | Experimental |
| 26 | marco-ruiz/llm-repo | Framework that translates LLM responses to structured data models | | Experimental |
| 27 | rahatmoktadir03/llm-evaluation-platform | A full-stack web application for comparing and analyzing the performance of... | | Experimental |