langfuse and langkit

These tools are complementary: LangKit extracts monitoring signals (text quality, safety metrics) from LLM inputs and outputs, and Langfuse can ingest and visualize those signals within its broader observability platform.

                 langfuse               langkit
Score            95 (Verified)          43 (Emerging)
Maintenance      25/25                  0/25
Adoption         25/25                  10/25
Maturity         25/25                  16/25
Community        20/25                  17/25
Stars            23,106                 976
Forks            2,333                  70
Downloads        3,912,905              —
Commits (30d)    240                    0
Language         TypeScript             Jupyter Notebook
License          —                      Apache-2.0
Risk flags       No risk flags          Stale 6m, No Package, No Dependents

About langfuse

langfuse/langfuse

🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23

Provides distributed tracing via SDKs (Python, JavaScript/TypeScript) that capture full LLM call chains with automatic context propagation, backed by ClickHouse for scalable analytics. Features a unified API surface for programmatic access to traces, evaluations, and datasets, enabling custom workflows and integration into existing MLOps pipelines alongside LangChain, LlamaIndex, and other frameworks.
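To make the "automatic context propagation" pattern concrete, here is a minimal, self-contained sketch of a decorator-based tracer built on `contextvars`. This is illustrative only and is not the Langfuse SDK's actual API; it mirrors the general pattern such SDKs automate, where nested function calls are recorded as child spans of the active parent.

```python
# Illustrative sketch: a tiny decorator-based tracer with context
# propagation via contextvars. Not the real Langfuse API.
import contextvars
import functools
import uuid

_current_span = contextvars.ContextVar("current_span", default=None)
TRACE = []  # collected spans; stand-in for an observability backend

def observe(fn):
    """Record a span per call, linked to the active parent span."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        span = {
            "id": uuid.uuid4().hex[:8],
            "name": fn.__name__,
            "parent": _current_span.get(),  # inherited automatically
        }
        token = _current_span.set(span["id"])
        try:
            return fn(*args, **kwargs)
        finally:
            _current_span.reset(token)
            TRACE.append(span)
    return wrapper

@observe
def generate(prompt):
    return f"echo: {prompt}"

@observe
def pipeline(question):
    return generate(question)  # nested call becomes a child span

pipeline("hello")
names = {s["name"]: s for s in TRACE}
assert names["generate"]["parent"] == names["pipeline"]["id"]
assert names["pipeline"]["parent"] is None
```

Because the parent span ID travels through a `ContextVar` rather than explicit arguments, instrumented functions nest to any depth without plumbing trace context through their signatures, which is what lets an SDK capture full LLM call chains transparently.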

About langkit

whylabs/langkit

🔍 LangKit: An open-source toolkit for monitoring Large Language Models (LLMs). 📚 Extracts signals from prompts & responses, ensuring safety & security. 🛡️ Features include text quality, relevance metrics, & sentiment analysis. 📊 A comprehensive tool for LLM observability. 👀

Extracts specialized threat signals including jailbreak attempts, prompt injection attacks, hallucination detection, and refusal patterns alongside standard quality metrics. Built as a modular UDF layer that integrates directly with whylogs' schema system, enabling composable metric pipelines with configurable performance trade-offs (throughput ranges from 2K+ chats/sec with light metrics to sub-1 chat/sec with full analysis). Designed for production LLM monitoring workflows, with outputs visualizable in the WhyLabs observability platform or analyzed independently.
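The "composable metric pipelines with configurable performance trade-offs" idea can be sketched in a few lines. This is not the actual LangKit/whylogs API; the metric names and presets below are hypothetical, but the structure shows how registering metrics as plain functions makes "light" vs. "full" analysis just a matter of which subset you apply, which is where the throughput trade-off comes from.

```python
# Illustrative sketch: a composable metric registry in the spirit of
# LangKit's UDF layer. Metric names and presets are hypothetical.
LIGHT_METRICS = {
    "char_count": len,
    "word_count": lambda text: len(text.split()),
}

FULL_METRICS = {
    **LIGHT_METRICS,
    # Stand-in for heavier model-based metrics (sentiment, relevance, ...)
    "refusal_like": lambda text: text.lower().startswith("i cannot"),
}

def extract(record, metrics):
    """Apply every registered metric to each text field of an LLM record."""
    out = {}
    for field, text in record.items():
        for name, fn in metrics.items():
            out[f"{field}.{name}"] = fn(text)
    return out

row = {"prompt": "Summarize this article",
       "response": "I cannot help with that."}
signals = extract(row, FULL_METRICS)
assert signals["response.refusal_like"] is True
assert signals["prompt.word_count"] == 3
```

Swapping `FULL_METRICS` for `LIGHT_METRICS` trades signal coverage for throughput without touching the pipeline code, mirroring how LangKit lets you choose between fast lightweight metrics and slower full analysis.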

Scores updated daily from GitHub, PyPI, and npm data.