relari-ai/continuous-eval

Data-Driven Evaluation for LLM-Powered Applications

41 / 100 (Emerging)

""" pii_check = CustomMetric( name="pii_check", criteria=criteria, rubric=rubric, metric_type="discrete", # can be 'discrete' or 'continuous' ) result = pii_check(answer="My name is John.") print(result) ``` ## Features - Modularized evaluation (evaluate each pipeline module with tailored metrics) - Metric library with deterministic, semantic, and LLM-based metrics - Support for probabilistic evaluation - Isolation of Pipeline components - Support for custom metrics and tests - Distributed evaluation (using Ray) - Integration with OpenAI and other LLM providers - All major frameworks (LangChain, LlamaIndex, Ollama, VertexAI, etc.) - Comprehensive documentation with examples ##

516 stars. No commits in the last 6 months.

Stale (6 months) · No Package · No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 15 / 25

How are scores calculated?
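The category scores above suggest a simple additive model: a minimal sketch, assuming the 0-100 total is just the sum of the four category scores (each out of 25), which matches the numbers shown on this page:

```python
# Assumed reconstruction: total score = sum of the four 0-25 category scores.
# These values are the ones displayed on this page.
scores = {"Maintenance": 0, "Adoption": 10, "Maturity": 16, "Community": 15}

total = sum(scores.values())
print(total)  # 41, matching the "41 / 100" badge above
```

Whether the real scoring applies weighting or rounding is not stated here; the equal-weight sum is only the simplest model consistent with the displayed numbers.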

Stars

516

Forks

37

Language

Python

License

Apache-2.0

Last pushed

Jan 22, 2025

Commits (30d)

0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/relari-ai/continuous-eval"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
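The same endpoint can be queried from Python. A minimal standard-library sketch, where `quality_url` is a hypothetical helper (only the URL pattern comes from the curl example above; the response's JSON schema is not documented on this page):

```python
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/rag"

def quality_url(owner: str, repo: str) -> str:
    """Build the per-repo quality endpoint URL shown in the curl example above."""
    return f"{API_BASE}/{owner}/{repo}"

url = quality_url("relari-ai", "continuous-eval")
print(url)

# Live fetch (no key needed, 100 requests/day). The JSON field names are not
# documented here, so inspect the payload rather than assuming specific keys:
# import json
# from urllib.request import urlopen
# with urlopen(url) as resp:
#     print(json.load(resp))
```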