ragrank and llm-eval

ragrank (Established)

Maintenance 10/25, Adoption 8/25, Maturity 16/25, Community 18/25

Stars: 45, Forks: 14, Downloads: n/a, Commits (30d): 0, Language: Python, License: Apache-2.0

Flags: No Package, No Dependents

llm-eval (Emerging)

Maintenance 2/25, Adoption 9/25, Maturity 15/25, Community 19/25

Stars: 82, Forks: 18, Downloads: n/a, Commits (30d): 0, Language: Python, License: MIT

Flags: Stale 6m, No Package, No Dependents

About ragrank

izam-mohammed/ragrank

🎯 A free LLM evaluation toolkit that helps you assess factual accuracy, context understanding, tone, and more, so you can see how well your LLM applications perform.

This toolkit helps you assess the performance of your Retrieval-Augmented Generation (RAG) applications. You provide your RAG model's questions, the contexts it retrieves, and its generated responses, and it gives you metrics on factual accuracy, context understanding, and tone. This is for AI/ML engineers, data scientists, or product managers who build and deploy LLM applications and need to ensure their RAG systems are delivering high-quality, reliable outputs.

LLM application development, RAG system evaluation, AI model quality assurance, Natural Language Processing, Generative AI
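To make the inputs described above concrete (question, retrieved contexts, generated response), here is a minimal sketch of one evaluation record and a toy groundedness score. It does not use ragrank's API: the `RagSample` class and `context_overlap` function are hypothetical names invented for illustration, and the token-overlap metric is a crude stand-in for the factual-accuracy metrics a real toolkit would compute.

```python
from dataclasses import dataclass
import re


@dataclass
class RagSample:
    """One evaluation record (hypothetical shape, not ragrank's own type)."""
    question: str
    contexts: list[str]
    response: str


def _tokens(text: str) -> set[str]:
    # Lowercased alphanumeric word tokens; punctuation is dropped.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def context_overlap(sample: RagSample) -> float:
    """Fraction of response tokens that also appear in the retrieved contexts.

    Assumption: a simple lexical proxy for groundedness, used here only
    to show the data flow from inputs to a score.
    """
    response_tokens = _tokens(sample.response)
    if not response_tokens:
        return 0.0
    context_tokens = _tokens(" ".join(sample.contexts))
    return len(response_tokens & context_tokens) / len(response_tokens)


sample = RagSample(
    question="What is the capital of France?",
    contexts=["Paris is the capital of France."],
    response="The capital of France is Paris.",
)
score = context_overlap(sample)
print(f"context overlap: {score:.2f}")
```

A lexical overlap like this rewards copying from the context and cannot detect paraphrase or contradiction, which is why evaluation toolkits generally rely on stronger judges.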

About llm-eval

justplus/llm-eval

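A large language model evaluation platform that supports multiple evaluation benchmarks, custom datasets, and performance testing. It also supports RAG evaluation based on custom datasets.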

This platform helps AI product managers and researchers quickly evaluate the performance of large language models (LLMs). You can upload your own datasets (like Q&A pairs, multiple-choice questions, or RAG data) and it outputs detailed reports on model accuracy, latency, and throughput. It's designed for anyone needing to compare, test, and optimize LLMs for specific applications.

AI-evaluation LLM-benchmarking NLP-testing model-comparison RAG-assessment
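The three report figures mentioned above (accuracy, latency, throughput) can be illustrated with a small sketch. This is not llm-eval's code or interface: `evaluate`, `toy_model`, and the exact-match scoring rule are assumptions made for the example, and the dataset is a made-up list of question and answer pairs.

```python
import time
from statistics import mean


def evaluate(model, dataset):
    """Run `model` over (prompt, expected) pairs and summarize the results.

    `model` is any callable that takes a prompt string and returns a string.
    Scoring is case-insensitive exact match (an assumption for this sketch).
    """
    latencies = []
    correct = 0
    start = time.perf_counter()
    for prompt, expected in dataset:
        t0 = time.perf_counter()
        answer = model(prompt)
        latencies.append(time.perf_counter() - t0)
        if answer.strip().lower() == expected.strip().lower():
            correct += 1
    elapsed = time.perf_counter() - start
    n = len(dataset)
    return {
        "accuracy": correct / n if n else 0.0,
        "mean_latency_s": mean(latencies) if latencies else 0.0,
        "throughput_rps": n / elapsed if elapsed > 0 else 0.0,
    }


def toy_model(prompt):
    # Stand-in for a real LLM call; answers from a fixed lookup table.
    answers = {"2+2?": "4", "Capital of France?": "Paris"}
    return answers.get(prompt, "unknown")


dataset = [
    ("2+2?", "4"),
    ("Capital of France?", "paris"),
    ("Largest planet?", "Jupiter"),
]
report = evaluate(toy_model, dataset)
print(report)
```

Here throughput is measured sequentially, so it is roughly the inverse of mean latency; a platform that issues concurrent requests would report the two independently.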

Related comparisons

ragrank and evalscope, ragrank and llm-evaluation, ragrank and llm-eval-bench, ragrank and RagaliQ

Scores updated daily from GitHub, PyPI, and npm data.