evalscope and llm-eval

evalscope and llm-eval compete in the LLM/RAG evaluation space: both provide customizable evaluation frameworks with support for multiple benchmarks and RAG assessment. evalscope covers a broader range of model types (LLM, VLM, AIGC), while llm-eval is more specialized for language models.

evalscope (Verified)

Scores: Maintenance 23/25, Adoption 21/25, Maturity 25/25, Community 21/25

Stars: 2,501
Forks: 285
Downloads: 29,097
Commits (30d): 36
Language: Python
License: Apache-2.0
Risk flags: none

llm-eval (Emerging)

Scores: Maintenance 2/25, Adoption 9/25, Maturity 15/25, Community 19/25

Stars: 82
Forks: 18
Downloads: n/a
Commits (30d): 0
Language: Python
License: MIT
Risk flags: Stale 6m, No Package, No Dependents

About evalscope

modelscope/evalscope

A streamlined and customizable framework for efficient large model (LLM, VLM, AIGC) evaluation and performance benchmarking.

Supports pluggable backend evaluation engines (OpenCompass, VLMEvalKit, RAGAS, MTEB) and integrates multi-modal benchmarks across LLMs, VLMs, embedding models, and code tasks through a registry-based architecture. Features performance profiling with latency metrics (TTFT, TPOT), SLA auto-tuning for service concurrency limits, and interactive WebUI dashboards powered by Gradio/Wandb for comparative analysis and arena-style model battles.
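The latency metrics named above can be illustrated with a small calculation. The following is a minimal sketch, assuming one common definition: TTFT (time to first token) is the delay between sending the request and receiving the first token, and TPOT (time per output token) is the average gap between tokens after the first. The `latency_metrics` function and `LatencyMetrics` class are hypothetical names for illustration and are not part of evalscope's API; evalscope's own formulas may differ.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LatencyMetrics:
    ttft: float  # seconds from request start to the first token
    tpot: float  # average seconds per output token after the first


def latency_metrics(request_start: float, token_times: Sequence[float]) -> LatencyMetrics:
    """Compute TTFT and TPOT from a request start time and token arrival times.

    Hypothetical helper for illustration only; all times are in seconds.
    """
    if not token_times:
        raise ValueError("need at least one token timestamp")
    ttft = token_times[0] - request_start
    if len(token_times) == 1:
        # With a single token there is no inter-token gap to average.
        tpot = 0.0
    else:
        tpot = (token_times[-1] - token_times[0]) / (len(token_times) - 1)
    return LatencyMetrics(ttft=ttft, tpot=tpot)


# Four tokens arriving 0.25 s apart, the first one 0.5 s after the request.
m = latency_metrics(0.0, [0.5, 0.75, 1.0, 1.25])
print(m.ttft, m.tpot)
```

In a real benchmark the timestamps would come from a streaming response, and the per-request values would be aggregated into averages or percentiles across many requests.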

About llm-eval

justplus/llm-eval

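A large language model evaluation platform that supports multiple evaluation benchmarks, custom datasets, and performance testing. Supports RAG evaluation based on custom datasets.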

Provides LLM-agnostic evaluation across multiple task formats (QA, MCQ, RAG) with built-in LLM-as-a-judge scoring using the Ragas framework for RAG pipelines, and customizable Jinja2 templates for domain-specific metrics. Includes concurrent performance stress-testing with latency/throughput analysis, multi-model management via unified API configuration, and result export capabilities, all through a web UI built with DaisyUI that supports real-time task status updates.
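The idea behind concurrent stress-testing with latency/throughput analysis can be sketched as follows. This is a generic illustration using only the Python standard library, not llm-eval's actual implementation: `fake_model_call` is a hypothetical stand-in for a real model API request, and `stress_test` and the report keys are assumed names.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def fake_model_call(prompt: str) -> str:
    """Hypothetical stand-in for a real model API request."""
    time.sleep(0.01)  # simulate network and inference time
    return prompt.upper()


def timed_call(prompt: str) -> float:
    """Return the wall-clock latency of one request in seconds."""
    start = time.perf_counter()
    fake_model_call(prompt)
    return time.perf_counter() - start


def stress_test(prompts, concurrency):
    """Send all prompts with a fixed number of worker threads and summarize."""
    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed_call, prompts))
    wall = time.perf_counter() - wall_start
    return {
        "requests": len(latencies),
        "mean_latency_s": mean(latencies),
        "max_latency_s": max(latencies),
        "throughput_rps": len(latencies) / wall,  # completed requests per second
    }


report = stress_test([f"q{i}" for i in range(20)], concurrency=5)
print(report)
```

Raising `concurrency` while watching how latency and throughput change is the usual way to find the point at which a service saturates.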

Related comparisons

evalscope and ragrank
evalscope and continuous-eval
evalscope and llm-eval-bench

Scores updated daily from GitHub, PyPI, and npm data.