evalscope and ragrank
These are complements: evalscope provides a general-purpose LLM evaluation framework while ragrank specializes in RAG-specific metrics (factual accuracy, context understanding, tone), allowing them to be used together for comprehensive RAG system evaluation.
About evalscope
modelscope/evalscope
A streamlined and customizable framework for efficient large model (LLM, VLM, AIGC) evaluation and performance benchmarking.
Supports pluggable backend evaluation engines (OpenCompass, VLMEvalKit, RAGAS, MTEB) and integrates multi-modal benchmarks across LLMs, VLMs, embedding models, and code tasks through a registry-based architecture. Features performance profiling with latency metrics (TTFT, TPOT), SLA auto-tuning for service concurrency limits, and interactive WebUI dashboards powered by Gradio/Wandb for comparative analysis and arena-style model battles.
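The latency metrics mentioned above, TTFT (time to first token) and TPOT (time per output token), can be computed from a stream of token arrival timestamps. The sketch below is illustrative only, assuming a simple list of timestamps; it is not evalscope's actual profiling code, and the function name `profile_stream` is hypothetical.

```python
def profile_stream(token_times, start_time):
    """Compute TTFT and TPOT from token arrival timestamps (hypothetical helper).

    TTFT (time to first token): delay between sending the request and
    receiving the first token. TPOT (time per output token): average gap
    between consecutive tokens after the first.
    """
    ttft = token_times[0] - start_time
    if len(token_times) > 1:
        tpot = (token_times[-1] - token_times[0]) / (len(token_times) - 1)
    else:
        tpot = 0.0  # a single token gives no inter-token gap to average
    return ttft, tpot

# Simulated stream: request at t=0, first token at 0.5s, then one every 0.1s
ttft, tpot = profile_stream([0.5, 0.6, 0.7, 0.8], 0.0)
print(f"TTFT={ttft:.2f}s TPOT={tpot:.2f}s")
```

SLA auto-tuning then amounts to raising service concurrency until metrics like these cross a target threshold.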
About ragrank
izam-mohammed/ragrank
🎯 A free LLM evaluation toolkit that helps you assess factual accuracy, context understanding, tone, and more, so you can see how well your LLM applications perform.
Specialized for RAG pipeline evaluation with metrics like response relevancy, context understanding, and factual accuracy. Built as a Python toolkit that integrates with OpenAI's API by default but supports custom LLM models, enabling flexible assessment workflows through a dataset-to-metrics evaluation pattern. Provides structured evaluation results exportable to dataframes for analysis and integration with downstream data processing pipelines.
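The dataset-to-metrics pattern described above can be sketched as follows. This is not ragrank's actual API; `EvalItem`, `keyword_overlap`, and `evaluate` are hypothetical names, and the toy lexical-overlap metric stands in for the LLM-judged relevancy metrics a real run would use.

```python
from dataclasses import dataclass

@dataclass
class EvalItem:
    # One record in a RAG evaluation dataset (hypothetical schema)
    question: str
    context: str
    response: str

def keyword_overlap(item: EvalItem) -> float:
    """Toy stand-in for a relevancy metric: fraction of response words
    that also appear in the retrieved context."""
    ctx = set(item.context.lower().split())
    words = item.response.lower().split()
    return sum(w in ctx for w in words) / len(words) if words else 0.0

def evaluate(dataset, metrics):
    """Apply each metric to each item; return rows of plain dicts,
    which can be loaded directly into a dataframe for analysis."""
    return [
        {"question": item.question,
         **{m.__name__: round(m(item), 2) for m in metrics}}
        for item in dataset
    ]

data = [EvalItem("What is RAG?",
                 "retrieval augmented generation grounds answers in documents",
                 "RAG grounds answers in retrieved documents")]
print(evaluate(data, [keyword_overlap]))
```

The row-of-dicts output mirrors the structured, dataframe-exportable results the blurb describes, making it easy to feed scores into downstream data processing.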