modelscope/evalscope
A streamlined and customizable framework for efficient large model (LLM, VLM, AIGC) evaluation and performance benchmarking.
Supports pluggable backend evaluation engines (OpenCompass, VLMEvalKit, RAGAS, MTEB) and integrates multi-modal benchmarks across LLMs, VLMs, embedding models, and code tasks through a registry-based architecture. Features performance profiling with latency metrics such as time to first token (TTFT) and time per output token (TPOT), SLA auto-tuning for service concurrency limits, and interactive WebUI dashboards powered by Gradio/Wandb for comparative analysis and arena-style model battles.
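As a sketch of how the registry-based benchmarks are invoked, the snippet below follows EvalScope's documented TaskConfig/run_task quickstart pattern; the model ID and dataset name are illustrative placeholders, and argument names may vary between releases.

# Minimal evaluation sketch (assumes `pip install evalscope`).
# Model ID and dataset name are illustrative; check the EvalScope
# docs for the benchmark names registered in your installed version.
from evalscope import TaskConfig, run_task

task_cfg = TaskConfig(
    model='Qwen/Qwen2.5-0.5B-Instruct',  # a ModelScope/HuggingFace model ID
    datasets=['gsm8k'],                  # benchmark names from the registry
    limit=5,                             # evaluate only the first 5 samples
)

run_task(task_cfg=task_cfg)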
2,501 stars and 29,097 monthly downloads. Used by 1 other package. Actively maintained with 36 commits in the last 30 days. Available on PyPI.
Stars: 2,501
Forks: 285
Language: Python
License: Apache-2.0
Last pushed: Mar 11, 2026
Monthly downloads: 29,097
Commits (30d): 36
Dependencies: 38
Reverse dependents: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/modelscope/evalscope"
Open to everyone: 100 requests/day with no key required; a free key raises the limit to 1,000/day.
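For scripted access, the same endpoint can be fetched from Python; the response schema is not documented here, so this sketch simply prints the raw JSON.

# Fetch this package's quality data from the endpoint shown above.
# No key is needed for up to 100 requests/day; consult the API docs
# for how to attach a key if you need the higher limit.
import requests

url = "https://pt-edge.onrender.com/api/v1/quality/rag/modelscope/evalscope"
resp = requests.get(url, timeout=10)
resp.raise_for_status()
data = resp.json()  # schema not documented here; inspect the result
print(data)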
Related tools
Kareem-Rashed/rubric-eval
Independent framework to test, benchmark, and evaluate LLMs & AI agents locally.
izam-mohammed/ragrank
🎯 Your free LLM evaluation toolkit helps you assess the accuracy of facts, how well it...
justplus/llm-eval
A large language model evaluation platform supporting multiple evaluation benchmarks, custom datasets, and performance testing. Also supports RAG evaluation on custom datasets.
dokimos-dev/dokimos
Evaluation Framework for LLM applications in Java and Kotlin
cleanlab/tlm
Score the trustworthiness of outputs from any LLM in real-time