modelscope/MCPBench

The evaluation benchmark on MCP servers

Score: 32 / 100 (Emerging)

Evaluates MCP servers across Web Search, Database Query, and GAIA tasks by measuring task completion accuracy, latency, and token consumption under consistent LLM/Agent configurations. Supports both local stdio-based servers (launched via npx) and remote SSE-connected servers, with automatic tool detection that eliminates manual configuration. Includes curated datasets (600 WebSearch QA pairs and database query benchmarks) and provides standardized evaluation scripts for comparative analysis of implementations such as Brave Search and DuckDuckGo.

241 stars. No commits in the last 6 months.

Flags: Stale (6 months), No Package, No Dependents
Maintenance 2 / 25
Adoption 10 / 25
Maturity 9 / 25
Community 11 / 25


Stars: 241
Forks: 15
Language: Python
License: Apache-2.0
Last pushed: Sep 03, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/mcp/modelscope/MCPBench"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
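For programmatic use, the same endpoint shown in the curl command above can be queried from Python with only the standard library. This is a minimal sketch: the URL pattern is taken from the curl example, but the JSON shape of the response is not documented here, so the code simply decodes and returns whatever the API sends back.

```python
import json
import urllib.request

# Base path taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/mcp"


def quality_url(owner: str, repo: str) -> str:
    """Build the per-repository quality endpoint URL."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and decode the JSON quality report for a repository.

    The response schema is an assumption (undocumented here); we just
    return the parsed JSON as-is.
    """
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)


# Usage (performs a network request, subject to the daily rate limit):
# report = fetch_quality("modelscope", "MCPBench")
# print(json.dumps(report, indent=2))
```

Note that the anonymous tier allows 100 requests per day, so batch lookups across many repositories would need a free API key for the 1,000/day limit.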