mcp-tool-bench/MCPToolBenchPP

MCPToolBench++: a Model Context Protocol (MCP) tool-use benchmark for evaluating the tool-use ability of AI agents and models

Score: 37 / 100 (Emerging)

Comprehensive benchmark for evaluating LLM tool-use capabilities across 45+ MCP server categories (browser automation, file systems, search, maps, payments, finance) with 4k+ instances covering single and multi-step tool calls. Evaluation uses standardized metrics (AST and Pass@K) with an LLM-as-judge approach, supporting major models like GPT-4o, Qwen, and Claude across multilingual scenarios. Integrates with MCP ecosystem servers and the OneKey MCP Router for simplified API access to commercial services like Google Maps and Perplexity.
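Pass@K is one of the metrics named above. A widely used way to compute it is the unbiased estimator from code-generation benchmarks: with n sampled attempts per instance, c of which are judged correct, pass@k = 1 - C(n-c, k) / C(n, k). The sketch below assumes that definition. Whether MCPToolBench++ computes Pass@K exactly this way is an assumption not confirmed by this page, and `pass_at_k` and `mean_pass_at_k` are hypothetical helper names, not part of the repository.

```python
from math import comb
from statistics import mean


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one instance (hypothetical helper).

    n: number of sampled attempts, c: attempts judged correct, k: budget.
    Assumes the standard estimator 1 - C(n-c, k) / C(n, k).
    """
    if not 0 <= c <= n:
        raise ValueError("need 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    # Fewer than k failed attempts: every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # Complement of the chance that a random size-k subset has no passing attempt.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over instances given (n, c) pairs (hypothetical helper)."""
    return mean(pass_at_k(n, c, k) for n, c in results)


if __name__ == "__main__":
    # Three instances with 4 attempts each; 0, 2, and 4 attempts pass.
    print(mean_pass_at_k([(4, 0), (4, 2), (4, 4)], k=1))
```

Averaging per-instance estimates, rather than pooling all attempts, keeps each instance equally weighted regardless of how many attempts it received.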

No license · No package · No dependents
Maintenance 6 / 25
Adoption 7 / 25
Maturity 7 / 25
Community 17 / 25


Stars: 41
Forks: 8
Language: Python
License: none
Last pushed: Dec 17, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/mcp/mcp-tool-bench/MCPToolBenchPP"

Open to everyone: 100 requests/day with no key needed, or get a free key for 1,000 requests/day.