LLM Evaluation and Benchmarking Tools

3 LLM evaluation and benchmarking tools are tracked. 1 scores above 50 (Established tier). The highest-rated is jeinlee1991/chinese-llm-benchmark at 55/100 with 5,675 stars. 1 of the top 10 is actively maintained.

Get all 3 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=llm-evaluation-benchmarking&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
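The same request can be made from a script. The following is a minimal Python sketch, assuming only the endpoint and query parameters shown in the curl command above. The helper names `build_url` and `fetch_projects` are hypothetical, and because the response schema is not documented on this page, the decoded JSON is returned as-is rather than parsed into specific fields.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Endpoint copied from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="llm-tools",
              subcategory="llm-evaluation-benchmarking",
              limit=20):
    """Build the query URL from the three parameters used in the curl example."""
    query = urlencode({
        "domain": domain,
        "subcategory": subcategory,
        "limit": limit,
    })
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Fetch the URL and decode the JSON body.

    The response structure is an unknown here, so the decoded
    object is returned unchanged for the caller to inspect.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_projects(build_url())` issues the same request as the curl command. Keyless use counts against the 100 requests/day limit, so caching the result locally is a reasonable default.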

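| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | jeinlee1991/chinese-llm-benchmark | ReLE evaluation: capability evaluation of Chinese AI large models (continuously updated). Currently includes 359 large models, covering chatgpt, gpt-5.2, o4-mini, Google gemini-3-pr... | 55 | Established |
| 2 | bvobart/mllint | `mllint` is a command-line utility to evaluate the technical quality of... | 42 | Emerging |
| 3 | Software-Engineering-Arena/SWE-Chatbot-Arena | Compare chatbots pairwise via multi-round evaluations for SE tasks. | 16 | Experimental |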