onejune2018/Awesome-LLM-Eval
Awesome-LLM-Eval: a curated list of tools, benchmarks/datasets, demos, leaderboards, papers, docs, and models, mainly for the evaluation of LLMs, aiming to probe the technical boundaries of generative AI.
Organizes evaluation resources around an **anthropomorphic and value-oriented taxonomy** that extends beyond traditional benchmarks to assess reasoning robustness, uncertainty quantification, and long-context capabilities. Integrates references to specialized evaluation tools (OpenCompass, DeepEval, AlpacaEval), domain-specific benchmarks (RAG, agents, coding, multimodal), and LLMOps frameworks, while maintaining structured categorization of pre-trained, instruction-tuned, and aligned models. Backed by a peer-reviewed survey paper that provides the methodological foundation for its continuously updated categorization scheme.
- **Stars:** 616
- **Forks:** 51
- **Language:** —
- **License:** MIT
- **Category:**
- **Last pushed:** Nov 24, 2025
- **Commits (30d):** 0
Get this data via API:

```shell
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/onejune2018/Awesome-LLM-Eval"
```

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
Higher-rated alternatives:

- **SepineTam/stata-mcp**: Let an LLM help you run your regressions with Stata. Evolve from reg monkey to causal thinker.
- **datawhalechina/code-your-own-llm**: A full-stack reference guide to large language models, using minimal code to walk you through every detail end to end, from training a model from scratch to production deployment.
- **leonid20000/odin-slides**: This is an advanced Python tool that empowers you to effortlessly draft customizable PowerPoint...
- **R3gm/InsightSolver-Colab**: InsightSolver: Colab notebooks for exploring and solving operational issues using deep learning,...
- **sinanuozdemir/quick-start-guide-to-llms**: The Official Repo for "Quick Start Guide to Large Language Models"