onejune2018/Awesome-LLM-Eval

Awesome-LLM-Eval: a curated list of tools, datasets/benchmarks, demos, leaderboards, papers, docs, and models, mainly for the evaluation of LLMs, aiming to probe the technical boundaries of generative AI.

Score: 48 / 100 (Emerging)

Organizes evaluation resources around an **anthropomorphic and value-oriented taxonomy** that extends beyond traditional benchmarks to assess reasoning robustness, uncertainty quantification, and long-context capabilities. Integrates references to specialized evaluation tools (OpenCompass, DeepEval, AlpacaEval), domain-specific benchmarks (RAG, agents, coding, multimodal), and LLMOps frameworks, while maintaining structured categorization of pre-trained, instruction-tuned, and aligned models. Backed by a peer-reviewed survey paper that provides the methodological foundation for its continuously updated categorization scheme.
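As one illustration of the kind of tooling the list catalogs, here is a minimal DeepEval sketch. The question, answer, and threshold are hypothetical, and DeepEval's LLM-judged metrics expect an `OPENAI_API_KEY` by default; treat this as a sketch of typical usage, not as an example taken from the repository itself.

```python
# Minimal DeepEval sketch: score one hypothetical Q/A pair for answer relevancy.
# Requires `pip install deepeval` and, by default, an OPENAI_API_KEY for the judge model.
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

test_case = LLMTestCase(
    input="What does Awesome-LLM-Eval curate?",  # hypothetical prompt
    actual_output="Tools, benchmarks, and papers for evaluating LLMs.",
)
metric = AnswerRelevancyMetric(threshold=0.7)  # pass/fail cutoff chosen arbitrarily here

evaluate(test_cases=[test_case], metrics=[metric])  # prints per-metric scores and pass/fail
```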


No package published; no dependents.

- Maintenance: 6 / 25
- Adoption: 10 / 25
- Maturity: 16 / 25
- Community: 16 / 25


- Stars: 616
- Forks: 51
- Language: —
- License: MIT
- Last pushed: Nov 24, 2025
- Commits (30d): 0

Get this data via the API:

```sh
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/onejune2018/Awesome-LLM-Eval"
```

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
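If you would rather script the call than use curl, here is a minimal Python sketch against the same endpoint. The response schema is not documented on this card, so the code dumps the raw JSON instead of assuming field names.

```python
# Minimal sketch: fetch this repo's quality record from the pt-edge API
# using only the standard library. Endpoint taken from the curl example above.
import json
import urllib.request

URL = ("https://pt-edge.onrender.com/api/v1/quality/"
       "llm-tools/onejune2018/Awesome-LLM-Eval")

with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.load(resp)

# The payload's field names are not documented here, so print the whole
# object and inspect its shape before parsing specific keys.
print(json.dumps(data, indent=2))
```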