huggingface/evaluation-guidebook
Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval!
Stars: 2,075
Forks: 121
Language: Jupyter Notebook
License: —
Last pushed: Dec 03, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/huggingface/evaluation-guidebook"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
Higher-rated alternatives
EvolvingLMMs-Lab/lmms-eval
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
open-compass/VLMEvalKit
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
Giskard-AI/giskard-oss
🐢 Open-Source Evaluation & Testing library for LLM Agents
vibrantlabsai/ragas
Supercharge Your LLM Application Evaluations 🚀
EuroEval/EuroEval
The robust European language model benchmark.