SAILResearch/awesome-foundation-model-leaderboards

A curated list of awesome leaderboard-oriented resources for the AI domain

/ 100

Emerging

Covers domain-specific foundation model leaderboards across 15+ modalities (text, code, image, video, agent, medical, etc.) alongside evaluation infrastructure, datasets, and benchmarking tools. Includes a curated taxonomy of both leaderboard platforms (Hugging Face, Kaggle, AIcrowd) and operational tooling (backend templates, scrapers, submission handlers) based on empirical analysis of leaderboard workflows and anti-patterns. Provides integrated search capabilities and focuses exclusively on actively maintained, AI-domain leaderboards to help practitioners benchmark and deploy models.

321 stars.

No License, No Package, No Dependents

Maintenance 13 / 25

Adoption 10 / 25

Maturity 8 / 25

Community 17 / 25

Stars

321

Forks

Language

—

License

—

Featured in

You're Shipping AI You Can't Measure

Higher-rated alternatives

StonyBrookNLP/appworld

🌍 AppWorld: A Controllable World of Apps and People for Benchmarking Function Calling and...

qualifire-dev/rogue

AI Agent Evaluator & Red Team Platform

future-agi/ai-evaluation

Evaluation Framework for all your AI related Workflows

microsoft/WindowsAgentArena

Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of...

agentscope-ai/OpenJudge

OpenJudge: A Unified Framework for Holistic Evaluation and Quality Rewards
