tongye98/Awesome-Code-Benchmark
A comprehensive review of code-domain benchmarks in LLM research.
Curates and tracks emerging code-specific benchmarks across diverse evaluation dimensions—from code generation and security analysis to repository-level reasoning and multi-turn interactions. Aggregates peer-reviewed research benchmarks with structured categorization by task type, capability tested, and source institution, enabling systematic comparison of LLM performance across hundreds of specialized evaluation datasets. Actively maintains featured benchmark listings covering recent advances in performance optimization, code translation efficiency, agent-based task solving, and multi-modal code understanding.
208 stars. No commits in the last 6 months.
Stars: 208
Forks: 16
Language: —
License: MIT
Category:
Last pushed: Sep 22, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ai-coding/tongye98/Awesome-Code-Benchmark"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
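The same request can be made from a script. The following is a minimal sketch in Python using only the standard library; the helper names `build_quality_url` and `fetch_quality` are hypothetical, and treating the response body as JSON is an assumption, since the response format is not documented here.

```python
import json
import urllib.parse
import urllib.request

# Base path taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/ai-coding"


def build_quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    # Percent-encode each path segment so unusual characters stay inside it.
    owner_part = urllib.parse.quote(owner, safe="")
    repo_part = urllib.parse.quote(repo, safe="")
    return f"{API_BASE}/{owner_part}/{repo_part}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the data for one repository without an API key.

    Assumption: the endpoint returns a JSON body. If it does not,
    json.loads raises a ValueError and the raw text should be inspected.
    """
    url = build_quality_url(owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Prints the URL only; call fetch_quality(...) to make the request.
    print(build_quality_url("tongye98", "Awesome-Code-Benchmark"))
```

Because unauthenticated access is limited to 100 requests per day, a script that polls many repositories may want to cache responses locally.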
Higher-rated alternatives
k4black/codebleu
Pip compatible CodeBLEU metric implementation available for linux/macos/win
LiveCodeBench/LiveCodeBench
Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of...
EdinburghNLP/code-docstring-corpus
Preprocessed Python functions and docstrings for automated code documentation (code2doc) and...
hendrycks/apps
APPS: Automated Programming Progress Standard (NeurIPS 2021)
alxschwrz/codex_py2cpp
Converts Python code into C++ using OpenAI Codex.