oripress/AlgoTune
AlgoTune is a NeurIPS 2025 benchmark made up of 154 math, physics, and computer science problems. The goal is to write code that solves each problem and runs faster than the existing reference implementation.
Includes AlgoTuner, an agentic framework that lets language models iteratively optimize code through automated compilation and benchmarking. Supports distributed execution via SLURM and AWS Batch (with Spot instance optimization) as well as offline evaluation; datasets stream from HuggingFace or can be generated locally for reproducible benchmark runs.
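To give a flavor of what a submission looks like, here is a minimal hypothetical sketch. The Solver class and solve(problem) method follow the task interface described in the AlgoTune paper; the specific task (a symmetric positive-definite linear system) and the problem dict keys "A" and "b" are illustrative assumptions, not one of the 154 problems verbatim:

# Hypothetical AlgoTune-style submission (names and task are assumptions).
import numpy as np
from scipy.linalg import cho_factor, cho_solve

class Solver:
    def solve(self, problem):
        # problem assumed to be a dict holding an SPD matrix "A" and vector "b"
        A = np.asarray(problem["A"])
        b = np.asarray(problem["b"])
        # Cholesky factorization exploits symmetry, roughly halving the work
        # of a generic LU-based np.linalg.solve on SPD inputs
        c, low = cho_factor(A)
        return cho_solve((c, low), b)

The benchmark then times this solve against the reference implementation for the same task.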
Stars
95
Forks
13
Language
Python
License
MIT
Category
Last pushed
Mar 12, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/oripress/AlgoTune"
Open to everyone: 100 requests/day with no key needed. Get a free API key for 1,000 requests/day.
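The same data can be fetched in Python. This is a minimal sketch assuming only what the curl example shows (a GET endpoint returning JSON); the response schema is not documented on this page, so the payload is simply printed:

# Fetch the repo's quality data from the public endpoint shown above.
import requests

url = "https://pt-edge.onrender.com/api/v1/quality/transformers/oripress/AlgoTune"
resp = requests.get(url, timeout=10)
resp.raise_for_status()  # fail loudly on HTTP errors (e.g. rate limiting)
print(resp.json())       # schema undocumented here; inspect interactively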
Related repositories
xjywhu/Awesome-Multimodal-LLM-for-Code
Multimodal Large Language Models for Code Generation under Multimodal Scenarios
juyongjiang/CodeUp
CodeUp: A Multilingual Code Generation Llama-X Model with Parameter-Efficient Instruction-Tuning
Gen-Verse/ReasonFlux
[NeurIPS 2025 Spotlight] LLM post-training suite — featuring ReasonFlux, ReasonFlux-PRM, and...
jie-jw-wu/human-eval-comm
HumanEvalComm: Evaluating Communication Skill of Code LLM and LLM Agent
amazon-science/llm-code-preference
Training and Benchmarking LLMs for Code Preference.