Math Reasoning Datasets Transformer Models
There are 37 projects tracked in the math reasoning datasets category. One scores above 70 (Verified tier). The highest-rated is ExtensityAI/symbolicai at 75/100, with 1,677 stars and 2,722 monthly downloads. One of the top 10 is actively maintained.
Get all 37 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=math-reasoning-datasets&limit=20"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
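The same request can be made from Python. The sketch below is a minimal example using only the standard library; the helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the JSON response is not documented here, so the parsed payload is returned as-is. The curl example passes `limit=20`; whether a larger value returns all 37 projects in one call is an assumption to verify against the API.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="transformers", subcategory="math-reasoning-datasets", limit=20):
    """Build the query URL (hypothetical helper; defaults mirror the curl example)."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(limit=20, timeout=30):
    """Fetch and parse the JSON payload (hypothetical helper).

    The response schema is an unknown here, so nothing is assumed about its keys.
    """
    with urllib.request.urlopen(build_url(limit=limit), timeout=timeout) as response:
        return json.load(response)


# Example usage (performs a network request, counts against the daily quota):
# data = fetch_projects(limit=20)
# print(json.dumps(data, indent=2)[:500])
```

Keeping URL construction separate from the network call makes it easy to check the query string without spending any of the 100 keyless requests per day.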
| # | Model | Score | Tier |
|---|---|---|---|
| 1 | ExtensityAI/symbolicai<br>A neurosymbolic perspective on LLMs | 75 | Verified |
| 2 | TIGER-AI-Lab/MMLU-Pro<br>The code and data for "MMLU-Pro: A More Robust and Challenging Multi-Task... | | Emerging |
| 3 | deep-symbolic-mathematics/LLM-SR<br>[ICLR 2025 Oral] This is the official repo for the paper "LLM-SR" on... | | Emerging |
| 4 | zhudotexe/fanoutqa<br>Companion code for FanOutQA: Multi-Hop, Multi-Document Question Answering... | | Emerging |
| 5 | microsoft/interwhen<br>A framework for verifiable reasoning with language models. | | Emerging |
| 6 | HiThink-Research/MME-Finance<br>[MM 2025] A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning | | Emerging |
| 7 | xlang-ai/Binder<br>[ICLR 2023] Code for the paper "Binding Language Models in Symbolic Languages" | | Emerging |
| 8 | yifanzhang-pro/AutoMathText<br>[ACL 2025 Findings] Autonomous Data Selection with Zero-shot Generative... | | Emerging |
| 9 | princeton-pli/AdaptMI<br>[COLM 2025] Adaptive Skill-based In-context Math Instruction for Small... | | Emerging |
| 10 | SeekingDream/DyCodeEval<br>Official repository of the ICML2025 paper “Dynamic Benchmarking of Reasoning... | | Emerging |
| 11 | TIGER-AI-Lab/StructLM<br>Code and data for "StructLM: Towards Building Generalist Models for... | | Emerging |
| 12 | AlphaPav/mem-kk-logic<br>On Memorization of Large Language Models in Logical Reasoning | | Emerging |
| 13 | DAMO-NLP-SG/LLM-Multilingual-Knowledge-Boundaries<br>[ACL 2025] Analyzing LLMs' Multilingual Knowledge Boundary Cognition Across... | | Emerging |
| 14 | TIGER-AI-Lab/LongICLBench<br>Code and Data for "Long-context LLMs Struggle with Long In-context Learning"... | | Experimental |
| 15 | declare-lab/LLM-PuzzleTest<br>This repository is maintained to release dataset and models for multimodal... | | Experimental |
| 16 | TIGER-AI-Lab/MAmmoTH<br>Code and data for "MAmmoTH: Building Math Generalist Models through Hybrid... | | Experimental |
| 17 | akjindal53244/Arithmo<br>Small and Efficient Mathematical Reasoning LLMs | | Experimental |
| 18 | amazon-science/recode<br>Releasing code for "ReCode: Robustness Evaluation of Code Generation Models" | | Experimental |
| 19 | google/curie<br>Code release for "CURIE: Evaluating LLMs On Multitask Scientific Long... | | Experimental |
| 20 | martin-wey/CodeUltraFeedback<br>CodeUltraFeedback: aligning large language models to coding preferences (TOSEM 2025) | | Experimental |
| 21 | QwenLM/PolyMath<br>[NeurIPS 2025 D&B Track] Evaluation Code Repo for Paper "PolyMath:... | | Experimental |
| 22 | bobxwu/learning-from-rewards-llm-papers<br>A comprehensive collection of learning from rewards in the post-training and... | | Experimental |
| 23 | ryokamoi/llm-self-correction-papers<br>List of papers on Self-Correction of LLMs. | | Experimental |
| 24 | reasoning-machines/CoCoGen<br>Language Models of Code are Few-Shot Commonsense Learners (EMNLP 2022) | | Experimental |
| 25 | conditionWang/FLNK<br>Federated Learning with New Knowledge -- explore to incorporate various new... | | Experimental |
| 26 | gersteinlab/Struc-Bench<br>[NAACL 2024] Struc-Bench: Are Large Language Models Good at Generating... | | Experimental |
| 27 | zjunlp/DynamicKnowledgeCircuits<br>[ACL 2025] How Do LLMs Acquire New Knowledge? A Knowledge Circuits... | | Experimental |
| 28 | kaistAI/LangBridge<br>[ACL 2024] LangBridge: Multilingual Reasoning Without Multilingual Supervision | | Experimental |
| 29 | YangLing0818/SuperCorrect-llm<br>[ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought... | | Experimental |
| 30 | WooooDyy/MathCritique<br>Implementation for the research paper "Enhancing LLM Reasoning via Critique... | | Experimental |
| 31 | merlerm/In-Context-Symbolic-Regression<br>Official code implementation for the ACL 2024 Student Research Workshop... | | Experimental |
| 32 | joeljang/continual-knowledge-learning<br>[ICLR 2022] Towards Continual Knowledge Learning of Language Models | | Experimental |
| 33 | UCSC-VLAA/vllm-safety-benchmark<br>[ECCV 2024] Official PyTorch Implementation of "How Many Unicorns Are in... | | Experimental |
| 34 | MMStar-Benchmark/MMStar<br>[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on... | | Experimental |
| 35 | iiis-ai/IterativeQuestionComposing<br>[AAAI 2025] Augmenting Math Word Problems via Iterative Question Composing... | | Experimental |
| 36 | TIGER-AI-Lab/TableCoT<br>The code and data for paper "Large Language Models are few(1)-shot Table... | | Experimental |
| 37 | Eleanor-H/MUSTARD<br>Code & data for ICLR 2024 spotlight paper: 🍯MUSTARD: Mastering Uniform... | | Experimental |