ML Benchmarking Frameworks
Tools and frameworks for reproducibly benchmarking, evaluating, and comparing machine learning models across different domains and datasets. Does NOT include domain-specific prediction tasks, competition leaderboards, or educational coursework collections.
There are 44 ML benchmarking frameworks tracked. Three score above 70 (Verified tier). The highest-rated is opentensor/bittensor at 86/100, with 1,383 stars and 107,641 monthly downloads. Two of the top 10 are actively maintained.
Get all 44 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=ml-benchmarking-frameworks&limit=20"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
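The same request can be made from Python using only the standard library. This is a minimal sketch, not an official client: the endpoint and the `domain`, `subcategory`, and `limit` query parameters are copied from the curl command above, while the function names `build_quality_url` and `fetch_quality` are hypothetical, and the response is assumed (not confirmed by this page) to be JSON. Note that the example URL passes `limit=20`; whether a larger value returns all 44 projects in one call is also an assumption to verify against the API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken verbatim from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_quality_url(limit=20):
    """Build the query URL; parameter names come from the curl example."""
    params = {
        "domain": "ml-frameworks",
        "subcategory": "ml-benchmarking-frameworks",
        "limit": limit,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_quality(limit=20, timeout=30):
    """Fetch and decode the response (assumed to be JSON)."""
    with urlopen(build_quality_url(limit), timeout=timeout) as resp:
        return json.load(resp)


# Example usage (performs a network request, counted against the daily quota):
# data = fetch_quality()
# print(json.dumps(data, indent=2)[:500])
```

Keeping URL construction separate from the network call makes it possible to inspect the exact request before spending one of the 100 keyless requests per day.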
| # | Framework | Description | Score | Tier |
|---|---|---|---|---|
| 1 | opentensor/bittensor | Internet-scale Neural Networks | 86 | Verified |
| 2 | trailofbits/fickling | A Python pickling decompiler and static analyzer | | Verified |
| 3 | benchopt/benchopt | A framework for reproducible, comparable benchmarks | | Verified |
| 4 | BiomedSciAI/fuse-med-ml | A python framework accelerating ML based discovery in the medical field by... | | Established |
| 5 | taoshidev/vanta-network | Vanta Network built on Bittensor | | Established |
| 6 | mosaicml/streaming | A Data Streaming Library for Efficient Neural Network Training | | Established |
| 7 | breuner/elbencho | A distributed storage benchmark for file systems, object stores & block... | | Emerging |
| 8 | google-research/zapbench | The Zebrafish Activity Prediction Benchmark measures progress on the problem... | | Emerging |
| 9 | tensorflow/model-card-toolkit | A toolkit that streamlines and automates the generation of model cards | | Emerging |
| 10 | KevinMusgrave/powerful-benchmarker | A library for ML benchmarking. It's powerful. | | Emerging |
| 11 | SDNNetSim/FUSION | FUSION is an open-source project aimed at revolutionizing networking through... | | Emerging |
| 12 | aai-institute/nnbench | A small framework for benchmarking machine learning models. | | Emerging |
| 13 | mariusbrataas/flowpoints_ml | An intuitive approach to creating deep learning models | | Emerging |
| 14 | HanBnrd/BenchNIRS | Benchmarking framework for machine learning with fNIRS | | Emerging |
| 15 | heilcheng/openevals | Benchmarking suite for open-weight language models | | Emerging |
| 16 | CryAndRRich/dataflow | Decoding customer behaviors via Hybrid Neural-ML frameworks (3rd place of... | | Emerging |
| 17 | rllm-team/tlsql | Table Learning Structured Query Language | | Emerging |
| 18 | scott-huberty/amica-python | Python Implementation of Adaptive Mixture ICA | | Emerging |
| 19 | SafeRL-Lab/BenchNetRL | 🔥Benchmarking of Neural Network Architectures in Reinforcement Learning. | | Emerging |
| 20 | google-research/rliable | [NeurIPS'21 Outstanding Paper] Library for reliable evaluation on RL and ML... | | Emerging |
| 21 | florencejt/fusilli | A Python package housing a collection of deep-learning multi-modal data... | | Emerging |
| 22 | modelflows/ModelFLOWs-app | ModelFLOWs application | | Emerging |
| 23 | data-centric-ai/dcbench | A benchmark of data-centric tasks from across the machine learning lifecycle. | | Emerging |
| 24 | opentensor/validators | Repository for bittensor validators | | Emerging |
| 25 | DACUS1995/pytorch-mmap-dataset | A custom pytorch Dataset extension that provides a faster iteration and... | | Emerging |
| 26 | IvanIZ/BenchPush | BenchPush is a comprehensive benchmarking suite designed for mobile robots... | | Emerging |
| 27 | tcbenchstack/tcbench | tcbench is a Machine Learning and Deep Learning framework to train model... | | Experimental |
| 28 | Jahid-Hasan1/Py-Fusion | 🐍PyFusion🐍 is an open-source Python project designed to seamlessly integrate... | | Experimental |
| 29 | kolesole/PredQL | PredQL is a Python framework for task generation in Relational Deep... | | Experimental |
| 30 | neuroprismlab/PRISME-Brain-Power-Calculator | PRISME Power Calculator | | Experimental |
| 31 | Kushalk0677/Inference-Energy-and-Latency-in-AI-Mediated-Education-Green-Audit | Empirical study of inference energy, latency, and pedagogical quality for... | | Experimental |
| 32 | huggingface/hf_benchmarks | A starter kit for evaluating benchmarks on the 🤗 Hub | | Experimental |
| 33 | TorchQL/torchql | TorchQL is a query language for Python-based machine learning models and datasets. | | Experimental |
| 34 | ha-196120/swiftembed-benchmarks | 🚀 Evaluate SwiftEmbed's performance with benchmarking scripts for ultra-fast... | | Experimental |
| 35 | Alwx83383838/RuQualBench | 🐸 Evaluate Russian language quality in LLMs by measuring typical errors... | | Experimental |
| 36 | michael-borck/loco-bench | Systematic benchmarks of quantized small language models on consumer hardware | | Experimental |
| 37 | nprint/benchmarks | A central repository to track the progress of network traffic analysis | | Experimental |
| 38 | yuliu625/Yu-Deep-Learning-Toolkit | A versatile deep learning toolkit providing reusable components for common... | | Experimental |
| 39 | Jon-Sina/Benchmark_Embedding_Models | 🔍 Benchmark embedding models by creating custom datasets to evaluate and... | | Experimental |
| 40 | edlansiaux/swiftembed-benchmarks | Repository of benchmarking scripts for the SwiftEmbed embedding system, a... | | Experimental |
| 41 | lkopf/prism | [NeurIPS 2025] PRISM is a multi-concept feature description framework which... | | Experimental |
| 42 | helkaroui/RapidFlow | RapidFlow is a straightforward tool for bringing machine learning models... | | Experimental |
| 43 | Sahilrajveer/reasonbench | 📊 Evaluate machine learning models with realistic benchmarks that offer a... | | Experimental |
| 44 | facu18911891/python-cqb | ⚙️ Simplify the design and management of complex queries in Python with this... | | Experimental |