ML Benchmarking Frameworks

Tools and frameworks for reproducibly benchmarking, evaluating, and comparing machine learning models across different domains and datasets. Does NOT include domain-specific prediction tasks, competition leaderboards, or educational coursework collections.

44 ML benchmarking frameworks are tracked. Three score above 70 (Verified tier). The highest-rated is opentensor/bittensor at 86/100, with 1,383 stars and 107,641 monthly downloads. Two of the top 10 are actively maintained.

Get all 44 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=ml-benchmarking-frameworks&limit=20"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
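
The same request can be made from Python. This is a minimal sketch using only the standard library. It reuses the URL and query parameters from the curl command above; the helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the JSON response is not documented here, so the result is returned as-is rather than parsed into fields. The example keeps `limit=20` as in the curl command; whether a larger value returns all 44 projects in one call is an assumption that would need checking against the API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit=20):
    """Build the request URL with the same query parameters as the curl example."""
    params = {
        "domain": "ml-frameworks",
        "subcategory": "ml-benchmarking-frameworks",
        "limit": limit,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_projects(limit=20, timeout=30):
    """Fetch the listing and return the decoded JSON.

    The response schema is not described on this page, so no field
    names are assumed; inspect the returned object before relying on it.
    """
    with urlopen(build_url(limit), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_projects()` performs one network request, which counts against the 100 requests/day keyless limit, so caching the result locally is a reasonable choice when iterating.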

#	Framework	Score	Tier	Stars	Language
1	opentensor/bittensor Internet-scale Neural Networks	86	Verified	1,383	Python
2	trailofbits/fickling A Python pickling decompiler and static analyzer	78	Verified	609	Python
3	benchopt/benchopt A framework for reproducible, comparable benchmarks	72	Verified	294	Python
4	BiomedSciAI/fuse-med-ml A python framework accelerating ML based discovery in the medical field by...	62	Established	154	Python
5	taoshidev/vanta-network Vanta Network built on Bittensor	51	Established	71	Python
6	mosaicml/streaming A Data Streaming Library for Efficient Neural Network Training	50	Established	1,472	Python
7	breuner/elbencho A distributed storage benchmark for file systems, object stores & block...	49	Emerging	256	C++
8	google-research/zapbench The Zebrafish Activity Prediction Benchmark measures progress on the problem...	45	Emerging	67	Python
9	tensorflow/model-card-toolkit A toolkit that streamlines and automates the generation of model cards	43	Emerging	444	Python
10	KevinMusgrave/powerful-benchmarker A library for ML benchmarking. It's powerful.	41	Emerging	439	Jupyter Notebook
11	SDNNetSim/FUSION FUSION is an open-source project aimed at revolutionizing networking through...	41	Emerging	13	Python
12	aai-institute/nnbench A small framework for benchmarking machine learning models.	41	Emerging	21	Python
13	mariusbrataas/flowpoints_ml An intuitive approach to creating deep learning models	39	Emerging	372	JavaScript
14	HanBnrd/BenchNIRS Benchmarking framework for machine learning with fNIRS	39	Emerging	6	Python
15	heilcheng/openevals Benchmarking suite for open-weight language models	38	Emerging	133	Python
16	CryAndRRich/dataflow Decoding customer behaviors via Hybrid Neural-ML frameworks (3rd place of...	36	Emerging	1	Python
17	rllm-team/tlsql Table Learning Structured Query Language	35	Emerging	5	Python
18	scott-huberty/amica-python Python Implementation of Adaptive Mixture ICA	35	Emerging	1	Python
19	SafeRL-Lab/BenchNetRL 🔥Benchmarking of Neural Network Architectures in Reinforcement Learning.	34	Emerging	34	Python
20	google-research/rliable [NeurIPS'21 Outstanding Paper] Library for reliable evaluation on RL and ML...	34	Emerging	866	Jupyter Notebook
21	florencejt/fusilli A Python package housing a collection of deep-learning multi-modal data...	33	Emerging	198	Python
22	modelflows/ModelFLOWs-app ModelFLOWs application	32	Emerging	22	Python
23	data-centric-ai/dcbench A benchmark of data-centric tasks from across the machine learning lifecycle.	31	Emerging	71	Jupyter Notebook
24	opentensor/validators Repository for bittensor validators	31	Emerging	16	Python
25	DACUS1995/pytorch-mmap-dataset A custom pytorch Dataset extension that provides a faster iteration and...	31	Emerging	46	Python
26	IvanIZ/BenchPush BenchPush is a comprehensive benchmarking suite designed for mobile robots...	30	Emerging	18	Python
27	tcbenchstack/tcbench tcbench is a Machine Learning and Deep Learning framework to train model...	29	Experimental	32	Jupyter Notebook
28	Jahid-Hasan1/Py-Fusion 🐍PyFusion🐍 is an open-source Python project designed to seamlessly integrate...	27	Experimental	8	Python
29	kolesole/PredQL PredQL is a Python framework for task generation in Relational Deep...	26	Experimental	5	Python
30	neuroprismlab/PRISME-Brain-Power-Calculator PRISME Power Calculator	26	Experimental	4	MATLAB
31	Kushalk0677/Inference-Energy-and-Latency-in-AI-Mediated-Education-Green-Audit Empirical study of inference energy, latency, and pedagogical quality for...	24	Experimental	2	Python
32	huggingface/hf_benchmarks A starter kit for evaluating benchmarks on the 🤗 Hub	24	Experimental	16	Python
33	TorchQL/torchql TorchQL is a query language for Python-based machine learning models and datasets.	23	Experimental	10	Python
34	ha-196120/swiftembed-benchmarks 🚀 Evaluate SwiftEmbed's performance with benchmarking scripts for ultra-fast...	22	Experimental	—	Lua
35	Alwx83383838/RuQualBench 🐸 Evaluate Russian language quality in LLMs by measuring typical errors...	22	Experimental	—	Python
36	michael-borck/loco-bench Systematic benchmarks of quantized small language models on consumer hardware	22	Experimental	—	JavaScript
37	nprint/benchmarks A central repository to track the progress of network traffic analysis	22	Experimental	7	SCSS
38	yuliu625/Yu-Deep-Learning-Toolkit A versatile deep learning toolkit providing reusable components for common...	19	Experimental	—	Python
39	Jon-Sina/Benchmark_Embedding_Models 🔍 Benchmark embedding models by creating custom datasets to evaluate and...	16	Experimental	—	Shell
40	edlansiaux/swiftembed-benchmarks Repository of benchmarking scripts for the SwiftEmbed embedding system, a...	15	Experimental	—	Lua
41	lkopf/prism [NeurIPS 2025] PRISM is a multi-concept feature description framework which...	15	Experimental	8	Jupyter Notebook
42	helkaroui/RapidFlow RapidFlow is a straightforward tool for bringing machine learning models...	15	Experimental	6	JavaScript
43	Sahilrajveer/reasonbench 📊 Evaluate machine learning models with realistic benchmarks that offer a...	14	Experimental	—	—
44	facu18911891/python-cqb ⚙️ Simplify the design and management of complex queries in Python with this...	14	Experimental	—	—
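
The tier labels in the table appear to follow score bands. The sketch below is an inference from the rows above, not a documented rule: only the "above 70" cutoff for Verified is stated on this page, while the 50 and 30 cutoffs are assumptions read off the observed boundaries (50 is the lowest Established score, 49 the highest Emerging, 30 the lowest Emerging, 29 the highest Experimental). The function name `tier_for` is hypothetical.

```python
def tier_for(score):
    """Map a 0-100 score to a tier label.

    Only the Verified cutoff (above 70) is stated in the source.
    The other two cutoffs are assumed from the table rows.
    """
    if score > 70:
        return "Verified"
    if score >= 50:   # assumed cutoff: lowest Established row scores 50
        return "Established"
    if score >= 30:   # assumed cutoff: lowest Emerging row scores 30
        return "Emerging"
    return "Experimental"
```

Scores between 63 and 70 do not occur in the table, so the exact boundary between Established and Verified cannot be confirmed from these rows alone.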

Comparisons in this category

bittensor and vanta-network (86 vs 51)
benchopt and nnbench (72 vs 41)