ML Benchmarking Frameworks

Tools and frameworks for reproducibly benchmarking, evaluating, and comparing machine learning models across different domains and datasets. This category does NOT include domain-specific prediction tasks, competition leaderboards, or educational coursework collections.

This category tracks 44 ML benchmarking frameworks. Three score above 70 (Verified tier). The highest-rated is opentensor/bittensor at 86/100, with 1,383 stars and 107,641 monthly downloads. Two of the top 10 are actively maintained.

Get all 44 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=ml-benchmarking-frameworks&limit=20"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000 requests/day.
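The same request can be made from Python. The following is a minimal sketch using only the standard library; the endpoint and the `domain`, `subcategory`, and `limit` parameters are copied from the curl command above, while the helper names `build_url` and `fetch_projects` are hypothetical. The response schema is not described on this page, so the sketch only decodes the body as JSON and does not assume any field names. It is also an assumption that `limit` caps the number of returned projects; if so, the `limit=20` in the example URL would need raising to retrieve all 44.

```python
import json
import urllib.parse
import urllib.request

# Endpoint copied from the curl command in this listing.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit=20):
    """Return the request URL for this category with the given `limit`."""
    params = {
        "domain": "ml-frameworks",
        "subcategory": "ml-benchmarking-frameworks",
        "limit": limit,
    }
    return BASE_URL + "?" + urllib.parse.urlencode(params)


def fetch_projects(limit=20, timeout=30):
    """Request the endpoint and decode the response body as JSON.

    The decoded value is returned as-is, because the response schema
    is not documented here and no field names are assumed.
    """
    with urllib.request.urlopen(build_url(limit), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_projects()` performs one network request, which counts against the 100 requests/day keyless allowance, so caching the result locally is a sensible habit when experimenting.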

| # | Framework | Description | Score | Tier |
|---|---|---|---|---|
| 1 | opentensor/bittensor | Internet-scale Neural Networks | 86 | Verified |
| 2 | trailofbits/fickling | A Python pickling decompiler and static analyzer | 78 | Verified |
| 3 | benchopt/benchopt | A framework for reproducible, comparable benchmarks | 72 | Verified |
| 4 | BiomedSciAI/fuse-med-ml | A python framework accelerating ML based discovery in the medical field by... | 62 | Established |
| 5 | taoshidev/vanta-network | Vanta Network built on Bittensor | 51 | Established |
| 6 | mosaicml/streaming | A Data Streaming Library for Efficient Neural Network Training | 50 | Established |
| 7 | breuner/elbencho | A distributed storage benchmark for file systems, object stores & block... | 49 | Emerging |
| 8 | google-research/zapbench | The Zebrafish Activity Prediction Benchmark measures progress on the problem... | 45 | Emerging |
| 9 | tensorflow/model-card-toolkit | A toolkit that streamlines and automates the generation of model cards | 43 | Emerging |
| 10 | KevinMusgrave/powerful-benchmarker | A library for ML benchmarking. It's powerful. | 41 | Emerging |
| 11 | SDNNetSim/FUSION | FUSION is an open-source project aimed at revolutionizing networking through... | 41 | Emerging |
| 12 | aai-institute/nnbench | A small framework for benchmarking machine learning models. | 41 | Emerging |
| 13 | mariusbrataas/flowpoints_ml | An intuitive approach to creating deep learning models | 39 | Emerging |
| 14 | HanBnrd/BenchNIRS | Benchmarking framework for machine learning with fNIRS | 39 | Emerging |
| 15 | heilcheng/openevals | Benchmarking suite for open-weight language models | 38 | Emerging |
| 16 | CryAndRRich/dataflow | Decoding customer behaviors via Hybrid Neural-ML frameworks (3rd place of... | 36 | Emerging |
| 17 | rllm-team/tlsql | Table Learning Structured Query Language | 35 | Emerging |
| 18 | scott-huberty/amica-python | Python Implementation of Adaptive Mixture ICA | 35 | Emerging |
| 19 | SafeRL-Lab/BenchNetRL | 🔥Benchmarking of Neural Network Architectures in Reinforcement Learning. | 34 | Emerging |
| 20 | google-research/rliable | [NeurIPS'21 Outstanding Paper] Library for reliable evaluation on RL and ML... | 34 | Emerging |
| 21 | florencejt/fusilli | A Python package housing a collection of deep-learning multi-modal data... | 33 | Emerging |
| 22 | modelflows/ModelFLOWs-app | ModelFLOWs application | 32 | Emerging |
| 23 | data-centric-ai/dcbench | A benchmark of data-centric tasks from across the machine learning lifecycle. | 31 | Emerging |
| 24 | opentensor/validators | Repository for bittensor validators | 31 | Emerging |
| 25 | DACUS1995/pytorch-mmap-dataset | A custom pytorch Dataset extension that provides a faster iteration and... | 31 | Emerging |
| 26 | IvanIZ/BenchPush | BenchPush is a comprehensive benchmarking suite designed for mobile robots... | 30 | Emerging |
| 27 | tcbenchstack/tcbench | tcbench is a Machine Learning and Deep Learning framework to train model... | 29 | Experimental |
| 28 | Jahid-Hasan1/Py-Fusion | 🐍PyFusion🐍 is an open-source Python project designed to seamlessly integrate... | 27 | Experimental |
| 29 | kolesole/PredQL | PredQL is a Python framework for task generation in Relational Deep... | 26 | Experimental |
| 30 | neuroprismlab/PRISME-Brain-Power-Calculator | PRISME Power Calculator | 26 | Experimental |
| 31 | Kushalk0677/Inference-Energy-and-Latency-in-AI-Mediated-Education-Green-Audit | Empirical study of inference energy, latency, and pedagogical quality for... | 24 | Experimental |
| 32 | huggingface/hf_benchmarks | A starter kit for evaluating benchmarks on the 🤗 Hub | 24 | Experimental |
| 33 | TorchQL/torchql | TorchQL is a query language for Python-based machine learning models and datasets. | 23 | Experimental |
| 34 | ha-196120/swiftembed-benchmarks | 🚀 Evaluate SwiftEmbed's performance with benchmarking scripts for ultra-fast... | 22 | Experimental |
| 35 | Alwx83383838/RuQualBench | 🐸 Evaluate Russian language quality in LLMs by measuring typical errors... | 22 | Experimental |
| 36 | michael-borck/loco-bench | Systematic benchmarks of quantized small language models on consumer hardware | 22 | Experimental |
| 37 | nprint/benchmarks | A central repository to track the progress of network traffic analysis | 22 | Experimental |
| 38 | yuliu625/Yu-Deep-Learning-Toolkit | A versatile deep learning toolkit providing reusable components for common... | 19 | Experimental |
| 39 | Jon-Sina/Benchmark_Embedding_Models | 🔍 Benchmark embedding models by creating custom datasets to evaluate and... | 16 | Experimental |
| 40 | edlansiaux/swiftembed-benchmarks | Repository of benchmarking scripts for the SwiftEmbed embedding system, a... | 15 | Experimental |
| 41 | lkopf/prism | [NeurIPS 2025] PRISM is a multi-concept feature description framework which... | 15 | Experimental |
| 42 | helkaroui/RapidFlow | RapidFlow is a straightforward tool for bringing machine learning models... | 15 | Experimental |
| 43 | Sahilrajveer/reasonbench | 📊 Evaluate machine learning models with realistic benchmarks that offer a... | 14 | Experimental |
| 44 | facu18911891/python-cqb | ⚙️ Simplify the design and management of complex queries in Python with this... | 14 | Experimental |
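The tier labels in the listing above appear to follow score bands. Only the "above 70 is Verified" rule is stated on this page; the other cutoffs in the sketch below (50 and 30) are assumptions inferred from the listed rows, where Established spans 50 to 62, Emerging spans 30 to 49, and Experimental spans 14 to 29. The function name `tier_for` is hypothetical, and the actual scoring rules may differ.

```python
def tier_for(score):
    """Map a 0-100 score to a tier label.

    Only the Verified rule (score above 70) is stated in the listing.
    The 50 and 30 cutoffs are inferred from the listed rows and are
    assumptions, not documented thresholds.
    """
    if score > 70:
        return "Verified"
    if score >= 50:
        return "Established"
    if score >= 30:
        return "Emerging"
    return "Experimental"


# A few (score, tier) pairs taken from the listing, used as a spot check
# that the inferred bands reproduce the labels shown there.
listed = [(86, "Verified"), (72, "Verified"), (62, "Established"),
          (50, "Established"), (49, "Emerging"), (30, "Emerging"),
          (29, "Experimental"), (14, "Experimental")]
mismatches = [(s, t) for s, t in listed if tier_for(s) != t]
```

No listed score falls between 63 and 70, so the rows alone cannot confirm where the Established and Verified bands meet; the sketch relies on the stated "above 70" wording for that boundary.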