ML Inference Benchmarking (ML Frameworks)

Standardized benchmarks and performance evaluation frameworks for ML model inference across devices and hardware (GPUs, CPUs, mobile, edge). Does NOT include training benchmarks, model architectures, or optimization techniques without benchmark implementations.

There are 76 ML inference benchmarking frameworks tracked, 1 of which scores above 70 (verified tier). The highest-rated is NVIDIA/TransformerEngine at 76/100 with 3,206 stars. 3 of the top 10 are actively maintained.

Get all 76 projects as JSON (raise the `limit` parameter to cover the full set):

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=ml-inference-benchmarking&limit=100"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
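The endpoint above can also be scripted. A minimal Python sketch, with the caveat that the response shape assumed here (a `projects` list carrying `name`, `score`, and `tier` fields) is a guess for illustration, so check the actual payload before relying on it:

```python
import json
from urllib.parse import urlencode

BASE = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def build_url(domain: str, subcategory: str, limit: int = 100) -> str:
    """Construct the quality-dataset query URL for this API."""
    params = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE}?{params}"

def by_tier(projects: list[dict], tier: str) -> list[dict]:
    """Filter project records by their tier (case-insensitive)."""
    return [p for p in projects if p.get("tier", "").lower() == tier.lower()]

# Hypothetical sample payload mirroring the table below; field names are assumptions.
sample = json.loads(
    '{"projects": ['
    '{"name": "NVIDIA/TransformerEngine", "score": 76, "tier": "Verified"},'
    '{"name": "mlcommons/inference", "score": 67, "tier": "Established"}]}'
)

url = build_url("ml-frameworks", "ml-inference-benchmarking", limit=76)
verified = [p["name"] for p in by_tier(sample["projects"], "verified")]
```

Pair `build_url` with any HTTP client (e.g. `urllib.request` or `requests`) to fetch the live data within the 100 requests/day limit.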

| # | Framework | Description | Score | Tier |
|---|-----------|-------------|-------|------|
| 1 | NVIDIA/TransformerEngine | A library for accelerating Transformer models on NVIDIA GPUs, including... | 76 | Verified |
| 2 | mlcommons/inference | Reference implementations of MLPerf® inference benchmarks | 67 | Established |
| 3 | datamade/usaddress | :us: a python library for parsing unstructured United States address strings... | 65 | Established |
| 4 | GRAAL-Research/deepparse | Deepparse is a state-of-the-art library for parsing multinational street... | 63 | Established |
| 5 | mlcommons/training | Reference implementations of MLPerf® training benchmarks | 60 | Established |
| 6 | mlcommons/storage | MLPerf® Storage Benchmark Suite | 54 | Established |
| 7 | Ki6an/fastT5 | ⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x. | 54 | Established |
| 8 | CMU-SAFARI/Pythia | A customizable hardware prefetching framework using online reinforcement... | 51 | Established |
| 9 | deepspeedai/DeepSpeed-MII | MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. | 50 | Established |
| 10 | itlab-vision/dl-benchmark | Deep Learning Inference benchmark. Supports OpenVINO™ toolkit, TensorFlow,... | 50 | Established |
| 11 | ise-uiuc/nnsmith | Automated DNN generation for fuzz testing and more | 49 | Emerging |
| 12 | Tencent/PocketFlow | An Automatic Model Compression (AutoMC) framework for developing smaller and... | 49 | Emerging |
| 13 | TristanBilot/mlx-benchmark | Benchmark of Apple MLX operations on all Apple Silicon chips (GPU, CPU) +... | 49 | Emerging |
| 14 | microsoft/hummingbird | Hummingbird compiles trained ML models into tensor computation for faster inference. | 47 | Emerging |
| 15 | CMU-SAFARI/Hermes | A speculative mechanism to accelerate long-latency off-chip load requests by... | 44 | Emerging |
| 16 | mrdbourke/m1-machine-learning-test | Code for testing various M1 Chip benchmarks with TensorFlow. | 44 | Emerging |
| 17 | hanxiao/flash-kmeans-mlx | IO-aware batched K-Means for Apple Silicon, ported from Flash-KMeans... | 41 | Emerging |
| 18 | Azure/MS-AMP | Microsoft Automatic Mixed Precision Library | 41 | Emerging |
| 19 | XiaoMi/mobile-ai-bench | Benchmarking Neural Network Inference on Mobile Devices | 39 | Emerging |
| 20 | mlcommons/inference_results_v5.1 | This repository contains the results and code for the MLPerf® Inference v5.1... | 39 | Emerging |
| 21 | OpenBMB/BMInf | Efficient Inference for Big Models | 38 | Emerging |
| 22 | ChharithOeun/torch-amd-setup | Auto-detect AMD GPU for PyTorch — ROCm, DirectML, CUDA, MPS, CPU. Fixes... | 37 | Emerging |
| 23 | AI-performance/embedded-ai.bench | benchmark for embededded-ai deep learning inference engines, such as NCNN /... | 36 | Emerging |
| 24 | hanxiao/umap-mlx | UMAP in pure MLX for Apple Silicon. 30x faster than umap-learn. | 36 | Emerging |
| 25 | tlkh/tf-metal-experiments | TensorFlow Metal Backend on Apple Silicon Experiments (just for fun) | 35 | Emerging |
| 26 | mlalma/MLXUtilsLibrary | Utilities for easing the development of machine learning inference libraries... | 35 | Emerging |
| 27 | PEQUAN/hpc-mix-bench | Benchmarks for mixed-precision emulations | 35 | Emerging |
| 28 | RAZZULLIX/fast_topk_batched | High-performance batched Top-K selection for CPU inference. Up to 80x faster... | 34 | Emerging |
| 29 | mlcommons/inference_results_v5.0 | This repository contains the results and code for the MLPerf® Inference v5.0... | 34 | Emerging |
| 30 | mlcommons/mlperf_client | MLPerf Client is a benchmark for Windows, Linux and macOS, focusing on... | 34 | Emerging |
| 31 | hanxiao/mlx-vis | Pure MLX implementations of UMAP, t-SNE, PaCMAP, TriMap, DREAMS, CNE, and... | 34 | Emerging |
| 32 | mlcommons/training_results_v4.0 | This repository contains the results and code for the MLPerf™ Training v4.0... | 33 | Emerging |
| 33 | ayinedjimi/KVortex | VRAM to RAM Offloader for AI and vLLM - High-Performance C++23 KV Cache... | 33 | Emerging |
| 34 | ise-uiuc/WhiteFox | WhiteFox: White-Box Compiler Fuzzing Empowered by Large Language Models (OOPSLA 2024) | 33 | Emerging |
| 35 | ProbioticFarmer/mlx-deterministic | Batch-invariant operations for deterministic LLM inference on Apple Silicon using MLX | 32 | Emerging |
| 36 | CMU-SAFARI/Athena | A reinforcement learning based policy to dynamically coordinate off-chip... | 30 | Emerging |
| 37 | bartbussmann/BatchTopK | Implementation of the BatchTopK activation function for training sparse... | 30 | Emerging |
| 38 | CMU-SAFARI/Pythia-HDL | Implementation of Pythia: A Customizable Hardware Prefetching Framework... | 28 | Experimental |
| 39 | killerbotofthenewworld/DDR5-AI-memory-tuner | 🧠 The Ultimate AI-Powered DDR5 Memory Tuning Simulator | 27 | Experimental |
| 40 | kqb/mlx-od-moe | On-Demand Mixture of Experts for Apple Silicon — run 375GB models in 192GB RAM | 27 | Experimental |
| 41 | lin-tan/DocTer | For our ISSTA22 paper "DocTer: Documentation-Guided Fuzzing for Testing Deep... | 26 | Experimental |
| 42 | TristanBilot/mlx-GCN | MLX implementation of GCN, with benchmark on MPS, CUDA and CPU (M1 Pro, M2... | 26 | Experimental |
| 43 | cotesiito/flashtensors | 🚀 Accelerate your AI projects with flashtensors, a fast inference engine... | 26 | Experimental |
| 44 | hanxiao/pacmap-mlx | PaCMAP in pure MLX for Apple Silicon. Pure GPU, no scipy/numba. | 25 | Experimental |
| 45 | Kokotpica/surogate | 🚀 Accelerate large language model training and fine-tuning with Surogate’s... | 24 | Experimental |
| 46 | gxcsoccer/alloy | Hybrid SSM-Attention language model on Apple Silicon with MLX — interleaving... | 24 | Experimental |
| 47 | Rianbajukendari/mini-infer | 🚀 Accelerate LLM inference with Mini-Infer, a high-performance engine... | 24 | Experimental |
| 48 | Pomilon/LEMA | LEMA (Layer-wise Efficient Memory Abstraction): A hardware-aware framework... | 23 | Experimental |
| 49 | RobotFlow-Labs/container-toolkit-mlx | GPU-accelerated MLX inference for Linux containers on Apple Silicon. The... | 23 | Experimental |
| 50 | dilbersha/llm-inference-benchmarking-3080 | A production-grade telemetry-aware suite for benchmarking LLM inference... | 23 | Experimental |
| 51 | instax-dutta/easy-mlx | easy-mlx — Local AI runtime for Apple Silicon powered by MLX. | 23 | Experimental |
| 52 | eembc/energyrunner | The EEMBC EnergyRunner application framework for the MLPerf Tiny benchmark. | 23 | Experimental |
| 53 | Yuan-ManX/infera | Infera — A High-Performance Inference Engine for Large Language Models. | 23 | Experimental |
| 54 | ise-uiuc/DeepREL | Fuzzing Deep-Learning Libraries via Automated Relational API Inference... | 23 | Experimental |
| 55 | milliaccount/SynapSwap | 🔄 Transform your GPU's VRAM limits with SynapSwap, a predictive... | 22 | Experimental |
| 56 | aallan/benchmarking-ml-on-the-edge | Benchmarking machine learning inferencing on embedded hardware. | 22 | Experimental |
| 57 | makgunay/research-mlx-ui | Autonomous ML research on Apple Silicon — Karpathy's autoresearch with MLX +... | 22 | Experimental |
| 58 | kossisoroyce/timber-benchmarks | Benchmarks for Timber AOT compiler: zero-RAM tree-based ML inference and... | 22 | Experimental |
| 59 | chrispion/fast_topk_batched | 🚀 Accelerate CPU inference with Fast TopK for high-performance batched Top-K... | 22 | Experimental |
| 60 | ChharithOeun/directml-benchmark | Reproducible GPU float32 benchmarks — AMD DirectML 40.2x speedup on RX 5700... | 22 | Experimental |
| 61 | timteh/timteh-forge | ⚡ TIMTEH Model Forge — Uncensored, abliterated & reasoning-distilled GGUFs.... | 22 | Experimental |
| 62 | ise-uiuc/NablaFuzz | Fuzzing Automatic Differentiation in Deep-Learning Libraries (ICSE'23) | 20 | Experimental |
| 63 | 99roomz/lokly | Address parser for Indian Addresses - Demo at | 20 | Experimental |
| 64 | ssmall256/mps-kernels-skill | Skill pack for custom PyTorch MPS kernels on Apple Silicon (examples, tests,... | 19 | Experimental |
| 65 | mctosima/mlx_playground | Run Image Classification on Apple Silicon (Mac) | 19 | Experimental |
| 66 | SYSU-Video/MFIBA | MFIBA: Multiscale Feature Importance-based Bit Allocation for End-to-End... | 18 | Experimental |
| 67 | hogeheer499-commits/strix-halo-guide | 57 t/s LLM inference on AMD Ryzen AI MAX+ 395 — the complete optimization... | 17 | Experimental |
| 68 | hollance/metal-gpgpu | Collection of notes on how to use Apple’s Metal API for compute tasks | 17 | Experimental |
| 69 | emiliaon/mach | 🚀 Load test HTTP servers with speed and precision using Mach, an ultra-fast... | 14 | Experimental |
| 70 | RobotFlow-Labs/LeRobot-mlx | LeRobot-MLX: HuggingFace LeRobot ported to Apple MLX for native Apple... | 14 | Experimental |
| 71 | DahsjsDio/mlx-vis | Accelerate high-speed dimensionality reduction on Apple Silicon with pure... | 14 | Experimental |
| 72 | anviit/llm-inference-serving | Production LLM inference stack — 28ms TTFT, 39 tok/s, 81% cache hit rate on a 6GB GPU | 14 | Experimental |
| 73 | billyzs/bench | Demo for using Google Benchmark and Apple's MLX | 12 | Experimental |
| 74 | vladBaciu/MLino-Bench | MLino bench: A comprehensive benchmarking tool for evaluating ML models on... | 12 | Experimental |
| 75 | metaskills/fast-llama-inference | Exploring Accelerated Compound AI Systems with SambaNova & Llama 3.3-70B | 11 | Experimental |
| 76 | cmontemuino/amd-mi300x-research-data | Research datasets and experimental results from comprehensive ML... | 11 | Experimental |