ML Inference Benchmarking Frameworks
Standardized benchmarks and performance evaluation frameworks for ML model inference across devices and hardware (GPUs, CPUs, mobile, edge). Does NOT include training benchmarks, model architectures, or optimization techniques without benchmark implementations.
There are 76 ML inference benchmarking frameworks tracked; 1 scores above 70 (Verified tier). The highest-rated is NVIDIA/TransformerEngine at 76/100 with 3,206 stars. 3 of the top 10 are actively maintained.
Get the project list as JSON (the command below returns the top 20; raise `limit` to retrieve all 76):
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=ml-inference-benchmarking&limit=20"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
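As a minimal sketch of consuming this endpoint from Python: the response is assumed here to be a JSON array of objects with `name`, `score`, and `tier` fields. That schema is a guess based on the table below, not documented API behavior.

```python
import json
import urllib.request

API_URL = (
    "https://pt-edge.onrender.com/api/v1/datasets/quality"
    "?domain=ml-frameworks&subcategory=ml-inference-benchmarking&limit=76"
)

def fetch_projects(url: str = API_URL) -> list[dict]:
    """Fetch the dataset; assumes the API returns a JSON array of project objects."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def by_tier(projects: list[dict], tier: str) -> list[dict]:
    """Filter projects by quality tier, case-insensitively."""
    return [p for p in projects if p.get("tier", "").lower() == tier.lower()]

# Usage (requires network access):
#   for p in by_tier(fetch_projects(), "verified"):
#       print(p.get("name"), p.get("score"))
```

Filtering client-side keeps the example independent of whether the API supports a `tier` query parameter, which the call-out above does not mention.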
| # | Framework | Description | Score | Tier |
|---|---|---|---|---|
| 1 | NVIDIA/TransformerEngine | A library for accelerating Transformer models on NVIDIA GPUs, including... | 76 | Verified |
| 2 | mlcommons/inference | Reference implementations of MLPerf® inference benchmarks | | Established |
| 3 | datamade/usaddress | :us: a python library for parsing unstructured United States address strings... | | Established |
| 4 | GRAAL-Research/deepparse | Deepparse is a state-of-the-art library for parsing multinational street... | | Established |
| 5 | mlcommons/training | Reference implementations of MLPerf® training benchmarks | | Established |
| 6 | mlcommons/storage | MLPerf® Storage Benchmark Suite | | Established |
| 7 | Ki6an/fastT5 | ⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x. | | Established |
| 8 | CMU-SAFARI/Pythia | A customizable hardware prefetching framework using online reinforcement... | | Established |
| 9 | deepspeedai/DeepSpeed-MII | MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. | | Established |
| 10 | itlab-vision/dl-benchmark | Deep Learning Inference benchmark. Supports OpenVINO™ toolkit, TensorFlow,... | | Established |
| 11 | ise-uiuc/nnsmith | Automated DNN generation for fuzz testing and more | | Emerging |
| 12 | Tencent/PocketFlow | An Automatic Model Compression (AutoMC) framework for developing smaller and... | | Emerging |
| 13 | TristanBilot/mlx-benchmark | Benchmark of Apple MLX operations on all Apple Silicon chips (GPU, CPU) +... | | Emerging |
| 14 | microsoft/hummingbird | Hummingbird compiles trained ML models into tensor computation for faster inference. | | Emerging |
| 15 | CMU-SAFARI/Hermes | A speculative mechanism to accelerate long-latency off-chip load requests by... | | Emerging |
| 16 | mrdbourke/m1-machine-learning-test | Code for testing various M1 Chip benchmarks with TensorFlow. | | Emerging |
| 17 | hanxiao/flash-kmeans-mlx | IO-aware batched K-Means for Apple Silicon, ported from Flash-KMeans... | | Emerging |
| 18 | Azure/MS-AMP | Microsoft Automatic Mixed Precision Library | | Emerging |
| 19 | XiaoMi/mobile-ai-bench | Benchmarking Neural Network Inference on Mobile Devices | | Emerging |
| 20 | mlcommons/inference_results_v5.1 | This repository contains the results and code for the MLPerf® Inference v5.1... | | Emerging |
| 21 | OpenBMB/BMInf | Efficient Inference for Big Models | | Emerging |
| 22 | ChharithOeun/torch-amd-setup | Auto-detect AMD GPU for PyTorch — ROCm, DirectML, CUDA, MPS, CPU. Fixes... | | Emerging |
| 23 | AI-performance/embedded-ai.bench | benchmark for embedded-AI deep learning inference engines, such as NCNN /... | | Emerging |
| 24 | hanxiao/umap-mlx | UMAP in pure MLX for Apple Silicon. 30x faster than umap-learn. | | Emerging |
| 25 | tlkh/tf-metal-experiments | TensorFlow Metal Backend on Apple Silicon Experiments (just for fun) | | Emerging |
| 26 | mlalma/MLXUtilsLibrary | Utilities for easing the development of machine learning inference libraries... | | Emerging |
| 27 | PEQUAN/hpc-mix-bench | Benchmarks for mixed-precision emulations | | Emerging |
| 28 | RAZZULLIX/fast_topk_batched | High-performance batched Top-K selection for CPU inference. Up to 80x faster... | | Emerging |
| 29 | mlcommons/inference_results_v5.0 | This repository contains the results and code for the MLPerf® Inference v5.0... | | Emerging |
| 30 | mlcommons/mlperf_client | MLPerf Client is a benchmark for Windows, Linux and macOS, focusing on... | | Emerging |
| 31 | hanxiao/mlx-vis | Pure MLX implementations of UMAP, t-SNE, PaCMAP, TriMap, DREAMS, CNE, and... | | Emerging |
| 32 | mlcommons/training_results_v4.0 | This repository contains the results and code for the MLPerf™ Training v4.0... | | Emerging |
| 33 | ayinedjimi/KVortex | VRAM to RAM Offloader for AI and vLLM - High-Performance C++23 KV Cache... | | Emerging |
| 34 | ise-uiuc/WhiteFox | WhiteFox: White-Box Compiler Fuzzing Empowered by Large Language Models (OOPSLA 2024) | | Emerging |
| 35 | ProbioticFarmer/mlx-deterministic | Batch-invariant operations for deterministic LLM inference on Apple Silicon using MLX | | Emerging |
| 36 | CMU-SAFARI/Athena | A reinforcement learning based policy to dynamically coordinate off-chip... | | Emerging |
| 37 | bartbussmann/BatchTopK | Implementation of the BatchTopK activation function for training sparse... | | Emerging |
| 38 | CMU-SAFARI/Pythia-HDL | Implementation of Pythia: A Customizable Hardware Prefetching Framework... | | Experimental |
| 39 | killerbotofthenewworld/DDR5-AI-memory-tuner | 🧠 The Ultimate AI-Powered DDR5 Memory Tuning Simulator | | Experimental |
| 40 | kqb/mlx-od-moe | On-Demand Mixture of Experts for Apple Silicon — run 375GB models in 192GB RAM | | Experimental |
| 41 | lin-tan/DocTer | For our ISSTA22 paper "DocTer: Documentation-Guided Fuzzing for Testing Deep... | | Experimental |
| 42 | TristanBilot/mlx-GCN | MLX implementation of GCN, with benchmark on MPS, CUDA and CPU (M1 Pro, M2... | | Experimental |
| 43 | cotesiito/flashtensors | 🚀 Accelerate your AI projects with flashtensors, a fast inference engine... | | Experimental |
| 44 | hanxiao/pacmap-mlx | PaCMAP in pure MLX for Apple Silicon. Pure GPU, no scipy/numba. | | Experimental |
| 45 | Kokotpica/surogate | 🚀 Accelerate large language model training and fine-tuning with Surogate's... | | Experimental |
| 46 | gxcsoccer/alloy | Hybrid SSM-Attention language model on Apple Silicon with MLX — interleaving... | | Experimental |
| 47 | Rianbajukendari/mini-infer | 🚀 Accelerate LLM inference with Mini-Infer, a high-performance engine... | | Experimental |
| 48 | Pomilon/LEMA | LEMA (Layer-wise Efficient Memory Abstraction): A hardware-aware framework... | | Experimental |
| 49 | RobotFlow-Labs/container-toolkit-mlx | GPU-accelerated MLX inference for Linux containers on Apple Silicon. The... | | Experimental |
| 50 | dilbersha/llm-inference-benchmarking-3080 | A production-grade telemetry-aware suite for benchmarking LLM inference... | | Experimental |
| 51 | instax-dutta/easy-mlx | easy-mlx — Local AI runtime for Apple Silicon powered by MLX. | | Experimental |
| 52 | eembc/energyrunner | The EEMBC EnergyRunner application framework for the MLPerf Tiny benchmark. | | Experimental |
| 53 | Yuan-ManX/infera | Infera — A High-Performance Inference Engine for Large Language Models. | | Experimental |
| 54 | ise-uiuc/DeepREL | Fuzzing Deep-Learning Libraries via Automated Relational API Inference... | | Experimental |
| 55 | milliaccount/SynapSwap | 🔄 Transform your GPU's VRAM limits with SynapSwap, a predictive... | | Experimental |
| 56 | aallan/benchmarking-ml-on-the-edge | Benchmarking machine learning inferencing on embedded hardware. | | Experimental |
| 57 | makgunay/research-mlx-ui | Autonomous ML research on Apple Silicon — Karpathy's autoresearch with MLX +... | | Experimental |
| 58 | kossisoroyce/timber-benchmarks | Benchmarks for Timber AOT compiler: zero-RAM tree-based ML inference and... | | Experimental |
| 59 | chrispion/fast_topk_batched | 🚀 Accelerate CPU inference with Fast TopK for high-performance batched Top-K... | | Experimental |
| 60 | ChharithOeun/directml-benchmark | Reproducible GPU float32 benchmarks — AMD DirectML 40.2x speedup on RX 5700... | | Experimental |
| 61 | timteh/timteh-forge | ⚡ TIMTEH Model Forge — Uncensored, abliterated & reasoning-distilled GGUFs.... | | Experimental |
| 62 | ise-uiuc/NablaFuzz | Fuzzing Automatic Differentiation in Deep-Learning Libraries (ICSE'23) | | Experimental |
| 63 | 99roomz/lokly | Address parser for Indian Addresses - Demo at | | Experimental |
| 64 | ssmall256/mps-kernels-skill | Skill pack for custom PyTorch MPS kernels on Apple Silicon (examples, tests,... | | Experimental |
| 65 | mctosima/mlx_playground | Run Image Classification on Apple Silicon (Mac) | | Experimental |
| 66 | SYSU-Video/MFIBA | MFIBA: Multiscale Feature Importance-based Bit Allocation for End-to-End... | | Experimental |
| 67 | hogeheer499-commits/strix-halo-guide | 57 t/s LLM inference on AMD Ryzen AI MAX+ 395 — the complete optimization... | | Experimental |
| 68 | hollance/metal-gpgpu | Collection of notes on how to use Apple's Metal API for compute tasks | | Experimental |
| 69 | emiliaon/mach | 🚀 Load test HTTP servers with speed and precision using Mach, an ultra-fast... | | Experimental |
| 70 | RobotFlow-Labs/LeRobot-mlx | LeRobot-MLX: HuggingFace LeRobot ported to Apple MLX for native Apple... | | Experimental |
| 71 | DahsjsDio/mlx-vis | Accelerate high-speed dimensionality reduction on Apple Silicon with pure... | | Experimental |
| 72 | anviit/llm-inference-serving | Production LLM inference stack — 28ms TTFT, 39 tok/s, 81% cache hit rate on a 6GB GPU | | Experimental |
| 73 | billyzs/bench | Demo for using Google Benchmark and Apple's MLX | | Experimental |
| 74 | vladBaciu/MLino-Bench | MLino bench: A comprehensive benchmarking tool for evaluating ML models on... | | Experimental |
| 75 | metaskills/fast-llama-inference | Exploring Accelerated Compound AI Systems with SambaNova & Llama 3.3-70B | | Experimental |
| 76 | cmontemuino/amd-mi300x-research-data | Research datasets and experimental results from comprehensive ML... | | Experimental |