ML Inference Benchmarking Frameworks
Standardized benchmarks and performance evaluation frameworks for ML model inference across devices and hardware (GPUs, CPUs, mobile, edge). Does NOT include training benchmarks, model architectures, or optimization techniques without benchmark implementations.
There are 76 ML inference benchmarking frameworks tracked; 1 scores above 70 (Verified tier). The highest-rated is NVIDIA/TransformerEngine at 76/100 with 3,206 stars. 3 of the top 10 are actively maintained.
Get the project list as JSON (the command below returns the top 20; raise `limit` to retrieve all 76):
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=ml-inference-benchmarking&limit=20"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
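As a minimal sketch of consuming this endpoint from Python: the response is assumed here to be a JSON array of objects with `name`, `score`, and `tier` fields. That schema is a guess based on the table below, not documented API behavior.

```python
import json
import urllib.request

API_URL = (
    "https://pt-edge.onrender.com/api/v1/datasets/quality"
    "?domain=ml-frameworks&subcategory=ml-inference-benchmarking&limit=76"
)

def fetch_projects(url: str = API_URL) -> list[dict]:
    """Fetch the dataset; assumes the API returns a JSON array of project objects."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def by_tier(projects: list[dict], tier: str) -> list[dict]:
    """Filter projects by quality tier, case-insensitively."""
    return [p for p in projects if p.get("tier", "").lower() == tier.lower()]

# Usage (requires network access):
#   for p in by_tier(fetch_projects(), "verified"):
#       print(p.get("name"), p.get("score"))
```

Filtering client-side keeps the example independent of whether the API supports a `tier` query parameter, which the call-out above does not mention.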
| # | Framework | Description | Score | Tier |
|---|---|---|---|---|
| 1 | NVIDIA/TransformerEngine | A library for accelerating Transformer models on NVIDIA GPUs, including... | 76 | Verified |
| 2 | mlcommons/inference | Reference implementations of MLPerf® inference benchmarks | | Established |
| 3 | datamade/usaddress | :us: a python library for parsing unstructured United States address strings... | | Established |
| 4 | GRAAL-Research/deepparse | Deepparse is a state-of-the-art library for parsing multinational street... | | Established |
| 5 | mlcommons/training | Reference implementations of MLPerf® training benchmarks | | Established |
| 6 | mlcommons/storage | MLPerf® Storage Benchmark Suite | | Established |
| 7 | Ki6an/fastT5 | ⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x. | | Established |
| 8 | CMU-SAFARI/Pythia | A customizable hardware prefetching framework using online reinforcement... | | Established |
| 9 | deepspeedai/DeepSpeed-MII | MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. | | Established |
| 10 | itlab-vision/dl-benchmark | Deep Learning Inference benchmark. Supports OpenVINO™ toolkit, TensorFlow,... | | Established |
| 11 | ise-uiuc/nnsmith | Automated DNN generation for fuzz testing and more | | Emerging |
| 12 | Tencent/PocketFlow | An Automatic Model Compression (AutoMC) framework for developing smaller and... | | Emerging |
| 13 | TristanBilot/mlx-benchmark | Benchmark of Apple MLX operations on all Apple Silicon chips (GPU, CPU) +... | | Emerging |
| 14 | microsoft/hummingbird | Hummingbird compiles trained ML models into tensor computation for faster inference. | | Emerging |
| 15 | CMU-SAFARI/Hermes | A speculative mechanism to accelerate long-latency off-chip load requests by... | | Emerging |
| 16 | mrdbourke/m1-machine-learning-test | Code for testing various M1 Chip benchmarks with TensorFlow. | | Emerging |
| 17 | hanxiao/flash-kmeans-mlx | IO-aware batched K-Means for Apple Silicon, ported from Flash-KMeans... | | Emerging |
| 18 | Azure/MS-AMP | Microsoft Automatic Mixed Precision Library | | Emerging |
| 19 | XiaoMi/mobile-ai-bench | Benchmarking Neural Network Inference on Mobile Devices | | Emerging |
| 20 | mlcommons/inference_results_v5.1 | This repository contains the results and code for the MLPerf® Inference v5.1... | | Emerging |
| 21 | OpenBMB/BMInf | Efficient Inference for Big Models | | Emerging |
| 22 | ChharithOeun/torch-amd-setup | Auto-detect AMD GPU for PyTorch — ROCm, DirectML, CUDA, MPS, CPU. Fixes... | | Emerging |
| 23 | AI-performance/embedded-ai.bench | benchmark for embedded-AI deep learning inference engines, such as NCNN /... | | Emerging |
| 24 | hanxiao/umap-mlx | UMAP in pure MLX for Apple Silicon. 30x faster than umap-learn. | | Emerging |
| 25 | tlkh/tf-metal-experiments | TensorFlow Metal Backend on Apple Silicon Experiments (just for fun) | | Emerging |
| 26 | mlalma/MLXUtilsLibrary | Utilities for easing the development of machine learning inference libraries... | | Emerging |
| 27 | PEQUAN/hpc-mix-bench | Benchmarks for mixed-precision emulations | | Emerging |
| 28 | RAZZULLIX/fast_topk_batched | High-performance batched Top-K selection for CPU inference. Up to 80x faster... | | Emerging |
| 29 | mlcommons/inference_results_v5.0 | This repository contains the results and code for the MLPerf® Inference v5.0... | | Emerging |
| 30 | mlcommons/mlperf_client | MLPerf Client is a benchmark for Windows, Linux and macOS, focusing on... | | Emerging |
| 31 | hanxiao/mlx-vis | Pure MLX implementations of UMAP, t-SNE, PaCMAP, TriMap, DREAMS, CNE, and... | | Emerging |
| 32 | mlcommons/training_results_v4.0 | This repository contains the results and code for the MLPerf™ Training v4.0... | | Emerging |
| 33 | ayinedjimi/KVortex | VRAM to RAM Offloader for AI and vLLM - High-Performance C++23 KV Cache... | | Emerging |
| 34 | ise-uiuc/WhiteFox | WhiteFox: White-Box Compiler Fuzzing Empowered by Large Language Models (OOPSLA 2024) | | Emerging |
| 35 | ProbioticFarmer/mlx-deterministic | Batch-invariant operations for deterministic LLM inference on Apple Silicon using MLX | | Emerging |
| 36 | CMU-SAFARI/Athena | A reinforcement learning based policy to dynamically coordinate off-chip... | | Emerging |
| 37 | bartbussmann/BatchTopK | Implementation of the BatchTopK activation function for training sparse... | | Emerging |
| 38 | CMU-SAFARI/Pythia-HDL | Implementation of Pythia: A Customizable Hardware Prefetching Framework... | | Experimental |
| 39 | killerbotofthenewworld/DDR5-AI-memory-tuner | 🧠 The Ultimate AI-Powered DDR5 Memory Tuning Simulator | | Experimental |
| 40 | kqb/mlx-od-moe | On-Demand Mixture of Experts for Apple Silicon — run 375GB models in 192GB RAM | | Experimental |
| 41 | lin-tan/DocTer | For our ISSTA22 paper "DocTer: Documentation-Guided Fuzzing for Testing Deep... | | Experimental |
| 42 | TristanBilot/mlx-GCN | MLX implementation of GCN, with benchmark on MPS, CUDA and CPU (M1 Pro, M2... | | Experimental |
| 43 | cotesiito/flashtensors | 🚀 Accelerate your AI projects with flashtensors, a fast inference engine... | | Experimental |
| 44 | hanxiao/pacmap-mlx | PaCMAP in pure MLX for Apple Silicon. Pure GPU, no scipy/numba. | | Experimental |
| 45 | Kokotpica/surogate | 🚀 Accelerate large language model training and fine-tuning with Surogate's... | | Experimental |
| 46 | gxcsoccer/alloy | Hybrid SSM-Attention language model on Apple Silicon with MLX — interleaving... | | Experimental |
| 47 | Rianbajukendari/mini-infer | 🚀 Accelerate LLM inference with Mini-Infer, a high-performance engine... | | Experimental |
| 48 | Pomilon/LEMA | LEMA (Layer-wise Efficient Memory Abstraction): A hardware-aware framework... | | Experimental |
| 49 | RobotFlow-Labs/container-toolkit-mlx | GPU-accelerated MLX inference for Linux containers on Apple Silicon. The... | | Experimental |
| 50 | dilbersha/llm-inference-benchmarking-3080 | A production-grade telemetry-aware suite for benchmarking LLM inference... | | Experimental |
| 51 | instax-dutta/easy-mlx | easy-mlx — Local AI runtime for Apple Silicon powered by MLX. | | Experimental |
| 52 | eembc/energyrunner | The EEMBC EnergyRunner application framework for the MLPerf Tiny benchmark. | | Experimental |
| 53 | Yuan-ManX/infera | Infera — A High-Performance Inference Engine for Large Language Models. | | Experimental |
| 54 | ise-uiuc/DeepREL | Fuzzing Deep-Learning Libraries via Automated Relational API Inference... | | Experimental |
| 55 | milliaccount/SynapSwap | 🔄 Transform your GPU's VRAM limits with SynapSwap, a predictive... | | Experimental |
| 56 | aallan/benchmarking-ml-on-the-edge | Benchmarking machine learning inferencing on embedded hardware. | | Experimental |
| 57 | makgunay/research-mlx-ui | Autonomous ML research on Apple Silicon — Karpathy's autoresearch with MLX +... | | Experimental |
| 58 | kossisoroyce/timber-benchmarks | Benchmarks for Timber AOT compiler: zero-RAM tree-based ML inference and... | | Experimental |
| 59 | chrispion/fast_topk_batched | 🚀 Accelerate CPU inference with Fast TopK for high-performance batched Top-K... | | Experimental |
| 60 | ChharithOeun/directml-benchmark | Reproducible GPU float32 benchmarks — AMD DirectML 40.2x speedup on RX 5700... | | Experimental |
| 61 | timteh/timteh-forge | ⚡ TIMTEH Model Forge — Uncensored, abliterated & reasoning-distilled GGUFs.... | | Experimental |
| 62 | ise-uiuc/NablaFuzz | Fuzzing Automatic Differentiation in Deep-Learning Libraries (ICSE'23) | | Experimental |
| 63 | 99roomz/lokly | Address parser for Indian Addresses - Demo at | | Experimental |
| 64 | ssmall256/mps-kernels-skill | Skill pack for custom PyTorch MPS kernels on Apple Silicon (examples, tests,... | | Experimental |
| 65 | mctosima/mlx_playground | Run Image Classification on Apple Silicon (Mac) | | Experimental |
| 66 | SYSU-Video/MFIBA | MFIBA: Multiscale Feature Importance-based Bit Allocation for End-to-End... | | Experimental |
| 67 | hogeheer499-commits/strix-halo-guide | 57 t/s LLM inference on AMD Ryzen AI MAX+ 395 — the complete optimization... | | Experimental |
| 68 | hollance/metal-gpgpu | Collection of notes on how to use Apple's Metal API for compute tasks | | Experimental |
| 69 | emiliaon/mach | 🚀 Load test HTTP servers with speed and precision using Mach, an ultra-fast... | | Experimental |
| 70 | RobotFlow-Labs/LeRobot-mlx | LeRobot-MLX: HuggingFace LeRobot ported to Apple MLX for native Apple... | | Experimental |
| 71 | DahsjsDio/mlx-vis | Accelerate high-speed dimensionality reduction on Apple Silicon with pure... | | Experimental |
| 72 | anviit/llm-inference-serving | Production LLM inference stack — 28ms TTFT, 39 tok/s, 81% cache hit rate on a 6GB GPU | | Experimental |
| 73 | billyzs/bench | Demo for using Google Benchmark and Apple's MLX | | Experimental |
| 74 | vladBaciu/MLino-Bench | MLino bench: A comprehensive benchmarking tool for evaluating ML models on... | | Experimental |
| 75 | metaskills/fast-llama-inference | Exploring Accelerated Compound AI Systems with SambaNova & Llama 3.3-70B | | Experimental |
| 76 | cmontemuino/amd-mi300x-research-data | Research datasets and experimental results from comprehensive ML... | | Experimental |