Triton Inference Deployment ML Frameworks

Tools, frameworks, and guides for deploying machine learning models using NVIDIA Triton Inference Server, including optimization, benchmarking, and integration patterns. Does NOT include general inference serving, model training, or Triton kernel programming (see mojo-ml-frameworks for low-level GPU kernel work).

43 Triton inference deployment frameworks are tracked. One scores 70 or above (Verified tier): the highest-rated, open-mmlab/mmdeploy, at 70/100 with 3,107 stars and 11,282 monthly downloads. 4 of the top 10 are actively maintained.

Get the projects as JSON (the example request below sets limit=20; 43 projects are tracked in total)

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=triton-inference-deployment&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
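
The same request can be issued from Python. The sketch below uses only the standard library and rebuilds the URL from the curl example. The names `build_url` and `fetch_projects` are hypothetical helpers, and the response schema is not described on this page, so the decoded JSON is returned as-is rather than unpacked into fields.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit: int = 20) -> str:
    """Build the query URL used in the curl example."""
    params = {
        "domain": "ml-frameworks",
        "subcategory": "triton-inference-deployment",
        "limit": limit,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_projects(limit: int = 20, timeout: float = 10.0):
    """Send the request and decode the JSON body.

    Assumption: the endpoint returns a JSON document. Its schema is not
    documented here, so the decoded value is returned unmodified.
    """
    with urlopen(build_url(limit), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_projects()` performs a live request and counts against the daily quota. Whether the API accepts `limit` values above 20 is not stated on this page, so treat larger values as something to verify.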

#	Framework	Score	Tier	Stars	Language
1	open-mmlab/mmdeploy OpenMMLab Model Deployment Framework	70	Verified	3,107	Python
2	triton-inference-server/server The Triton Inference Server provides an optimized cloud and edge inferencing...	69	Established	10,426	Python
3	gpu-mode/Triton-Puzzles Puzzles for learning Triton	65	Established	2,338	Jupyter Notebook
4	hyperai/tvm-cn TVM Documentation in Chinese Simplified / TVM 中文文档	64	Established	3,501	TypeScript
5	triton-inference-server/model_analyzer Triton Model Analyzer is a CLI tool to help with better understanding of the...	63	Established	507	Python
6	hailo-ai/hailo_model_zoo The Hailo Model Zoo includes pre-trained models and a full building and...	59	Established	613	Python
7	ot-triton-lab/flash-sinkhorn FlashSinkhorn: IO-Aware Entropic Optimal Transport in PyTorch + Triton....	56	Established	183	Python
8	triton-inference-server/model_navigator Triton Model Navigator is an inference toolkit designed for optimizing and...	52	Established	218	Python
9	LukasHedegaard/pytorch-benchmark Easily benchmark PyTorch model FLOPs, latency, throughput, allocated gpu...	47	Emerging	109	Python
10	hyperai/triton-cn Triton Documentation in Chinese Simplified / Triton 中文文档	46	Emerging	105	TypeScript
11	srush/Tensor-Puzzles Solve puzzles. Improve your pytorch.	46	Emerging	3,976	Jupyter Notebook
12	srush/Triton-Puzzles Puzzles for learning Triton	45	Emerging	2,332	Jupyter Notebook
13	suvojit-0x55aa/mixed-precision-pytorch Training with FP16 weights in PyTorch	44	Emerging	81	Python
14	triton-inference-server/pytriton PyTriton is a Flask/FastAPI-like interface that simplifies Triton's...	43	Emerging	835	Python
15	ai-dynamo/aitune NVIDIA AITune is an inference toolkit designed for tuning and deploying Deep...	41	Emerging	8	Python
16	sachinsharma9780/Build-ML-pipelines-for-Computer-Vision-NLP-and-Graph-Neural-Networks-using-Nvidia-Triton-Server Build ML pipelines for Computer Vision, NLP and Graph Neural Networks using...	41	Emerging	42	Jupyter Notebook
17	BobMcDear/attorch A subset of PyTorch's neural network modules, written in Python using...	41	Emerging	597	Python
18	philipturner/metal-flash-attention FlashAttention (Metal Port)	40	Emerging	589	Swift
19	alexzhang13/flashattention2-custom-mask Triton implementation of FlashAttention2 that adds Custom Masks.	39	Emerging	170	Python
20	tnbar/tednet TedNet: A Pytorch Toolkit for Tensor Decomposition Networks	39	Emerging	96	Python
21	kakaobrain/trident A performance library for machine learning applications.	38	Emerging	183	Python
22	anujinho/trident Official repository for the paper TRIDENT: Transductive Decoupled...	37	Emerging	40	Python
23	fversaci/cassandra-dali-plugin Cassandra plugin for NVIDIA DALI	25	Experimental	1	C++
24	dtunai/Tri-RMSNorm Efficient kernel for RMS normalization with fused operations, includes both...	25	Experimental	12	Python
25	daemyung/practice-triton Triangle in practice! Triton	24	Experimental	16	Python
26	indri-voice/vit.triton VIT inference in triton because, why not?	23	Experimental	36	Python
27	ZrobMiloudaa/jetson-orin-matmul-analysis 🔍 Analyze CUDA matrix multiplication performance and power consumption on...	23	Experimental	1	Python
28	jayeshmahapatra/triton-fastapi-docker A repository demonstrating deploying ML models using Triton + FastAPI + Docker	23	Experimental	6	Jupyter Notebook
29	MaxLSB/flash-attn2 FlashAttention for sliding window attention in Triton (fwd + bwd pass)	23	Experimental	11	Python
30	Anggipratama17/triton-accelerated-attention 🚀 Implement Triton GPU kernels for multi-head self-attention, enabling...	22	Experimental	—	Python
31	jrajath94/triton-inference-kernels Fused softmax + Flash Attention in OpenAI Triton — 50x VRAM reduction at seq_len=2048	22	Experimental	—	Python
32	hiennguyen9874/triton-face-recognition Triton face detection & recognition	21	Experimental	8	Jupyter Notebook
33	Cre4T3Tiv3/jetson-orin-matmul-analysis Scientific CUDA benchmarking framework: 4 implementations x 3 power modes x...	20	Experimental	14	Python
34	neuro-inc/mlops-pytorch-mlflow-triton Example of deployment Pytorch model into the Triton inference server via...	19	Experimental	6	Jupyter Notebook
35	niyazed/triton-mnist-example MNIST inference example on NVIDIA Triton Inference Server	16	Experimental	4	PureBasic
36	dbrll/ATTN-11 Paper Tape is All You Need	14	Experimental	—	Fortran
37	LessUp/cuflash-attn Pure CUDA C++ FlashAttention Forward/Backward Pass with Causal Masking &...	14	Experimental	—	Cuda
38	angelolamonaca/PyTorch-Precision-Converter A flexible utility for converting tensor precision in PyTorch models and...	14	Experimental	11	Python
39	lengstrom/flashback A FlashAttention backwards-over-backwards ⚡🔙🔙	13	Experimental	10	Jupyter Notebook
40	kalyani-25/Reimplementation_flash-attention-from-scratch 16-step CUDA optimization of FlashAttention-2 achieving 99.2% of official...	11	Experimental	—	Cuda
41	JonSnow1807/Fused-LayerNorm-CUDA-Operator High-performance CUDA implementation of LayerNorm for PyTorch achieving...	11	Experimental	—	Python
42	Achiwilms/NVIDIA-Triton-Deployment-Quickstart QuickStart for Deploying a Basic Model on the Triton Inference Server	11	Experimental	—	Python
43	palapav/triton-compute-kernels A collection of Triton compute kernels for common ML operations	11	Experimental	—	—
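
For readers who copy the table rather than call the API, the rows can be parsed as tab-separated text. This is a minimal sketch, assuming the six columns shown in the header and that the Framework column is "owner/repo" followed by a space and the description. `Row` and `parse_row` are hypothetical names, and the three sample rows are copied from the table above.

```python
from collections import Counter
from typing import NamedTuple, Optional

MISSING = "\u2014"  # the dash the table prints when a value is absent


class Row(NamedTuple):
    rank: int
    repo: str
    description: str
    score: int
    tier: str
    stars: Optional[int]      # None when the table shows a dash
    language: Optional[str]   # None when the table shows a dash


def parse_row(line: str) -> Row:
    """Parse one tab-separated table row into a Row."""
    rank, framework, score, tier, stars, language = line.rstrip("\n").split("\t")
    # The Framework column holds "owner/repo description"; split at the first space.
    repo, _, description = framework.partition(" ")
    return Row(
        rank=int(rank),
        repo=repo,
        description=description,
        score=int(score),
        tier=tier,
        stars=None if stars == MISSING else int(stars.replace(",", "")),
        language=None if language == MISSING else language,
    )


# Three rows copied from the table (ranks 1, 14, and 43).
SAMPLE = [
    "1\topen-mmlab/mmdeploy OpenMMLab Model Deployment Framework\t70\tVerified\t3,107\tPython",
    "14\ttriton-inference-server/pytriton PyTriton is a Flask/FastAPI-like interface that simplifies Triton's...\t43\tEmerging\t835\tPython",
    "43\tpalapav/triton-compute-kernels A collection of Triton compute kernels for common ML operations\t11\tExperimental\t\u2014\t\u2014",
]

rows = [parse_row(line) for line in SAMPLE]
tier_counts = Counter(row.tier for row in rows)  # number of rows per tier
```

Feeding all 43 rows through `parse_row` gives per-tier counts via `tier_counts` and makes it easy to filter, for example, to repositories whose `tier` is "Established" or whose `stars` is not `None`.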

Comparisons in this category

Triton-Puzzles and Tensor-Puzzles (65 vs 46)
tvm-cn and triton-cn (64 vs 46)
model_analyzer and model_navigator (63 vs 52)