Triton Inference Deployment ML Frameworks
Tools, frameworks, and guides for deploying machine learning models using NVIDIA Triton Inference Server, including optimization, benchmarking, and integration patterns. Does NOT include general inference serving, model training, or Triton kernel programming (see mojo-ml-frameworks for low-level GPU kernel work).
There are 43 Triton inference deployment frameworks tracked. One scores above 70 (Verified tier). The highest-rated is open-mmlab/mmdeploy at 70/100, with 3,107 stars and 11,282 monthly downloads. Four of the top ten are actively maintained.
Get all 43 projects as JSON (the example below requests the top 20; raise `limit` to fetch the full list):

```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=triton-inference-deployment&limit=20"
```

Open to everyone: 100 requests/day with no key. A free key raises the limit to 1,000/day.
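Beyond raw curl, the endpoint can be consumed programmatically. A minimal sketch in Python follows, using only the standard library. The response shape is an assumption: it presumes each record carries `name`, `score`, and `tier` fields, either as a bare JSON list or wrapped in a `"projects"` envelope; adjust the field names to whatever the API actually returns.

```python
import json
import urllib.request

# Hypothetical endpoint parameters taken from the curl example above;
# limit raised to 50 so all 43 tracked projects fit in one page.
URL = (
    "https://pt-edge.onrender.com/api/v1/datasets/quality"
    "?domain=ml-frameworks&subcategory=triton-inference-deployment&limit=50"
)

def fetch_projects(url: str = URL) -> list[dict]:
    """Download the dataset and normalize it to a list of records."""
    with urllib.request.urlopen(url) as resp:
        payload = json.load(resp)
    # Tolerate either a bare list or a wrapped {"projects": [...]} envelope.
    return payload if isinstance(payload, list) else payload.get("projects", [])

def top_by_tier(projects: list[dict], tier: str) -> list[str]:
    """Return project names in a tier, highest score first."""
    ranked = sorted(projects, key=lambda p: p.get("score", 0), reverse=True)
    return [p["name"] for p in ranked if p.get("tier") == tier]
```

Usage: `top_by_tier(fetch_projects(), "Verified")` would list the Verified-tier projects, assuming the field names above match the real payload.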
| # | Framework | Description | Score | Tier |
|---|---|---|---|---|
| 1 | open-mmlab/mmdeploy | OpenMMLab Model Deployment Framework | 70 | Verified |
| 2 | triton-inference-server/server | The Triton Inference Server provides an optimized cloud and edge inferencing... | | Established |
| 3 | gpu-mode/Triton-Puzzles | Puzzles for learning Triton | | Established |
| 4 | hyperai/tvm-cn | TVM documentation in Simplified Chinese | | Established |
| 5 | triton-inference-server/model_analyzer | Triton Model Analyzer is a CLI tool to help with better understanding of the... | | Established |
| 6 | hailo-ai/hailo_model_zoo | The Hailo Model Zoo includes pre-trained models and a full building and... | | Established |
| 7 | ot-triton-lab/flash-sinkhorn | FlashSinkhorn: IO-Aware Entropic Optimal Transport in PyTorch + Triton.... | | Established |
| 8 | triton-inference-server/model_navigator | Triton Model Navigator is an inference toolkit designed for optimizing and... | | Established |
| 9 | LukasHedegaard/pytorch-benchmark | Easily benchmark PyTorch model FLOPs, latency, throughput, allocated GPU... | | Emerging |
| 10 | hyperai/triton-cn | Triton documentation in Simplified Chinese | | Emerging |
| 11 | srush/Tensor-Puzzles | Solve puzzles. Improve your PyTorch. | | Emerging |
| 12 | srush/Triton-Puzzles | Puzzles for learning Triton | | Emerging |
| 13 | suvojit-0x55aa/mixed-precision-pytorch | Training with FP16 weights in PyTorch | | Emerging |
| 14 | triton-inference-server/pytriton | PyTriton is a Flask/FastAPI-like interface that simplifies Triton's... | | Emerging |
| 15 | ai-dynamo/aitune | NVIDIA AITune is an inference toolkit designed for tuning and deploying Deep... | | Emerging |
| 16 | sachinsharma9780/Build-ML-pipelines-for-Computer-Vision-NLP-and-Graph-Neural-Networks-using-Nvidia-Triton-Server | Build ML pipelines for Computer Vision, NLP and Graph Neural Networks using... | | Emerging |
| 17 | BobMcDear/attorch | A subset of PyTorch's neural network modules, written in Python using... | | Emerging |
| 18 | philipturner/metal-flash-attention | FlashAttention (Metal port) | | Emerging |
| 19 | alexzhang13/flashattention2-custom-mask | Triton implementation of FlashAttention2 that adds custom masks. | | Emerging |
| 20 | tnbar/tednet | TedNet: a PyTorch toolkit for tensor decomposition networks | | Emerging |
| 21 | kakaobrain/trident | A performance library for machine learning applications. | | Emerging |
| 22 | anujinho/trident | Official repository for the paper TRIDENT: Transductive Decoupled... | | Emerging |
| 23 | fversaci/cassandra-dali-plugin | Cassandra plugin for NVIDIA DALI | | Experimental |
| 24 | dtunai/Tri-RMSNorm | Efficient kernel for RMS normalization with fused operations, includes both... | | Experimental |
| 25 | daemyung/practice-triton | Hands-on Triton tutorial in Korean | | Experimental |
| 26 | indri-voice/vit.triton | ViT inference in Triton, because why not? | | Experimental |
| 27 | ZrobMiloudaa/jetson-orin-matmul-analysis | Analyze CUDA matrix multiplication performance and power consumption on... | | Experimental |
| 28 | jayeshmahapatra/triton-fastapi-docker | A repository demonstrating deployment of ML models using Triton + FastAPI + Docker | | Experimental |
| 29 | MaxLSB/flash-attn2 | FlashAttention for sliding-window attention in Triton (fwd + bwd pass) | | Experimental |
| 30 | Anggipratama17/triton-accelerated-attention | Implement Triton GPU kernels for multi-head self-attention, enabling... | | Experimental |
| 31 | jrajath94/triton-inference-kernels | Fused softmax + Flash Attention in OpenAI Triton, with a claimed 50x VRAM reduction at seq_len=2048 | | Experimental |
| 32 | hiennguyen9874/triton-face-recognition | Triton face detection & recognition | | Experimental |
| 33 | Cre4T3Tiv3/jetson-orin-matmul-analysis | Scientific CUDA benchmarking framework: 4 implementations x 3 power modes x... | | Experimental |
| 34 | neuro-inc/mlops-pytorch-mlflow-triton | Example of deploying a PyTorch model to the Triton Inference Server via... | | Experimental |
| 35 | niyazed/triton-mnist-example | MNIST inference example on NVIDIA Triton Inference Server | | Experimental |
| 36 | dbrll/ATTN-11 | Paper Tape is All You Need | | Experimental |
| 37 | LessUp/cuflash-attn | Pure CUDA C++ FlashAttention forward/backward pass with causal masking &... | | Experimental |
| 38 | angelolamonaca/PyTorch-Precision-Converter | A flexible utility for converting tensor precision in PyTorch models and... | | Experimental |
| 39 | lengstrom/flashback | A FlashAttention backwards-over-backwards pass | | Experimental |
| 40 | kalyani-25/Reimplementation_flash-attention-from-scratch | 16-step CUDA optimization of FlashAttention-2 achieving 99.2% of official... | | Experimental |
| 41 | JonSnow1807/Fused-LayerNorm-CUDA-Operator | High-performance CUDA implementation of LayerNorm for PyTorch achieving... | | Experimental |
| 42 | Achiwilms/NVIDIA-Triton-Deployment-Quickstart | Quickstart for deploying a basic model on the Triton Inference Server | | Experimental |
| 43 | palapav/triton-compute-kernels | A collection of Triton compute kernels for common ML operations | | Experimental |