Triton Inference Deployment ML Frameworks

Tools, frameworks, and guides for deploying machine learning models using NVIDIA Triton Inference Server, including optimization, benchmarking, and integration patterns. Does NOT include general inference serving, model training, or Triton kernel programming (see mojo-ml-frameworks for low-level GPU kernel work).

There are 43 Triton inference deployment frameworks tracked. One scores 70 or above (Verified tier). The highest-rated is open-mmlab/mmdeploy at 70/100 with 3,107 stars and 11,282 monthly downloads. 4 of the top 10 are actively maintained.

Get the projects as JSON (the request below sets limit=20, so it presumably returns at most 20 of the 43 per call):

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=triton-inference-deployment&limit=20"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000 requests/day.
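The same request can be made from Python. The sketch below uses only the standard library and the endpoint and query parameters shown in the curl command. The helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the JSON response is not documented here, so the code returns the parsed payload without assuming any field names. Whether the API accepts a `limit` larger than 20 is also an assumption to verify.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit=20):
    """Build the query URL for this subcategory (hypothetical helper)."""
    params = {
        "domain": "ml-frameworks",
        "subcategory": "triton-inference-deployment",
        "limit": limit,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_projects(limit=20, timeout=10):
    """Fetch and parse the JSON payload; the response schema is not assumed."""
    with urlopen(build_url(limit), timeout=timeout) as resp:
        return json.load(resp)


# Example usage (performs a network request, counts against the daily quota):
# data = fetch_projects()
# print(json.dumps(data, indent=2)[:500])
```

Keeping URL construction separate from the network call makes it easy to check the exact request before spending any of the 100 daily requests.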

| # | Framework | Description | Score | Tier |
|---|---|---|---|---|
| 1 | open-mmlab/mmdeploy | OpenMMLab Model Deployment Framework | 70 | Verified |
| 2 | triton-inference-server/server | The Triton Inference Server provides an optimized cloud and edge inferencing... | 69 | Established |
| 3 | gpu-mode/Triton-Puzzles | Puzzles for learning Triton | 65 | Established |
| 4 | hyperai/tvm-cn | TVM Documentation in Chinese Simplified / TVM 中文文档 | 64 | Established |
| 5 | triton-inference-server/model_analyzer | Triton Model Analyzer is a CLI tool to help with better understanding of the... | 63 | Established |
| 6 | hailo-ai/hailo_model_zoo | The Hailo Model Zoo includes pre-trained models and a full building and... | 59 | Established |
| 7 | ot-triton-lab/flash-sinkhorn | FlashSinkhorn: IO-Aware Entropic Optimal Transport in PyTorch + Triton.... | 56 | Established |
| 8 | triton-inference-server/model_navigator | Triton Model Navigator is an inference toolkit designed for optimizing and... | 52 | Established |
| 9 | LukasHedegaard/pytorch-benchmark | Easily benchmark PyTorch model FLOPs, latency, throughput, allocated gpu... | 47 | Emerging |
| 10 | hyperai/triton-cn | Triton Documentation in Chinese Simplified / Triton 中文文档 | 46 | Emerging |
| 11 | srush/Tensor-Puzzles | Solve puzzles. Improve your pytorch. | 46 | Emerging |
| 12 | srush/Triton-Puzzles | Puzzles for learning Triton | 45 | Emerging |
| 13 | suvojit-0x55aa/mixed-precision-pytorch | Training with FP16 weights in PyTorch | 44 | Emerging |
| 14 | triton-inference-server/pytriton | PyTriton is a Flask/FastAPI-like interface that simplifies Triton's... | 43 | Emerging |
| 15 | ai-dynamo/aitune | NVIDIA AITune is an inference toolkit designed for tuning and deploying Deep... | 41 | Emerging |
| 16 | sachinsharma9780/Build-ML-pipelines-for-Computer-Vision-NLP-and-Graph-Neural-Networks-using-Nvidia-Triton-Server | Build ML pipelines for Computer Vision, NLP and Graph Neural Networks using... | 41 | Emerging |
| 17 | BobMcDear/attorch | A subset of PyTorch's neural network modules, written in Python using... | 41 | Emerging |
| 18 | philipturner/metal-flash-attention | FlashAttention (Metal Port) | 40 | Emerging |
| 19 | alexzhang13/flashattention2-custom-mask | Triton implementation of FlashAttention2 that adds Custom Masks. | 39 | Emerging |
| 20 | tnbar/tednet | TedNet: A Pytorch Toolkit for Tensor Decomposition Networks | 39 | Emerging |
| 21 | kakaobrain/trident | A performance library for machine learning applications. | 38 | Emerging |
| 22 | anujinho/trident | Official repository for the paper TRIDENT: Transductive Decoupled... | 37 | Emerging |
| 23 | fversaci/cassandra-dali-plugin | Cassandra plugin for NVIDIA DALI | 25 | Experimental |
| 24 | dtunai/Tri-RMSNorm | Efficient kernel for RMS normalization with fused operations, includes both... | 25 | Experimental |
| 25 | daemyung/practice-triton | Triangle in practice! Triton / 삼각형의 실전! Triton | 24 | Experimental |
| 26 | indri-voice/vit.triton | VIT inference in triton because, why not? | 23 | Experimental |
| 27 | ZrobMiloudaa/jetson-orin-matmul-analysis | 🔍 Analyze CUDA matrix multiplication performance and power consumption on... | 23 | Experimental |
| 28 | jayeshmahapatra/triton-fastapi-docker | A repository demonstrating deploying ML models using Triton + FastAPI + Docker | 23 | Experimental |
| 29 | MaxLSB/flash-attn2 | FlashAttention for sliding window attention in Triton (fwd + bwd pass) | 23 | Experimental |
| 30 | Anggipratama17/triton-accelerated-attention | 🚀 Implement Triton GPU kernels for multi-head self-attention, enabling... | 22 | Experimental |
| 31 | jrajath94/triton-inference-kernels | Fused softmax + Flash Attention in OpenAI Triton: 50x VRAM reduction at seq_len=2048 | 22 | Experimental |
| 32 | hiennguyen9874/triton-face-recognition | Triton face detection & recognition | 21 | Experimental |
| 33 | Cre4T3Tiv3/jetson-orin-matmul-analysis | Scientific CUDA benchmarking framework: 4 implementations x 3 power modes x... | 20 | Experimental |
| 34 | neuro-inc/mlops-pytorch-mlflow-triton | Example of deployment Pytorch model into the Triton inference server via... | 19 | Experimental |
| 35 | niyazed/triton-mnist-example | MNIST inference example on NVIDIA Triton Inference Server | 16 | Experimental |
| 36 | dbrll/ATTN-11 | Paper Tape is All You Need | 14 | Experimental |
| 37 | LessUp/cuflash-attn | Pure CUDA C++ FlashAttention Forward/Backward Pass with Causal Masking &... | 14 | Experimental |
| 38 | angelolamonaca/PyTorch-Precision-Converter | A flexible utility for converting tensor precision in PyTorch models and... | 14 | Experimental |
| 39 | lengstrom/flashback | A FlashAttention backwards-over-backwards ⚡🔙🔙 | 13 | Experimental |
| 40 | kalyani-25/Reimplementation_flash-attention-from-scratch | 16-step CUDA optimization of FlashAttention-2 achieving 99.2% of official... | 11 | Experimental |
| 41 | JonSnow1807/Fused-LayerNorm-CUDA-Operator | High-performance CUDA implementation of LayerNorm for PyTorch achieving... | 11 | Experimental |
| 42 | Achiwilms/NVIDIA-Triton-Deployment-Quickstart | QuickStart for Deploying a Basic Model on the Triton Inference Server | 11 | Experimental |
| 43 | palapav/triton-compute-kernels | A collection of Triton compute kernels for common ML operations | 11 | Experimental |
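To summarize how the 43 scores listed above spread across tiers, the sketch below tallies them in Python. The scores are copied from the listing in rank order. The cutoffs in the hypothetical `tier` function (70, 50, 30) are assumptions chosen only because they reproduce the tier labels shown here; the scoring service's actual thresholds are not stated on this page.

```python
from collections import Counter

# Scores copied from the listing above, in rank order (1 to 43).
SCORES = [
    70, 69, 65, 64, 63, 59, 56, 52,                          # ranks 1-8
    47, 46, 46, 45, 44, 43, 41, 41, 41, 40, 39, 39, 38, 37,  # ranks 9-22
    25, 25, 24, 23, 23, 23, 23, 22, 22, 21, 20, 19, 16,      # ranks 23-35
    14, 14, 14, 13, 11, 11, 11, 11,                          # ranks 36-43
]


def tier(score):
    """Map a score to a tier using ASSUMED cutoffs (not documented on this page)."""
    if score >= 70:
        return "Verified"
    if score >= 50:
        return "Established"
    if score >= 30:
        return "Emerging"
    return "Experimental"


# Count how many projects fall into each tier.
counts = Counter(tier(s) for s in SCORES)
for name in ("Verified", "Established", "Emerging", "Experimental"):
    print(f"{name}: {counts[name]}")
```

Under these assumed cutoffs the tally matches the labels in the listing: 1 Verified, 7 Established, 14 Emerging, and 21 Experimental.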