triton-inference-server/server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
Supports concurrent execution of multiple models with dynamic and sequence batching, alongside model ensembles and custom preprocessing pipelines via the Backend API. Offers framework-agnostic deployment across TensorRT, PyTorch, ONNX, and other backends, with HTTP/REST, gRPC (KServe protocol), and in-process C/Java APIs for cloud, edge, and embedded inference. Includes implicit state management for stateful models and extensibility through Python-based custom backends for specialized inference logic.
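For example, a client can submit a request over the HTTP/REST endpoint with the tritonclient Python package. The following is a minimal sketch, assuming a server on localhost:8000 and a hypothetical model named my_model with a single FP32 input INPUT0 of shape [1, 4] and an output OUTPUT0; substitute the names and shapes of your deployed model.

import numpy as np
import tritonclient.http as httpclient

# Connect to the server's HTTP/REST endpoint (default port 8000).
client = httpclient.InferenceServerClient(url="localhost:8000")

# Build the request: one input tensor and one requested output.
# Model name, tensor names, and shape are assumptions for this sketch.
input0 = httpclient.InferInput("INPUT0", [1, 4], "FP32")
input0.set_data_from_numpy(np.random.rand(1, 4).astype(np.float32))
output0 = httpclient.InferRequestedOutput("OUTPUT0")

# Run inference and read the result back as a NumPy array.
result = client.infer(model_name="my_model", inputs=[input0], outputs=[output0])
print(result.as_numpy("OUTPUT0"))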
10,426 stars. Actively maintained with 19 commits in the last 30 days.
Stars: 10,426
Forks: 1,734
Language: Python
License: BSD-3-Clause
Category: ml-frameworks
Last pushed: Mar 13, 2026
Commits (30d): 19
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/triton-inference-server/server"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
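The same endpoint can also be called from Python. A minimal sketch using the requests library; the response is printed as-is, since its exact fields are not assumed here:

import requests

url = "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/triton-inference-server/server"
resp = requests.get(url, timeout=10)
resp.raise_for_status()
print(resp.json())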
Related frameworks
open-mmlab/mmdeploy
OpenMMLab Model Deployment Framework
gpu-mode/Triton-Puzzles
Puzzles for learning Triton
hyperai/tvm-cn
TVM Documentation in Simplified Chinese / TVM 中文文档
triton-inference-server/model_analyzer
Triton Model Analyzer is a CLI tool to help with better understanding of the compute and memory...
hailo-ai/hailo_model_zoo
The Hailo Model Zoo includes pre-trained models and a full building and evaluation environment