triton-inference-server/server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.

Overall score: 69 / 100 (Established)

Supports concurrent execution of multiple models with dynamic and sequence batching, alongside model ensembles and custom preprocessing pipelines via the Backend API. Offers framework-agnostic deployment across TensorRT, PyTorch, ONNX, and other backends, with HTTP/REST, gRPC (KServe protocol), and in-process C/Java APIs for cloud, edge, and embedded inference. Includes implicit state management for stateful models and extensibility through Python-based custom backends for specialized inference logic.
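
As an illustration of the HTTP/REST path, the sketch below sends a single inference request using the KServe v2 protocol. The server address, model name, and input tensor name/shape/data are placeholders for illustration, not values taken from this repository.

# Minimal sketch of a KServe v2 HTTP inference request to a running Triton server.
# The address, model name, and tensor details are illustrative placeholders.
import requests

TRITON_URL = "http://localhost:8000"   # assumed default HTTP port
MODEL_NAME = "my_model"                # hypothetical model name

# Confirm the server is ready before sending inference requests.
assert requests.get(f"{TRITON_URL}/v2/health/ready").status_code == 200

payload = {
    "inputs": [
        {
            "name": "INPUT0",          # hypothetical input tensor name
            "shape": [1, 4],
            "datatype": "FP32",
            "data": [[0.1, 0.2, 0.3, 0.4]],
        }
    ]
}
response = requests.post(f"{TRITON_URL}/v2/models/{MODEL_NAME}/infer", json=payload)
print(response.json()["outputs"])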

10,426 stars. Actively maintained with 19 commits in the last 30 days.

No published package; no known dependents.

Maintenance: 20 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 23 / 25


Stars: 10,426
Forks: 1,734
Language: Python
License: BSD-3-Clause
Last pushed: Mar 13, 2026
Commits (30d): 19

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/triton-inference-server/server"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
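
The same endpoint can be queried from code. A minimal Python sketch, assuming the endpoint returns JSON; the exact response fields are not documented here:

# Fetch the quality data for this repository and print the parsed JSON.
import requests

url = "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/triton-inference-server/server"
resp = requests.get(url, timeout=10)
resp.raise_for_status()
print(resp.json())  # inspect the returned quality metrics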