LLM Inference Serving Transformer Models
There are 18 LLM inference serving projects tracked. One scores above 70 (Verified tier). The highest-rated is PaddlePaddle/FastDeploy at 76/100 with 3,659 stars. Four of the top ten are actively maintained.
Get all 18 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=llm-inference-serving&limit=20"
Open to everyone: 100 requests/day with no key. A free key raises the limit to 1,000 requests/day.
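The same query can be made from Python with the standard library. This is a minimal sketch: the URL parameters mirror the curl example above, but the response field names (`projects`, `tier`, `name`) are assumptions about the JSON schema, not documented API fields, so the filter below runs against a sample payload rather than a live response.

```python
from urllib.parse import urlencode

BASE = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def build_query_url(domain: str, subcategory: str, limit: int = 20) -> str:
    """Construct the dataset query URL shown in the curl example."""
    params = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE}?{params}"

def verified_projects(payload: dict) -> list[str]:
    """Filter a decoded JSON payload for Verified-tier entries.

    NOTE: the "projects" and "tier" keys are assumed, not documented.
    """
    return [p["name"] for p in payload.get("projects", []) if p.get("tier") == "Verified"]

# Hypothetical payload illustrating the assumed response shape.
sample = {"projects": [
    {"name": "PaddlePaddle/FastDeploy", "score": 76, "tier": "Verified"},
    {"name": "mlc-ai/mlc-llm", "tier": "Established"},
]}

print(build_query_url("transformers", "llm-inference-serving"))
print(verified_projects(sample))  # ['PaddlePaddle/FastDeploy']
```

To hit the live endpoint, pass the built URL to `urllib.request.urlopen` and decode the body with `json.loads`.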
| # | Model | Description | Score | Tier |
|---|---|---|---|---|
| 1 | PaddlePaddle/FastDeploy | High-performance Inference and Deployment Toolkit for LLMs and VLMs based on... | 76 | Verified |
| 2 | mlc-ai/mlc-llm | Universal LLM Deployment Engine with ML Compilation | | Established |
| 3 | skyzh/tiny-llm | A course of learning LLM inference serving on Apple Silicon for systems... | | Established |
| 4 | ServerlessLLM/ServerlessLLM | Serverless LLM Serving for Everyone. | | Established |
| 5 | AXERA-TECH/ax-llm | Explore LLM model deployment based on AXera's AI chips | | Emerging |
| 6 | VectorInstitute/vector-inference | Efficient LLM inference on Slurm clusters. | | Emerging |
| 7 | pytorch/torchchat | Run PyTorch LLMs locally on servers, desktop and mobile | | Emerging |
| 8 | AmpereComputingAI/ampere_model_library | AML's goal is to make benchmarking of various AI architectures on Ampere... | | Emerging |
| 9 | replit/ReplitLM | Inference code and configs for the ReplitLM model family | | Emerging |
| 10 | snapllm/snapllm | 🔥 🔥 Alternative to Ollama 🔥 🔥 multi-model <1ms LLM switching | | Emerging |
| 11 | asprenger/ray_vllm_inference | A simple service that integrates vLLM with Ray Serve for fast and scalable... | | Emerging |
| 12 | datawhalechina/llm-deploy | Theory and practice of large-model/LLM inference and deployment | | Emerging |
| 13 | justADeni/intel-npu-llm | A simple Python script for running LLMs on Intel's Neural Processing Units (NPUs) | | Emerging |
| 14 | ray-project/ray-llm | RayLLM - LLMs on Ray (Archived). Read README for more info. | | Experimental |
| 15 | hpdps-group/ElasticMM | ElasticMM: Elastic and Efficient MLLM Serving System | | Experimental |
| 16 | bentoml/transformers-nlp-service | Online Inference API for NLP Transformer models - summarization, text... | | Experimental |
| 17 | lix19937/llm-deploy | AI Infra LLM infer/ tensorrt-llm/ vllm | | Experimental |
| 18 | g1ibby/llm-deploy | Tool to manage ollama model on vast.ai | | Experimental |