LLM Inference Serving Transformer Models

This list tracks 18 LLM inference serving projects. One scores above 70 (Verified tier). The highest-rated is PaddlePaddle/FastDeploy at 76/100 with 3,659 stars. Four of the top 10 are actively maintained.

Get all 18 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=llm-inference-serving&limit=20"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000 requests/day.
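The same request can be made from a script. The following is a minimal sketch, assuming only the endpoint and query parameters shown in the curl command above. The helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the returned JSON is not documented here, so the sketch returns the decoded payload without assuming any field names. How an API key is passed is also not stated on this page, so the sketch uses the keyless tier only.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="transformers", subcategory="llm-inference-serving", limit=20):
    """Build the request URL (hypothetical helper, not part of any official client)."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=30):
    """Fetch the URL and decode the JSON body.

    The response schema is an unknown here, so the decoded object is
    returned as-is for the caller to inspect.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_projects(build_url())` performs one request against the keyless quota of 100 requests/day, so it is worth caching the result locally when experimenting.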

| # | Model | Description | Score | Tier |
|---|---|---|---|---|
| 1 | PaddlePaddle/FastDeploy | High-performance Inference and Deployment Toolkit for LLMs and VLMs based on... | 76 | Verified |
| 2 | mlc-ai/mlc-llm | Universal LLM Deployment Engine with ML Compilation | 65 | Established |
| 3 | skyzh/tiny-llm | A course of learning LLM inference serving on Apple Silicon for systems... | 60 | Established |
| 4 | ServerlessLLM/ServerlessLLM | Serverless LLM Serving for Everyone. | 53 | Established |
| 5 | AXERA-TECH/ax-llm | Explore LLM model deployment based on AXera's AI chips | 49 | Emerging |
| 6 | VectorInstitute/vector-inference | Efficient LLM inference on Slurm clusters. | 45 | Emerging |
| 7 | pytorch/torchchat | Run PyTorch LLMs locally on servers, desktop and mobile | 44 | Emerging |
| 8 | AmpereComputingAI/ampere_model_library | AML's goal is to make benchmarking of various AI architectures on Ampere... | 42 | Emerging |
| 9 | replit/ReplitLM | Inference code and configs for the ReplitLM model family | 39 | Emerging |
| 10 | snapllm/snapllm | 🔥 🔥 Alternative to Ollama 🔥 🔥 multi-model <1ms LLM switching | 35 | Emerging |
| 11 | asprenger/ray_vllm_inference | A simple service that integrates vLLM with Ray Serve for fast and scalable... | 32 | Emerging |
| 12 | datawhalechina/llm-deploy | Large model/LLM inference and deployment: theory and practice | 32 | Emerging |
| 13 | justADeni/intel-npu-llm | A simple Python script for running LLMs on Intel's Neural Processing Units (NPUs) | 30 | Emerging |
| 14 | ray-project/ray-llm | RayLLM - LLMs on Ray (Archived). Read README for more info. | 29 | Experimental |
| 15 | hpdps-group/ElasticMM | ElasticMM: Elastic and Efficient MLLM Serving System | 27 | Experimental |
| 16 | bentoml/transformers-nlp-service | Online Inference API for NLP Transformer models - summarization, text... | 24 | Experimental |
| 17 | lix19937/llm-deploy | AI Infra LLM infer/ tensorrt-llm/ vllm | 24 | Experimental |
| 18 | g1ibby/llm-deploy | Tool to manage ollama model on vast.ai | 20 | Experimental |
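The scores and tiers listed above appear to follow fixed score bands. The sketch below encodes one set of cutoffs that is consistent with all 18 rows. Only "above 70 is Verified" is stated on this page. The cutoffs at 50 and 30 are assumptions inferred from the data (53 is Established while 49 is Emerging, and 30 is Emerging while 29 is Experimental), and the function name `tier_for_score` is hypothetical. The true Established/Emerging boundary could lie anywhere between 50 and 53.

```python
from collections import Counter

# Scores in rank order, copied from the list above.
SCORES = [76, 65, 60, 53, 49, 45, 44, 42, 39, 35, 32, 32, 30, 29, 27, 24, 24, 20]


def tier_for_score(score):
    """Map a 0-100 score to a tier name.

    Only the 'above 70' rule comes from the page itself; the 50 and 30
    cutoffs are assumptions inferred from the listed rows.
    """
    if score > 70:
        return "Verified"
    if score >= 50:  # assumed cutoff
        return "Established"
    if score >= 30:  # assumed cutoff
        return "Emerging"
    return "Experimental"


# Count how many listed projects fall into each tier under these cutoffs.
tier_counts = Counter(tier_for_score(s) for s in SCORES)
```

Under these assumed cutoffs, `tier_counts` reproduces the tier column of the list: 1 Verified, 3 Established, 9 Emerging, and 5 Experimental.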