LLM Inference Serving Transformer Models
There are 18 LLM inference serving projects tracked. One scores above 70 (Verified tier). The highest-rated is PaddlePaddle/FastDeploy at 76/100 with 3,659 stars. Four of the top ten are actively maintained.
Get all 18 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=llm-inference-serving&limit=20"
Open to everyone: 100 requests/day with no key. A free key raises the limit to 1,000 requests/day.
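The same query can be made from Python with the standard library. This is a minimal sketch: the URL parameters mirror the curl example above, but the response field names (`projects`, `tier`, `name`) are assumptions about the JSON schema, not documented API fields, so the filter below runs against a sample payload rather than a live response.

```python
from urllib.parse import urlencode

BASE = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def build_query_url(domain: str, subcategory: str, limit: int = 20) -> str:
    """Construct the dataset query URL shown in the curl example."""
    params = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE}?{params}"

def verified_projects(payload: dict) -> list[str]:
    """Filter a decoded JSON payload for Verified-tier entries.

    NOTE: the "projects" and "tier" keys are assumed, not documented.
    """
    return [p["name"] for p in payload.get("projects", []) if p.get("tier") == "Verified"]

# Hypothetical payload illustrating the assumed response shape.
sample = {"projects": [
    {"name": "PaddlePaddle/FastDeploy", "score": 76, "tier": "Verified"},
    {"name": "mlc-ai/mlc-llm", "tier": "Established"},
]}

print(build_query_url("transformers", "llm-inference-serving"))
print(verified_projects(sample))  # ['PaddlePaddle/FastDeploy']
```

To hit the live endpoint, pass the built URL to `urllib.request.urlopen` and decode the body with `json.loads`.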
| # | Model | Description | Score | Tier |
|---|---|---|---|---|
| 1 | PaddlePaddle/FastDeploy | High-performance Inference and Deployment Toolkit for LLMs and VLMs based on... | 76 | Verified |
| 2 | mlc-ai/mlc-llm | Universal LLM Deployment Engine with ML Compilation | | Established |
| 3 | skyzh/tiny-llm | A course of learning LLM inference serving on Apple Silicon for systems... | | Established |
| 4 | ServerlessLLM/ServerlessLLM | Serverless LLM Serving for Everyone. | | Established |
| 5 | AXERA-TECH/ax-llm | Explore LLM model deployment based on AXera's AI chips | | Emerging |
| 6 | VectorInstitute/vector-inference | Efficient LLM inference on Slurm clusters. | | Emerging |
| 7 | pytorch/torchchat | Run PyTorch LLMs locally on servers, desktop and mobile | | Emerging |
| 8 | AmpereComputingAI/ampere_model_library | AML's goal is to make benchmarking of various AI architectures on Ampere... | | Emerging |
| 9 | replit/ReplitLM | Inference code and configs for the ReplitLM model family | | Emerging |
| 10 | snapllm/snapllm | 🔥 🔥 Alternative to Ollama 🔥 🔥 multi-model <1ms LLM switching | | Emerging |
| 11 | asprenger/ray_vllm_inference | A simple service that integrates vLLM with Ray Serve for fast and scalable... | | Emerging |
| 12 | datawhalechina/llm-deploy | Theory and practice of large-model/LLM inference and deployment | | Emerging |
| 13 | justADeni/intel-npu-llm | A simple Python script for running LLMs on Intel's Neural Processing Units (NPUs) | | Emerging |
| 14 | ray-project/ray-llm | RayLLM - LLMs on Ray (Archived). Read README for more info. | | Experimental |
| 15 | hpdps-group/ElasticMM | ElasticMM: Elastic and Efficient MLLM Serving System | | Experimental |
| 16 | bentoml/transformers-nlp-service | Online Inference API for NLP Transformer models - summarization, text... | | Experimental |
| 17 | lix19937/llm-deploy | AI Infra LLM infer/ tensorrt-llm/ vllm | | Experimental |
| 18 | g1ibby/llm-deploy | Tool to manage ollama model on vast.ai | | Experimental |