Kubernetes LLM Serving Tools
Tools and operators for deploying, scaling, and managing LLM inference workloads on Kubernetes clusters. Includes auto-scaling, GPU optimization, and production orchestration. Does NOT include general LLM SDKs, multi-provider abstractions, or non-Kubernetes deployment platforms.
There are 60 Kubernetes LLM serving tools tracked. One scores above 70 (Verified tier). The highest-rated is AlexsJones/llmfit at 72/100, with 15,685 stars and 4,091 monthly downloads. Two of the top 10 are actively maintained.
Get all 60 projects as JSON:
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=kubernetes-llm-serving&limit=60"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
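The JSON response can be post-processed however you like. Below is a minimal sketch of filtering a downloaded dataset by score; the record fields (`name`, `score`, `tier`) are assumed for illustration and may not match the actual response schema, and the aimirror score shown is hypothetical.

```python
# Filter a quality dataset down to tools at or above a score threshold.
# NOTE: the field names ("name", "score", "tier") are assumptions about
# the API's response shape, not confirmed by documentation.

def verified_tools(records, threshold=70):
    """Return (name, score) pairs scoring at or above `threshold`, best first."""
    hits = [r for r in records if r.get("score", 0) >= threshold]
    return [(r["name"], r["score"]) for r in sorted(hits, key=lambda r: -r["score"])]

# Sample records mirroring the table below (aimirror's score is made up).
sample = [
    {"name": "AlexsJones/llmfit", "score": 72, "tier": "Verified"},
    {"name": "livehl/aimirror", "score": 64, "tier": "Established"},
]

print(verified_tools(sample))  # [('AlexsJones/llmfit', 72)]
```

To run this against live data, feed it the parsed output of the curl command above (e.g. via `json.load`).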
| # | Tool | Score | Tier |
|---|---|---|---|
| 1 | AlexsJones/llmfit – Hundreds of models & providers. One command to find what runs on your hardware. | 72 | Verified |
| 2 | livehl/aimirror – 🚀 AI-era download accelerator: full-speed mirroring for Docker/PyPI/HuggingFace/CRAN; parallel sharding + smart caching to make downloads fly. |  | Established |
| 3 | Chen-zexi/vllm-cli – A command-line interface tool for serving LLMs with vLLM. |  | Established |
| 4 | ptimizeroracle/ondine – The LLM Dataset Engine: batch-process millions of rows with 100+ providers... |  | Established |
| 5 | victordibia/llmx – An API for chat-fine-tuned large language models (LLMs). |  | Established |
| 6 | TakatoHonda/sui-lang – 粋 (Sui), a programming language optimized for LLM code generation. |  | Established |
| 7 | matrixhub-ai/matrixhub – An open-source, self-hosted AI model hub with Hugging Face compatibility... |  | Emerging |
| 8 | InftyAI/llmaz – ☸️ Easy, advanced inference platform for large language models on... |  | Emerging |
| 9 | r2d4/openlm – OpenAI-compatible Python client that can call any LLM. |  | Emerging |
| 10 | ventz/easy-llms – Easy "1-line" calling of all LLMs from OpenAI, MS Azure, AWS Bedrock, GCP... |  | Emerging |
| 11 | cloud-apim/otoroshi-llm-extension – Connect, set up, secure, and seamlessly manage LLM models using an... |  | Emerging |
| 12 | llmariner/llmariner – Extensible generative AI platform on Kubernetes with OpenAI-compatible APIs. |  | Emerging |
| 13 | edwardcapriolo/deliverance – A Java-based inference engine. |  | Emerging |
| 14 | AntSeed/antseed – AntSeed P2P AI Services Network. |  | Emerging |
| 15 | kalavai-net/kalavai-client – Aggregates compute from spare GPU capacity. |  | Emerging |
| 16 | jadnohra/hf-providers – Compare API providers, local GPUs, and cloud for any model. |  | Emerging |
| 17 | sozercan/kubectl-ai – ✨ Kubectl plugin to create manifests with LLMs. |  | Emerging |
| 18 | hkalbertkim/KORA – An inference operating system that reduces unnecessary LLM calls by... |  | Emerging |
| 19 | chigwell/llm7.io – LLM7.io offers a single API gateway that connects you to a wide array of... |  | Emerging |
| 20 | EM-GeekLab/LLMOne – Enterprise-grade LLM automated deployment tool that makes AI servers truly... |  | Emerging |
| 21 | chenhunghan/ialacol – 🪶 Lightweight OpenAI drop-in replacement for Kubernetes. |  | Emerging |
| 22 | paolobietolini/gtm-api-for-llms – A structured, machine-readable reference of the... |  | Emerging |
| 23 | friendliai/friendli-client – [⚠️ DEPRECATED] Friendli: the fastest serving engine for generative AI. |  | Emerging |
| 24 | sanjbh/kube-scaling-agent – Kubernetes operator that uses LLM reasoning to autoscale deployments: reads... |  | Experimental |
| 25 | profullstack/infernet-protocol – Infernet: a peer-to-peer distributed GPU inference protocol. |  | Experimental |
| 26 | hwclass/docktor – AI-native autoscaler for Docker Compose built with cagent + MCP + Model Runner. |  | Experimental |
| 27 | sozercan/k8s-distributed-inference – Distributed inference on Kubernetes with DRA and MIG. |  | Experimental |
| 28 | windsnow1025/LLM-Bridge – A Python library that wraps multiple LLM providers into a consistent API... |  | Experimental |
| 29 | inferLean/inferlean-project – The copilot for LLM inference optimization. |  | Experimental |
| 30 | eren23/crucible – Autonomous ML research on rental GPUs: LLM-driven hypothesis generation and... |  | Experimental |
| 31 | depadeto/detoserve – Open-source multi-cluster AI inference platform. Define functions once,... |  | Experimental |
| 32 | cloudglue/cloudglue-api-spec – Official OpenAPI specification for the Cloudglue API. |  | Experimental |
| 33 | cloud-ai-ufcg/ai-engine – Workload migration recommendation engine (CLI / API). |  | Experimental |
| 34 | bsilverthorn/vernac – Plain-language programming language. |  | Experimental |
| 35 | saurabhknp/air-gapped – Enable offline Kubernetes ops with a local AI agent that runs fully... |  | Experimental |
| 36 | TrentPierce/Shard – A speculative inference accelerator that reduces GPU usage by... |  | Experimental |
| 37 | ngstcf/llmbase – Unified API for multiple LLM providers; use as a Python library or HTTP API server. |  | Experimental |
| 38 | inferscale/inferscale – A fully automated MLOps platform built to democratize AI/ML infrastructure. |  | Experimental |
| 39 | anmolg1997/Multi-LoRA-Serve – Multi-adapter inference gateway: one base model, many LoRA adapters... |  | Experimental |
| 40 | kube-gopher/magma – Kubernetes operator for AI model lifecycle automation, bridging Volcano and Kthena. |  | Experimental |
| 41 | David-Martel/PC-AI – Local LLM-powered PC diagnostics and optimization framework for Windows. |  | Experimental |
| 42 | kimmmmyy223/llm-batch – Process JSON data in batches with `llm-batch`, leveraging sequential or... |  | Experimental |
| 43 | mycellm/mycellm – Distributed LLM inference across heterogeneous hardware. Pool GPUs into a... |  | Experimental |
| 44 | umoja-compute/umoja-compute – Free OpenAI-compatible infrastructure for running open LLMs on distributed... |  | Experimental |
| 45 | deepakdeo/python-llm-playbook – A unified Python interface for multiple LLM providers (OpenAI, Anthropic,... |  | Experimental |
| 46 | cloud-apim/otoroshi-llm-extension-serverless-example – An example project using the Otoroshi LLM Extension in Cloud APIM Serverless. |  | Experimental |
| 47 | boufia/vllm-lan-inference – Deliver OpenAI-compatible LLM inference on your LAN with vLLM and gateway... |  | Experimental |
| 48 | localllm-advisor/localllm-advisor – A free tool to find the best LLM for your hardware, or the best hardware... |  | Experimental |
| 49 | adityonugrohoid/gpu-autoscale-inference – Scale-to-zero GPU inference platform: LLM serving on Kubernetes with... |  | Experimental |
| 50 | gowshikram/unified-llm-engine – ⚡ Streamline your AI integrations with a multi-provider LLM engine,... |  | Experimental |
| 51 | sakthismarther/matrixhub – Accelerate AI inference with MatrixHub, a self-hosted model registry that... |  | Experimental |
| 52 | debarun1234/llm-model-eligibility-checker – A desktop application that analyzes your computer's specifications... |  | Experimental |
| 53 | AdieLaine/Model-Sliding – Enables the application to transition seamlessly between different OpenAI... |  | Experimental |
| 54 | Ptchwir3/Rookery – Turn any Kubernetes cluster into a private LLM endpoint. One Helm command... |  | Experimental |
| 55 | ait-testbed/playbookgen – A CLI tool for generating AttackMate playbooks using LLMs (currently... |  | Experimental |
| 56 | kenahrens/ai-testing – Running AI models in Kubernetes. |  | Experimental |
| 57 | failfa-st/simplif-ai – A pseudolanguage to describe code for LLMs. |  | Experimental |
| 58 | ai-art-dev99/vLLM-efficient-serving-stack – Production-grade vLLM serving with an OpenAI-compatible API, per-request... |  | Experimental |
| 59 | Rohit2sali/vllm-multi-tenant-llm-gateway – A vLLM multi-tenant large language model gateway. This system is... |  | Experimental |
| 60 | kodlan/LLM-zero-downtime-update – Kubernetes (Argo Rollouts) implementation for zero-downtime model updates... |  | Experimental |