Kubernetes LLM Serving Tools
Tools and operators for deploying, scaling, and managing LLM inference workloads on Kubernetes clusters. Includes auto-scaling, GPU optimization, and production orchestration. Does NOT include general LLM SDKs, multi-provider abstractions, or non-Kubernetes deployment platforms.
There are 60 Kubernetes LLM serving tools tracked. One scores above 70 (Verified tier). The highest-rated is AlexsJones/llmfit at 72/100, with 15,685 stars and 4,091 monthly downloads. Two of the top 10 are actively maintained.
Get all 60 projects as JSON:
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=kubernetes-llm-serving&limit=60"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
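The JSON response can be post-processed however you like. Below is a minimal sketch of filtering a downloaded dataset by score; the record fields (`name`, `score`, `tier`) are assumed for illustration and may not match the actual response schema, and the aimirror score shown is hypothetical.

```python
# Filter a quality dataset down to tools at or above a score threshold.
# NOTE: the field names ("name", "score", "tier") are assumptions about
# the API's response shape, not confirmed by documentation.

def verified_tools(records, threshold=70):
    """Return (name, score) pairs scoring at or above `threshold`, best first."""
    hits = [r for r in records if r.get("score", 0) >= threshold]
    return [(r["name"], r["score"]) for r in sorted(hits, key=lambda r: -r["score"])]

# Sample records mirroring the table below (aimirror's score is made up).
sample = [
    {"name": "AlexsJones/llmfit", "score": 72, "tier": "Verified"},
    {"name": "livehl/aimirror", "score": 64, "tier": "Established"},
]

print(verified_tools(sample))  # [('AlexsJones/llmfit', 72)]
```

To run this against live data, feed it the parsed output of the curl command above (e.g. via `json.load`).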
| # | Tool | Score | Tier |
|---|---|---|---|
| 1 | AlexsJones/llmfit – Hundreds of models & providers. One command to find what runs on your hardware. | 72 | Verified |
| 2 | livehl/aimirror – 🚀 AI-era download accelerator: full-speed mirroring for Docker/PyPI/HuggingFace/CRAN; parallel sharding + smart caching to make downloads fly. |  | Established |
| 3 | Chen-zexi/vllm-cli – A command-line interface tool for serving LLMs with vLLM. |  | Established |
| 4 | ptimizeroracle/ondine – The LLM Dataset Engine: batch-process millions of rows with 100+ providers... |  | Established |
| 5 | victordibia/llmx – An API for chat-fine-tuned large language models (LLMs). |  | Established |
| 6 | TakatoHonda/sui-lang – 粋 (Sui), a programming language optimized for LLM code generation. |  | Established |
| 7 | matrixhub-ai/matrixhub – An open-source, self-hosted AI model hub with Hugging Face compatibility... |  | Emerging |
| 8 | InftyAI/llmaz – ☸️ Easy, advanced inference platform for large language models on... |  | Emerging |
| 9 | r2d4/openlm – OpenAI-compatible Python client that can call any LLM. |  | Emerging |
| 10 | ventz/easy-llms – Easy "1-line" calling of all LLMs from OpenAI, MS Azure, AWS Bedrock, GCP... |  | Emerging |
| 11 | cloud-apim/otoroshi-llm-extension – Connect, set up, secure, and seamlessly manage LLM models using an... |  | Emerging |
| 12 | llmariner/llmariner – Extensible generative AI platform on Kubernetes with OpenAI-compatible APIs. |  | Emerging |
| 13 | edwardcapriolo/deliverance – A Java-based inference engine. |  | Emerging |
| 14 | AntSeed/antseed – AntSeed P2P AI Services Network. |  | Emerging |
| 15 | kalavai-net/kalavai-client – Aggregates compute from spare GPU capacity. |  | Emerging |
| 16 | jadnohra/hf-providers – Compare API providers, local GPUs, and cloud for any model. |  | Emerging |
| 17 | sozercan/kubectl-ai – ✨ Kubectl plugin to create manifests with LLMs. |  | Emerging |
| 18 | hkalbertkim/KORA – An inference operating system that reduces unnecessary LLM calls by... |  | Emerging |
| 19 | chigwell/llm7.io – LLM7.io offers a single API gateway that connects you to a wide array of... |  | Emerging |
| 20 | EM-GeekLab/LLMOne – Enterprise-grade LLM automated deployment tool that makes AI servers truly... |  | Emerging |
| 21 | chenhunghan/ialacol – 🪶 Lightweight OpenAI drop-in replacement for Kubernetes. |  | Emerging |
| 22 | paolobietolini/gtm-api-for-llms – A structured, machine-readable reference of the... |  | Emerging |
| 23 | friendliai/friendli-client – [⚠️ DEPRECATED] Friendli: the fastest serving engine for generative AI. |  | Emerging |
| 24 | sanjbh/kube-scaling-agent – Kubernetes operator that uses LLM reasoning to autoscale deployments: reads... |  | Experimental |
| 25 | profullstack/infernet-protocol – Infernet: a peer-to-peer distributed GPU inference protocol. |  | Experimental |
| 26 | hwclass/docktor – AI-native autoscaler for Docker Compose built with cagent + MCP + Model Runner. |  | Experimental |
| 27 | sozercan/k8s-distributed-inference – Distributed inference on Kubernetes with DRA and MIG. |  | Experimental |
| 28 | windsnow1025/LLM-Bridge – A Python library that wraps multiple LLM providers into a consistent API... |  | Experimental |
| 29 | inferLean/inferlean-project – The copilot for LLM inference optimization. |  | Experimental |
| 30 | eren23/crucible – Autonomous ML research on rental GPUs: LLM-driven hypothesis generation and... |  | Experimental |
| 31 | depadeto/detoserve – Open-source multi-cluster AI inference platform. Define functions once,... |  | Experimental |
| 32 | cloudglue/cloudglue-api-spec – Official OpenAPI specification for the Cloudglue API. |  | Experimental |
| 33 | cloud-ai-ufcg/ai-engine – Workload migration recommendation engine (CLI / API). |  | Experimental |
| 34 | bsilverthorn/vernac – Plain-language programming language. |  | Experimental |
| 35 | saurabhknp/air-gapped – Enable offline Kubernetes ops with a local AI agent that runs fully... |  | Experimental |
| 36 | TrentPierce/Shard – A speculative inference accelerator that reduces GPU usage by... |  | Experimental |
| 37 | ngstcf/llmbase – Unified API for multiple LLM providers; use as a Python library or HTTP API server. |  | Experimental |
| 38 | inferscale/inferscale – A fully automated MLOps platform built to democratize AI/ML infrastructure. |  | Experimental |
| 39 | anmolg1997/Multi-LoRA-Serve – Multi-adapter inference gateway: one base model, many LoRA adapters... |  | Experimental |
| 40 | kube-gopher/magma – Kubernetes operator for AI model lifecycle automation, bridging Volcano and Kthena. |  | Experimental |
| 41 | David-Martel/PC-AI – Local LLM-powered PC diagnostics and optimization framework for Windows. |  | Experimental |
| 42 | kimmmmyy223/llm-batch – Process JSON data in batches with `llm-batch`, leveraging sequential or... |  | Experimental |
| 43 | mycellm/mycellm – Distributed LLM inference across heterogeneous hardware. Pool GPUs into a... |  | Experimental |
| 44 | umoja-compute/umoja-compute – Free OpenAI-compatible infrastructure for running open LLMs on distributed... |  | Experimental |
| 45 | deepakdeo/python-llm-playbook – A unified Python interface for multiple LLM providers (OpenAI, Anthropic,... |  | Experimental |
| 46 | cloud-apim/otoroshi-llm-extension-serverless-example – An example project using the Otoroshi LLM Extension in Cloud APIM Serverless. |  | Experimental |
| 47 | boufia/vllm-lan-inference – Deliver OpenAI-compatible LLM inference on your LAN with vLLM and gateway... |  | Experimental |
| 48 | localllm-advisor/localllm-advisor – A free tool to find the best LLM for your hardware, or the best hardware... |  | Experimental |
| 49 | adityonugrohoid/gpu-autoscale-inference – Scale-to-zero GPU inference platform: LLM serving on Kubernetes with... |  | Experimental |
| 50 | gowshikram/unified-llm-engine – ⚡ Streamline your AI integrations with a multi-provider LLM engine,... |  | Experimental |
| 51 | sakthismarther/matrixhub – Accelerate AI inference with MatrixHub, a self-hosted model registry that... |  | Experimental |
| 52 | debarun1234/llm-model-eligibility-checker – A desktop application that analyzes your computer's specifications... |  | Experimental |
| 53 | AdieLaine/Model-Sliding – Enables the application to transition seamlessly between different OpenAI... |  | Experimental |
| 54 | Ptchwir3/Rookery – Turn any Kubernetes cluster into a private LLM endpoint. One Helm command... |  | Experimental |
| 55 | ait-testbed/playbookgen – A CLI tool for generating AttackMate playbooks using LLMs (currently... |  | Experimental |
| 56 | kenahrens/ai-testing – Running AI models in Kubernetes. |  | Experimental |
| 57 | failfa-st/simplif-ai – A pseudolanguage to describe code for LLMs. |  | Experimental |
| 58 | ai-art-dev99/vLLM-efficient-serving-stack – Production-grade vLLM serving with an OpenAI-compatible API, per-request... |  | Experimental |
| 59 | Rohit2sali/vllm-multi-tenant-llm-gateway – A vLLM multi-tenant large language model gateway. This system is... |  | Experimental |
| 60 | kodlan/LLM-zero-downtime-update – Kubernetes (Argo Rollouts) implementation for zero-downtime model updates... |  | Experimental |