Kubernetes LLM Serving (LLM Tools)

Tools and operators for deploying, scaling, and managing LLM inference workloads on Kubernetes clusters. Includes auto-scaling, GPU optimization, and production orchestration. Does NOT include general LLM SDKs, multi-provider abstractions, or non-Kubernetes deployment platforms.

There are 60 Kubernetes LLM serving tools tracked. One scores above 70 (Verified tier). The highest-rated is AlexsJones/llmfit at 72/100, with 15,685 stars and 4,091 monthly downloads. 2 of the top 10 are actively maintained.

Get all 60 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=kubernetes-llm-serving&limit=60"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
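The same endpoint can be queried programmatically. A minimal Python sketch using only the standard library; the URL and query parameters come from the curl command above, but the shape of the JSON payload is an assumption, so the script just prints the start of the response for inspection:

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def build_url(domain="llm-tools", subcategory="kubernetes-llm-serving", limit=60):
    """Build the dataset query URL with properly encoded parameters."""
    params = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{API_BASE}?{params}"

def fetch_dataset(url):
    """Fetch and decode the JSON payload (payload shape is not documented here)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)

if __name__ == "__main__":
    data = fetch_dataset(build_url())
    print(json.dumps(data, indent=2)[:500])  # peek at the payload shape first
```

Without an API key this counts against the 100 requests/day anonymous quota, so cache the response locally rather than re-fetching per run.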

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | AlexsJones/llmfit | Hundreds of models & providers. One command to find what runs on your hardware. | 72 | Verified |
| 2 | livehl/aimirror | 🚀 200x speedup! The download tool of the AI era; full acceleration for Docker/PyPI/HuggingFace/CRAN; parallel chunking + smart caching make downloads fly | 64 | Established |
| 3 | Chen-zexi/vllm-cli | A command-line interface tool for serving LLMs using vLLM. | 56 | Established |
| 4 | ptimizeroracle/ondine | The LLM Dataset Engine - batch process millions of rows with 100+ providers... | 54 | Established |
| 5 | victordibia/llmx | An API for chat fine-tuned large language models (LLMs) | 52 | Established |
| 6 | TakatoHonda/sui-lang | 粋 (Sui) - A programming language optimized for LLM code generation | 50 | Established |
| 7 | matrixhub-ai/matrixhub | An open-source, self-hosted AI model hub with Hugging Face compatibility,... | 49 | Emerging |
| 8 | InftyAI/llmaz | ☸️ Easy, advanced inference platform for large language models on... | 48 | Emerging |
| 9 | r2d4/openlm | OpenAI-compatible Python client that can call any LLM | 47 | Emerging |
| 10 | ventz/easy-llms | Easy "1-line" calling of all LLMs from OpenAI, MS Azure, AWS Bedrock, GCP... | 45 | Emerging |
| 11 | cloud-apim/otoroshi-llm-extension | Connect, set up, secure, and seamlessly manage LLM models using an... | 43 | Emerging |
| 12 | llmariner/llmariner | Extensible generative AI platform on Kubernetes with OpenAI-compatible APIs. | 43 | Emerging |
| 13 | edwardcapriolo/deliverance | A Java-based inference engine | 42 | Emerging |
| 14 | AntSeed/antseed | AntSeed P2P AI Services Network | 41 | Emerging |
| 15 | kalavai-net/kalavai-client | Aggregates compute from spare GPU capacity | 40 | Emerging |
| 16 | jadnohra/hf-providers | Compare API providers, local GPUs, and cloud for any model | 37 | Emerging |
| 17 | sozercan/kubectl-ai | ✨ Kubectl plugin to create manifests with LLMs | 36 | Emerging |
| 18 | hkalbertkim/KORA | An inference operating system that reduces unnecessary LLM calls by... | 36 | Emerging |
| 19 | chigwell/llm7.io | LLM7.io offers a single API gateway that connects you to a wide array of... | 35 | Emerging |
| 20 | EM-GeekLab/LLMOne | Enterprise-grade LLM automated deployment tool that makes AI servers truly... | 34 | Emerging |
| 21 | chenhunghan/ialacol | 🪶 Lightweight OpenAI drop-in replacement for Kubernetes | 34 | Emerging |
| 22 | paolobietolini/gtm-api-for-llms | This repository contains a structured, machine-readable reference of the... | 32 | Emerging |
| 23 | friendliai/friendli-client | [⛔️ DEPRECATED] Friendli: the fastest serving engine for generative AI | 30 | Emerging |
| 24 | sanjbh/kube-scaling-agent | Kubernetes operator that uses LLM reasoning to autoscale deployments: reads... | 29 | Experimental |
| 25 | profullstack/infernet-protocol | Infernet: A Peer-to-Peer Distributed GPU Inference Protocol | 28 | Experimental |
| 26 | hwclass/docktor | AI-native autoscaler for Docker Compose built with cagent + MCP + Model Runner. | 27 | Experimental |
| 27 | sozercan/k8s-distributed-inference | 🦄 Distributed inference on Kubernetes with DRA and MIG | 27 | Experimental |
| 28 | windsnow1025/LLM-Bridge | A Python library that wraps multiple LLM providers into a consistent API... | 26 | Experimental |
| 29 | inferLean/inferlean-project | The copilot for LLM inference optimization | 25 | Experimental |
| 30 | eren23/crucible | Autonomous ML research on rental GPUs: LLM-driven hypothesis generation and... | 25 | Experimental |
| 31 | depadeto/detoserve | Open-source multi-cluster AI inference platform. Define functions once,... | 24 | Experimental |
| 32 | cloudglue/cloudglue-api-spec | Official OpenAPI specification for the Cloudglue API | 24 | Experimental |
| 33 | cloud-ai-ufcg/ai-engine | Workload migration recommendation engine (CLI / API) | 23 | Experimental |
| 34 | bsilverthorn/vernac | Plain-language programming language 📖 | 23 | Experimental |
| 35 | saurabhknp/air-gapped | Enable offline Kubernetes ops with a local AI agent that runs fully... | 23 | Experimental |
| 36 | TrentPierce/Shard | Shard is a speculative inference accelerator that reduces GPU usage by... | 23 | Experimental |
| 37 | ngstcf/llmbase | Unified API for multiple LLM providers. Use as a Python library or HTTP API server. | 23 | Experimental |
| 38 | inferscale/inferscale | A fully automated MLOps platform built to democratize AI/ML infrastructure | 23 | Experimental |
| 39 | anmolg1997/Multi-LoRA-Serve | Multi-adapter inference gateway: one base model, many LoRA adapters... | 22 | Experimental |
| 40 | kube-gopher/magma | Kubernetes operator for AI model lifecycle automation, bridging Volcano and Kthena. | 22 | Experimental |
| 41 | David-Martel/PC-AI | Local LLM-powered PC diagnostics and optimization framework for Windows | 22 | Experimental |
| 42 | kimmmmyy223/llm-batch | 🚀 Process JSON data in batches with `llm-batch`, leveraging sequential or... | 22 | Experimental |
| 43 | mycellm/mycellm | Distributed LLM inference across heterogeneous hardware. Pool GPUs into a... | 22 | Experimental |
| 44 | umoja-compute/umoja-compute | Free OpenAI-compatible infrastructure for running open LLMs on distributed... | 20 | Experimental |
| 45 | deepakdeo/python-llm-playbook | A unified Python interface for multiple LLM providers (OpenAI, Anthropic,... | 19 | Experimental |
| 46 | cloud-apim/otoroshi-llm-extension-serverless-example | An example project using the Otoroshi LLM Extension in Cloud APIM Serverless | 15 | Experimental |
| 47 | boufia/vllm-lan-inference | 🚀 Deliver OpenAI-compatible LLM inference on your LAN with vLLM and gateway... | 15 | Experimental |
| 48 | localllm-advisor/localllm-advisor | The free tool to find the best LLM for your hardware, or the best hardware... | 15 | Experimental |
| 49 | adityonugrohoid/gpu-autoscale-inference | Scale-to-zero GPU inference platform: LLM serving on Kubernetes with... | 14 | Experimental |
| 50 | gowshikram/unified-llm-engine | ⚡ Streamline your AI integrations with a multi-provider LLM engine,... | 14 | Experimental |
| 51 | sakthismarther/matrixhub | 🔗 Accelerate AI inference with MatrixHub, a self-hosted model registry that... | 14 | Experimental |
| 52 | debarun1234/llm-model-eligibility-checker | A desktop application that analyzes your computer's specifications... | 13 | Experimental |
| 53 | AdieLaine/Model-Sliding | Enables the application to transition seamlessly between different OpenAI... | 13 | Experimental |
| 54 | Ptchwir3/Rookery | Turn any Kubernetes cluster into a private LLM endpoint. One Helm command... | 13 | Experimental |
| 55 | ait-testbed/playbookgen | A CLI tool for generating AttackMate playbooks using LLMs (currently... | 12 | Experimental |
| 56 | kenahrens/ai-testing | Running AI models in Kubernetes | 12 | Experimental |
| 57 | failfa-st/simplif-ai | A pseudolanguage to describe code for LLMs | 12 | Experimental |
| 58 | ai-art-dev99/vLLM-efficient-serving-stack | Production-grade vLLM serving with an OpenAI-compatible API, per-request... | 11 | Experimental |
| 59 | Rohit2sali/vllm-multi-tenant-llm-gateway | A vLLM multi-tenant large language model gateway. This system is... | 11 | Experimental |
| 60 | kodlan/LLM-zero-downtime-update | Kubernetes (Argo Rollouts) implementation for zero-downtime model updates... | 11 | Experimental |

Comparisons in this category