kvcache-ai/Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
It implements a disaggregated KVCache architecture that decouples the prefill and decode stages via efficient cross-device and cross-machine transfer. Core components include the Transfer Engine (RDMA-optimized KV cache movement) and Mooncake Store (a hierarchical distributed cache pool). Mooncake integrates with vLLM, SGLang, TensorRT-LLM, and LMDeploy as a backend connector for multi-node inference pipelines, enabling zero-copy embedding sharing and dynamic KV cache offloading across GPU, host, and remote storage tiers.
4,911 stars. Actively maintained with 119 commits in the last 30 days.
Stars: 4,911
Forks: 600
Language: C++
License: Apache-2.0
Category:
Last pushed: Mar 13, 2026
Commits (30d): 119
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/kvcache-ai/Mooncake"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
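The endpoint above follows a predictable owner/repo pattern, so the URL for any listed tool can be built from its slug. A minimal Python sketch of that pattern (the `quality/llm-tools` path segment comes from the curl example above; the response schema is not documented here, so the fetch helper simply returns the raw JSON):

```python
import json
import urllib.request

# Base path taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"

def repo_stats_url(owner: str, repo: str) -> str:
    """Build the stats endpoint URL for a repo, following the documented pattern."""
    return f"{API_BASE}/{owner}/{repo}"

def fetch_stats(owner: str, repo: str) -> dict:
    """Fetch the raw JSON payload for a repo.

    The payload's field names are not documented on this page, so no
    schema is assumed here; callers inspect the returned dict themselves.
    """
    with urllib.request.urlopen(repo_stats_url(owner, repo)) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Reproduces the URL from the curl example.
    print(repo_stats_url("kvcache-ai", "Mooncake"))
```

Keeping URL construction separate from the network call makes the pattern easy to test offline and to reuse for any other repo slug listed on this page.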
Related tools
vllm-project/vllm-ascend
Community maintained hardware plugin for vLLM on Ascend
SemiAnalysisAI/InferenceX
Open Source Continuous Inference Benchmarking Qwen3.5, DeepSeek, GPTOSS - GB200 NVL72 vs MI355X...
sophgo/tpu-mlir
Machine learning compiler based on MLIR for Sophgo TPU.
uccl-project/uccl
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache...
BBuf/how-to-optim-algorithm-in-cuda
how to optimize some algorithm in cuda.