kvcache-ai/Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
It implements a disaggregated KVCache architecture that decouples the prefill and decode stages via efficient cross-device and cross-machine transfer. Core components include the Transfer Engine (RDMA-optimized KV cache movement) and Mooncake Store (a hierarchical distributed cache pool). Mooncake integrates with vLLM, SGLang, TensorRT-LLM, and LMDeploy as a backend connector for multi-node inference pipelines, enabling zero-copy embedding sharing and dynamic KV cache offloading across GPU, host, and remote storage tiers.
4,911 stars. Actively maintained with 119 commits in the last 30 days.
Stars: 4,911
Forks: 600
Language: C++
License: Apache-2.0
Category:
Last pushed: Mar 13, 2026
Commits (30d): 119
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/kvcache-ai/Mooncake"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
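The endpoint above follows a predictable owner/repo pattern, so the URL for any listed tool can be built from its slug. A minimal Python sketch of that pattern (the `quality/llm-tools` path segment comes from the curl example above; the response schema is not documented here, so the fetch helper simply returns the raw JSON):

```python
import json
import urllib.request

# Base path taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"

def repo_stats_url(owner: str, repo: str) -> str:
    """Build the stats endpoint URL for a repo, following the documented pattern."""
    return f"{API_BASE}/{owner}/{repo}"

def fetch_stats(owner: str, repo: str) -> dict:
    """Fetch the raw JSON payload for a repo.

    The payload's field names are not documented on this page, so no
    schema is assumed here; callers inspect the returned dict themselves.
    """
    with urllib.request.urlopen(repo_stats_url(owner, repo)) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Reproduces the URL from the curl example.
    print(repo_stats_url("kvcache-ai", "Mooncake"))
```

Keeping URL construction separate from the network call makes the pattern easy to test offline and to reuse for any other repo slug listed on this page.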
Related tools
vllm-project/vllm-ascend
Community maintained hardware plugin for vLLM on Ascend
SemiAnalysisAI/InferenceX
Open Source Continuous Inference Benchmarking Qwen3.5, DeepSeek, GPTOSS - GB200 NVL72 vs MI355X...
sophgo/tpu-mlir
Machine learning compiler based on MLIR for Sophgo TPU.
uccl-project/uccl
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache...
BBuf/how-to-optim-algorithm-in-cuda
how to optimize some algorithm in cuda.