LMCache and llm_efficiency
LMCache provides a production-ready KV cache optimization system for any LLM, while llm_efficiency is an educational implementation that demonstrates KV cache concepts within a minimal GPT architecture. They pair as educational reference versus practical tool rather than as true competitors or complements.
About LMCache
LMCache/LMCache
Supercharge Your LLM with the Fastest KV Cache Layer
LMCache distributes KV cache storage across GPU, CPU, disk, and S3 using techniques like zero-copy transfers and GPU direct storage, enabling any repeated text sequence to be reused across multiple serving instances. It integrates with vLLM and SGLang to reduce time-to-first-token and increase throughput by 3-10x, particularly for long-context workloads such as multi-round QA and RAG applications. It also supports disaggregated prefill and peer-to-peer cache sharing for datacenter-wide optimization.
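The reuse of repeated text sequences described above hinges on keying cached KV tensors by token prefix, so a new request can skip prefill for any prefix already computed. The following is a minimal sketch of that idea only; the class, method names, and hashing scheme are illustrative assumptions, not LMCache's actual API:

```python
import hashlib

class PrefixKVCache:
    """Toy prefix-keyed KV store (illustrative; not LMCache's real interface)."""

    def __init__(self):
        self.store = {}  # prefix hash -> cached KV payload

    def _key(self, token_ids):
        return hashlib.sha256(bytes(token_ids)).hexdigest()

    def put(self, token_ids, kv):
        self.store[self._key(token_ids)] = kv

    def get(self, token_ids):
        # Find the longest cached prefix of token_ids; only tokens past
        # the hit still need prefill, the rest reuse cached KV.
        for end in range(len(token_ids), 0, -1):
            kv = self.store.get(self._key(token_ids[:end]))
            if kv is not None:
                return end, kv
        return 0, None

cache = PrefixKVCache()
cache.put([1, 2, 3], "kv-for-123")
hit_len, kv = cache.get([1, 2, 3, 4, 5])  # reuses the 3-token prefix
```

A real system additionally spills these entries across GPU, CPU, disk, and object storage tiers and shares them between serving instances; this sketch shows only the lookup logic that makes reuse possible.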
About llm_efficiency
dataflowr/llm_efficiency
KV Cache & LoRA for minGPT
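The core concept llm_efficiency teaches can be shown in a few lines: during autoregressive decoding, each step appends one new key/value pair to a cache instead of recomputing attention inputs for the whole sequence. This is a self-contained NumPy sketch of that mechanism, not code from the repo; the weights and embeddings are random stand-ins:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
d = 8  # head dimension

# Random projections standing in for a trained model's W_q, W_k, W_v.
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))

def attend(q, K, V):
    # Single-query causal attention over all cached keys/values.
    return softmax(q @ K.T / np.sqrt(d)) @ V

tokens = rng.standard_normal((5, d))  # stand-in token embeddings

# Decode token-by-token: each step projects only the NEW token and
# appends its K, V to the cache.
K_cache, V_cache = np.zeros((0, d)), np.zeros((0, d))
cached_out = []
for x in tokens:
    K_cache = np.vstack([K_cache, x @ W_k])
    V_cache = np.vstack([V_cache, x @ W_v])
    cached_out.append(attend(x @ W_q, K_cache, V_cache))
cached_out = np.array(cached_out)

# Reference: recompute full causal attention from scratch at every step.
Q, K, V = tokens @ W_q, tokens @ W_k, tokens @ W_v
full_out = np.array([attend(Q[t], K[:t + 1], V[:t + 1])
                     for t in range(len(tokens))])

# Both paths produce identical outputs; the cached path avoids
# re-projecting past tokens at every decoding step.
assert np.allclose(cached_out, full_out)
```

In a minGPT-style model the same append-and-attend pattern sits inside each attention layer, which is what turns generation from quadratic re-prefill per step into a single-token update.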