LMCache and llm_efficiency

LMCache provides a production-ready KV cache optimization system for serving any LLM, while llm_efficiency is an educational implementation demonstrating KV cache concepts within a minimal GPT architecture. They relate as practical tool versus educational reference rather than as true competitors or complements.

Repository       LMCache        llm_efficiency
Overall score    92 (Verified)  41 (Emerging)
Maintenance      25/25          10/25
Adoption         20/25          8/25
Maturity         25/25          11/25
Community        22/25          12/25
Stars            7,664          59
Forks            1,009          7
Downloads        170,335        —
Commits (30d)    131            0
Language         Python         Python
License          Apache-2.0     Apache-2.0
Risk flags       None           No package, no dependents

About LMCache

LMCache/LMCache

Supercharge Your LLM with the Fastest KV Cache Layer

Distributes KV cache storage across GPU, CPU, disk, and S3 using techniques like zero-copy transfers and GPU direct storage, enabling reuse of any repeated text sequences across multiple serving instances. Integrates with vLLM and SGLang to reduce time-to-first-token and increase throughput by 3-10x, particularly for long-context workloads like multi-round QA and RAG applications. Supports disaggregated prefill and peer-to-peer cache sharing for datacenter-wide optimization.
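The core reuse idea above can be sketched as a prefix-keyed store: cached KV data is looked up by a hash of the token-ID prefix, so a repeated prefix (such as a shared system prompt) is fetched rather than re-prefilled. This is a minimal illustrative sketch of the concept, not LMCache's actual API; the class and payload names are hypothetical.

```python
import hashlib

class ToyKVCacheStore:
    """Illustrative prefix-keyed KV store (hypothetical, not LMCache's API).

    Maps a hash of a token-id prefix to precomputed KV data so a repeated
    prefix can be reused instead of being recomputed during prefill.
    """

    def __init__(self):
        self._store = {}  # prefix hash -> cached KV payload

    @staticmethod
    def _key(token_ids):
        return hashlib.sha256(str(token_ids).encode()).hexdigest()

    def put(self, token_ids, kv_payload):
        self._store[self._key(token_ids)] = kv_payload

    def get_longest_prefix(self, token_ids):
        # Search from the longest prefix down; return (hit length, payload).
        for end in range(len(token_ids), 0, -1):
            payload = self._store.get(self._key(token_ids[:end]))
            if payload is not None:
                return end, payload
        return 0, None

# Usage: cache KV for a shared prompt, then reuse it for a new request.
store = ToyKVCacheStore()
shared_prompt = [101, 7592, 2088]        # e.g. a shared system prompt
store.put(shared_prompt, kv_payload="kv-for-prompt")

request = shared_prompt + [2054, 2003]   # same prompt + new user turn
hit_len, payload = store.get_longest_prefix(request)
print(hit_len, payload)                  # only tokens after hit_len need prefill
```

A real system additionally tiers the payloads across GPU, CPU, disk, and S3, and shares them between serving instances; the lookup-by-prefix structure is the same.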

About llm_efficiency

dataflowr/llm_efficiency

KV Cache & LoRA for minGPT
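The KV cache concept the repo teaches can be shown in a few lines: during autoregressive decoding, each step appends one key/value row to a cache instead of recomputing keys and values for the whole sequence. This is a generic single-head sketch in NumPy under assumed toy weights, not code from llm_efficiency or minGPT.

```python
import numpy as np

def attention(q, K, V):
    # Scaled dot-product attention for one query vector against all cached keys.
    scores = q @ K.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

rng = np.random.default_rng(0)
d = 8
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))  # toy projection weights
xs = rng.normal(size=(5, d))                              # a 5-token sequence

# Incremental decoding with a KV cache: append one K/V row per step.
K_cache, V_cache = np.empty((0, d)), np.empty((0, d))
cached_outs = []
for x in xs:
    K_cache = np.vstack([K_cache, x @ Wk])
    V_cache = np.vstack([V_cache, x @ Wv])
    cached_outs.append(attention(x @ Wq, K_cache, V_cache))

# Reference: recompute every key/value from scratch at the final step.
full_out = attention(xs[-1] @ Wq, xs @ Wk, xs @ Wv)

print(np.allclose(cached_outs[-1], full_out))  # True: same output, less recompute
```

The cached and from-scratch results match exactly; the saving is that each step does O(1) new projections instead of reprojecting the entire prefix.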

Scores updated daily from GitHub, PyPI, and npm data.