LMCache and llm_efficiency
LMCache provides a production-ready KV cache optimization system for any LLM, while llm_efficiency is an educational implementation that demonstrates KV cache concepts within a minimal GPT architecture. They pair as educational reference versus practical tool rather than as true competitors or complements.
About LMCache
LMCache/LMCache
Supercharge Your LLM with the Fastest KV Cache Layer
LMCache distributes KV cache storage across GPU, CPU, disk, and S3 using techniques like zero-copy transfers and GPU direct storage, enabling any repeated text sequence to be reused across multiple serving instances. It integrates with vLLM and SGLang to reduce time-to-first-token and increase throughput by 3-10x, particularly for long-context workloads such as multi-round QA and RAG applications. It also supports disaggregated prefill and peer-to-peer cache sharing for datacenter-wide optimization.
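The reuse of repeated text sequences described above hinges on keying cached KV tensors by token prefix, so a new request can skip prefill for any prefix already computed. The following is a minimal sketch of that idea only; the class, method names, and hashing scheme are illustrative assumptions, not LMCache's actual API:

```python
import hashlib

class PrefixKVCache:
    """Toy prefix-keyed KV store (illustrative; not LMCache's real interface)."""

    def __init__(self):
        self.store = {}  # prefix hash -> cached KV payload

    def _key(self, token_ids):
        return hashlib.sha256(bytes(token_ids)).hexdigest()

    def put(self, token_ids, kv):
        self.store[self._key(token_ids)] = kv

    def get(self, token_ids):
        # Find the longest cached prefix of token_ids; only tokens past
        # the hit still need prefill, the rest reuse cached KV.
        for end in range(len(token_ids), 0, -1):
            kv = self.store.get(self._key(token_ids[:end]))
            if kv is not None:
                return end, kv
        return 0, None

cache = PrefixKVCache()
cache.put([1, 2, 3], "kv-for-123")
hit_len, kv = cache.get([1, 2, 3, 4, 5])  # reuses the 3-token prefix
```

A real system additionally spills these entries across GPU, CPU, disk, and object storage tiers and shares them between serving instances; this sketch shows only the lookup logic that makes reuse possible.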
About llm_efficiency
dataflowr/llm_efficiency
KV Cache & LoRA for minGPT
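The core concept llm_efficiency teaches can be shown in a few lines: during autoregressive decoding, each step appends one new key/value pair to a cache instead of recomputing attention inputs for the whole sequence. This is a self-contained NumPy sketch of that mechanism, not code from the repo; the weights and embeddings are random stand-ins:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
d = 8  # head dimension

# Random projections standing in for a trained model's W_q, W_k, W_v.
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))

def attend(q, K, V):
    # Single-query causal attention over all cached keys/values.
    return softmax(q @ K.T / np.sqrt(d)) @ V

tokens = rng.standard_normal((5, d))  # stand-in token embeddings

# Decode token-by-token: each step projects only the NEW token and
# appends its K, V to the cache.
K_cache, V_cache = np.zeros((0, d)), np.zeros((0, d))
cached_out = []
for x in tokens:
    K_cache = np.vstack([K_cache, x @ W_k])
    V_cache = np.vstack([V_cache, x @ W_v])
    cached_out.append(attend(x @ W_q, K_cache, V_cache))
cached_out = np.array(cached_out)

# Reference: recompute full causal attention from scratch at every step.
Q, K, V = tokens @ W_q, tokens @ W_k, tokens @ W_v
full_out = np.array([attend(Q[t], K[:t + 1], V[:t + 1])
                     for t in range(len(tokens))])

# Both paths produce identical outputs; the cached path avoids
# re-projecting past tokens at every decoding step.
assert np.allclose(cached_out, full_out)
```

In a minGPT-style model the same append-and-attend pattern sits inside each attention layer, which is what turns generation from quadratic re-prefill per step into a single-token update.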