October2001/Awesome-KV-Cache-Compression
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
Curates papers and implementations spanning pruning, quantization, and distillation approaches for reducing KV cache memory consumption in LLMs, with links to referenced codebases like kvpress and KVCache-Factory. Organizes methods by technique (sparse attention, token eviction, low-rank decomposition) and includes recent survey papers covering KV cache optimization strategies across inference frameworks. Integrates with Hugging Face transformers ecosystem and tracks active research implementations with GitHub repository references.
Stars: 668
Forks: 22
Language: —
License: MIT
Category: —
Last pushed: Feb 24, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/October2001/Awesome-KV-Cache-Compression"
Open to everyone: 100 requests/day with no key required; a free key raises the limit to 1,000/day.
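The curl command above can also be issued programmatically. A minimal Python sketch, assuming the endpoint returns a JSON body; only the endpoint URL comes from this listing, and the helper names and response handling are illustrative, not part of any documented client:

```python
# Hedged sketch: querying the pt-edge quality endpoint shown above.
# Only the base URL is taken from the listing; the function names and
# the assumption of a JSON response are illustrative.
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def quality_url(owner: str, repo: str) -> str:
    """Build the per-repository endpoint URL (free tier, no key)."""
    return f"{BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """GET the endpoint and decode the JSON body (fields undocumented here)."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)


# Example (performs a network request):
# data = fetch_quality("October2001", "Awesome-KV-Cache-Compression")
```

This mirrors the curl invocation one-to-one: the same path segments (`owner/repo`) are appended to the same base URL, with no authentication header on the free tier.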
Higher-rated alternatives
ModelEngine-Group/unified-cache-management: Persist and reuse KV Cache to speed up your LLM.
reloadware/reloadium: Hot reloading and profiling for Python.
alibaba/tair-kvcache: Alibaba Cloud's high-performance KVCache system for LLM inference, with components for global...
xcena-dev/maru: High-performance KV Cache storage engine on CXL shared memory for LLM inference.
Zefan-Cai/Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes.