October2001/Awesome-KV-Cache-Compression
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
Curates papers and implementations spanning pruning, quantization, and distillation approaches for reducing KV cache memory consumption in LLMs, with links to referenced codebases like kvpress and KVCache-Factory. Organizes methods by technique (sparse attention, token eviction, low-rank decomposition) and includes recent survey papers covering KV cache optimization strategies across inference frameworks. Integrates with Hugging Face transformers ecosystem and tracks active research implementations with GitHub repository references.
Stars: 668
Forks: 22
Language: —
License: MIT
Category: —
Last pushed: Feb 24, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/October2001/Awesome-KV-Cache-Compression"
Open to everyone: 100 requests/day with no key required; a free key raises the limit to 1,000/day.
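The curl command above can also be issued programmatically. A minimal Python sketch, assuming the endpoint returns a JSON body; only the endpoint URL comes from this listing, and the helper names and response handling are illustrative, not part of any documented client:

```python
# Hedged sketch: querying the pt-edge quality endpoint shown above.
# Only the base URL is taken from the listing; the function names and
# the assumption of a JSON response are illustrative.
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def quality_url(owner: str, repo: str) -> str:
    """Build the per-repository endpoint URL (free tier, no key)."""
    return f"{BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """GET the endpoint and decode the JSON body (fields undocumented here)."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)


# Example (performs a network request):
# data = fetch_quality("October2001", "Awesome-KV-Cache-Compression")
```

This mirrors the curl invocation one-to-one: the same path segments (`owner/repo`) are appended to the same base URL, with no authentication header on the free tier.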
Higher-rated alternatives
ModelEngine-Group/unified-cache-management: Persist and reuse KV Cache to speed up your LLM.
reloadware/reloadium: Hot reloading and profiling for Python.
alibaba/tair-kvcache: Alibaba Cloud's high-performance KVCache system for LLM inference, with components for global...
xcena-dev/maru: High-performance KV Cache storage engine on CXL shared memory for LLM inference.
Zefan-Cai/Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes.