Zefan-Cai/Awesome-LLM-KV-Cache
Awesome-LLM-KV-Cache: A curated list of 📙 Awesome LLM KV Cache Papers with Codes.
Organizes research papers and implementations across nine KV cache optimization categories, including compression, quantization, low-rank decomposition, and cross-layer utilization, so developers can track state-of-the-art inference acceleration techniques. Papers from research teams at DeepSeek, Microsoft, and others are linked to their official implementations and given recommendation ratings. The collection spans foundational work such as StreamingLLM through recent advances in sparse attention and disaggregated serving architectures, covering LLM inference optimization across a range of hardware and deployment scenarios.
417 stars. No commits in the last 6 months.
Stars: 417
Forks: 26
Language: —
License: GPL-3.0
Category: —
Last pushed: Mar 03, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/Zefan-Cai/Awesome-LLM-KV-Cache"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
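A minimal sketch of the same request in Python, assuming the endpoint returns JSON as the curl example suggests; the requests dependency is an assumption, anonymous (no-key) access follows the note above, and no specific response fields are assumed:

import requests

# Quality endpoint for this repository (anonymous access, rate-limited per the note above).
URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/Zefan-Cai/Awesome-LLM-KV-Cache"

resp = requests.get(URL, timeout=10)
resp.raise_for_status()  # fail loudly on HTTP errors or rate limiting

# The payload is assumed to be JSON; print it as-is rather than guessing at field names.
print(resp.json())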
Higher-rated alternatives
ModelEngine-Group/unified-cache-management
Persist and reuse KV Cache to speedup your LLM.
reloadware/reloadium
Hot Reloading and Profiling for Python
alibaba/tair-kvcache
Alibaba Cloud's high-performance KVCache system for LLM inference, with components for global...
October2001/Awesome-KV-Cache-Compression
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
xcena-dev/maru
High-Performance KV Cache Storage Engine on CXL Shared Memory for LLM Inference