Zefan-Cai/Awesome-LLM-KV-Cache
Awesome-LLM-KV-Cache: A curated list of 📙 Awesome LLM KV Cache Papers with Codes.
Organizes research papers and implementations across nine KV cache optimization categories, including compression, quantization, low-rank decomposition, and cross-layer utilization, so developers can track state-of-the-art inference acceleration techniques. Papers from research teams at DeepSeek, Microsoft, and others are linked to their official implementations and given recommendation ratings. The collection spans foundational work such as StreamingLLM through recent advances in sparse attention and disaggregated serving architectures, covering LLM inference optimization across a range of hardware and deployment scenarios.
417 stars. No commits in the last 6 months.
Stars: 417
Forks: 26
Language: —
License: GPL-3.0
Category: —
Last pushed: Mar 03, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/Zefan-Cai/Awesome-LLM-KV-Cache"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
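A minimal sketch of the same request in Python, assuming the endpoint returns JSON as the curl example suggests; the requests dependency is an assumption, anonymous (no-key) access follows the note above, and no specific response fields are assumed:

import requests

# Quality endpoint for this repository (anonymous access, rate-limited per the note above).
URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/Zefan-Cai/Awesome-LLM-KV-Cache"

resp = requests.get(URL, timeout=10)
resp.raise_for_status()  # fail loudly on HTTP errors or rate limiting

# The payload is assumed to be JSON; print it as-is rather than guessing at field names.
print(resp.json())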
Higher-rated alternatives
ModelEngine-Group/unified-cache-management
Persist and reuse KV Cache to speedup your LLM.
reloadware/reloadium
Hot Reloading and Profiling for Python
alibaba/tair-kvcache
Alibaba Cloud's high-performance KVCache system for LLM inference, with components for global...
October2001/Awesome-KV-Cache-Compression
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
xcena-dev/maru
High-Performance KV Cache Storage Engine on CXL Shared Memory for LLM Inference