Awesome-KV-Cache-Compression and Awesome-LLM-KV-Cache
The two lists complement each other within the same problem space: one curates papers focused specifically on KV cache compression techniques, while the other collects KV cache research more broadly, with links to corresponding implementations, so researchers can explore specialized compression methods alongside the wider landscape of KV cache optimizations.
About Awesome-KV-Cache-Compression
October2001/Awesome-KV-Cache-Compression
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
Curates papers and implementations spanning pruning, quantization, and distillation approaches for reducing KV cache memory consumption in LLMs, with links to referenced codebases such as kvpress and KVCache-Factory. Methods are organized by technique (sparse attention, token eviction, low-rank decomposition), and the list includes recent survey papers covering KV cache optimization strategies across inference frameworks. Many of the referenced implementations build on the Hugging Face Transformers ecosystem, and active research codebases are tracked via GitHub repository links.
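To make one of those technique categories concrete, here is a minimal, hedged sketch of symmetric int8 round-to-nearest quantization of cached key/value tensors with per-token scales. It is illustrative only and is not taken from kvpress, KVCache-Factory, or any paper in the list; the tensor shapes, the helper names (`quantize_kv`, `dequantize_kv`), and the 8-bit choice are assumptions.

```python
# Illustrative sketch: symmetric int8 quantization of a KV cache slice.
# Shapes, names, and the 8-bit choice are assumptions for this example.
import torch

def quantize_kv(kv: torch.Tensor):
    """Quantize a [batch, heads, seq_len, head_dim] tensor to int8.

    One scale per cached token (per batch/head/position), i.e. the simplest
    symmetric round-to-nearest scheme.
    """
    # Max |value| over head_dim -> one fp32 scale per cached token.
    scale = kv.float().abs().amax(dim=-1, keepdim=True).clamp(min=1e-6) / 127.0
    q = torch.round(kv.float() / scale).clamp(-127, 127).to(torch.int8)
    return q, scale.to(torch.float16)

def dequantize_kv(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Reconstruct an approximate fp16 cache for use in attention.
    return q.to(torch.float16) * scale

if __name__ == "__main__":
    kv = torch.randn(1, 8, 1024, 128, dtype=torch.float16)  # toy cache slice
    q, scale = quantize_kv(kv)
    err = (dequantize_kv(q, scale) - kv).float().abs().mean().item()
    orig = kv.numel() * kv.element_size()
    comp = q.numel() * q.element_size() + scale.numel() * scale.element_size()
    print(f"mean abs error {err:.4f}; {orig} -> {comp} bytes")
```

The methods catalogued in the list go further (non-uniform or 2-bit quantization, outlier-aware channel handling), but the storage trade-off is the same: keep low-bit integers plus a small tensor of scales instead of full-precision keys and values.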
About Awesome-LLM-KV-Cache
Zefan-Cai/Awesome-LLM-KV-Cache
Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes.
Organizes research papers and implementations across nine specialized KV cache optimization categories (including compression, quantization, low-rank decomposition, and cross-layer utilization), letting developers track state-of-the-art inference acceleration techniques. Papers are mapped to official implementations from research teams at DeepSeek, Microsoft, and others, with links and recommendation ratings. The collection spans foundational work such as StreamingLLM through recent advances in sparse attention and disaggregated serving architectures, targeting LLM inference optimization across a range of hardware and deployment scenarios.
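As an illustration of the kind of technique the list tracks, below is a hedged sketch of the eviction pattern popularized by StreamingLLM: retain a few initial "attention sink" tokens plus a recent sliding window and drop the middle of the cache. The function name `evict_streaming`, the tensor layout, and the default sizes are assumptions for this example, not code from the listed repositories.

```python
# Illustrative sketch of StreamingLLM-style eviction: keep "attention sink"
# tokens plus a recent window; everything in between is dropped.
# Names, shapes, and defaults here are assumptions for this example.
import torch

def evict_streaming(keys, values, n_sink=4, window=512):
    """Trim [batch, heads, seq_len, head_dim] K/V tensors along seq_len."""
    seq_len = keys.size(2)
    if seq_len <= n_sink + window:  # cache still fits: nothing to evict
        return keys, values

    def keep(t):
        # First n_sink tokens act as attention sinks; the last `window`
        # tokens form the sliding window of recent context.
        return torch.cat([t[:, :, :n_sink], t[:, :, -window:]], dim=2)

    return keep(keys), keep(values)

if __name__ == "__main__":
    k = torch.randn(1, 8, 2048, 128)
    v = torch.randn(1, 8, 2048, 128)
    k2, v2 = evict_streaming(k, v)
    print(tuple(k.shape), "->", tuple(k2.shape))  # seq_len 2048 -> 516
```

A production implementation also has to re-index positions (for example, rotary embeddings) after eviction so the retained tokens remain contiguous from the model's point of view; this sketch shows only the cache-trimming step.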