October2001/Awesome-KV-Cache-Compression

📰 Must-read papers on KV Cache Compression (constantly updating 🤗).

Quality score: 47 / 100 (Emerging)

Curates papers and implementations spanning pruning, quantization, and distillation approaches for reducing KV cache memory consumption in LLMs, with links to referenced codebases such as kvpress and KVCache-Factory. Organizes methods by technique (sparse attention, token eviction, low-rank decomposition) and includes recent survey papers covering KV cache optimization strategies across inference frameworks. Integrates with the Hugging Face Transformers ecosystem and tracks active research implementations via GitHub repository references.

No package · No dependents

Score breakdown:
- Maintenance: 10 / 25
- Adoption: 10 / 25
- Maturity: 16 / 25
- Community: 11 / 25
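The 47 / 100 overall appears to be a simple sum of the four 25-point category scores. That equal weighting is an assumption inferred from the numbers shown here, not documented scoring behavior; a quick check:

```python
# Category scores as shown on the card. Treating the overall score as an
# unweighted sum is an assumption inferred from the displayed totals.
scores = {"Maintenance": 10, "Adoption": 10, "Maturity": 16, "Community": 11}

overall = sum(scores.values())   # each category is out of 25
max_total = 25 * len(scores)     # four categories -> 100-point scale

print(f"{overall} / {max_total}")  # → 47 / 100
```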


Stars: 668
Forks: 22
Language: —
License: MIT
Last pushed: Feb 24, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/October2001/Awesome-KV-Cache-Compression"

Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000/day.
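The same endpoint can also be called programmatically. A minimal Python sketch, assuming the endpoint returns a JSON body (the response schema is not documented here, so the result is decoded generically rather than into named fields):

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def quality_url(owner: str, repo: str) -> str:
    """Build the quality endpoint URL for a GitHub owner/repo pair."""
    return f"{BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and decode the quality report; assumes a JSON response."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Live network call; counts against the 100 requests/day anonymous limit.
    report = fetch_quality("October2001", "Awesome-KV-Cache-Compression")
    print(json.dumps(report, indent=2))
```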