KV Cache Optimization LLM Tools
Systems and frameworks for managing, compressing, and optimizing KV cache memory usage in LLM inference. Includes cache storage engines, virtual memory approaches, and persistence layers. Does NOT include general LLM caching proxies, semantic caching, or request/response deduplication tools.
There are 31 KV cache optimization tools tracked. Three score above 50 (the established tier). The highest-rated is ModelEngine-Group/unified-cache-management at 61/100, with 261 stars.
Get all 31 projects as JSON:

```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=kv-cache-optimization&limit=31"
```

Open to everyone: 100 requests/day with no key needed, or get a free key for 1,000/day.
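Once fetched, the JSON can be filtered locally, e.g. to pull out the established tier. The response schema below (a list of objects with `name`, `score`, and `tier` fields) is an assumption, illustrated with an inline sample rather than a live request; the second entry's score is hypothetical.

```python
import json

# Assumed response shape, with a hypothetical score for the second entry;
# only the 61/100 for the top-ranked project is stated on this page.
sample = json.loads("""
[
  {"name": "ModelEngine-Group/unified-cache-management", "score": 61, "tier": "Established"},
  {"name": "OnlyTerp/kvtc", "score": 34, "tier": "Emerging"}
]
""")

# Keep only projects above the 50-point threshold used for the established tier.
established = [tool["name"] for tool in sample if tool["score"] > 50]
print(established)  # ['ModelEngine-Group/unified-cache-management']
```

Against the live endpoint, the same filter would be applied to the parsed body of the curl response above.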
| # | Tool | Description | Score | Tier |
|---|---|---|---|---|
| 1 | ModelEngine-Group/unified-cache-management | Persist and reuse KV cache to speed up your LLM. | 61 | Established |
| 2 | reloadware/reloadium | Hot Reloading and Profiling for Python | | Established |
| 3 | alibaba/tair-kvcache | Alibaba Cloud's high-performance KVCache system for LLM inference, with... | | Established |
| 4 | October2001/Awesome-KV-Cache-Compression | Must-read papers on KV Cache Compression (constantly updating). | | Emerging |
| 5 | xcena-dev/maru | High-Performance KV Cache Storage Engine on CXL Shared Memory for LLM Inference | | Emerging |
| 6 | Zefan-Cai/Awesome-LLM-KV-Cache | A curated list of Awesome LLM KV Cache Papers with Codes. | | Emerging |
| 7 | OnlyTerp/kvtc | First open-source KVTC implementation (NVIDIA, ICLR 2026) -- 8-32x KV cache... | | Emerging |
| 8 | samfurr/foveated_kv | Importance-adaptive mixed-precision KV cache compression for LLM inference... | | Emerging |
| 9 | sensoris/semcache | Semantic caching layer for your LLM applications. Reuse responses and reduce... | | Emerging |
| 10 | dipampaul17/KVSplit | Run larger LLMs with longer contexts on Apple Silicon by using... | | Emerging |
| 11 | jjiantong/Awesome-KV-Cache-Optimization | [Survey] Towards Efficient Large Language Model Serving: A Survey on... | | Emerging |
| 12 | TreeAI-Lab/Awesome-KV-Cache-Management | This repository serves as a comprehensive survey of LLM development,... | | Emerging |
| 13 | TheToughCrane/nano-kvllm | This project aims to provide a highly effective KV cache management framework for... | | Experimental |
| 14 | Naveenub/quantum-pulse | Extreme-density data vault for LLM training sets. MsgPack + Zstd-L22 +... | | Experimental |
| 15 | helgklaizar/turboquant_mlx | Extreme KV Cache Compression (1-3 bit) for LLMs natively on Apple Silicon... | | Experimental |
| 16 | jandhyala-dev/modelai-llama.cpp | Production fork of llama.cpp adding KV cache compaction via Attention Matching | | Experimental |
| 17 | raymond-UI/llm-cache | LLM request/response caching with tiered TTL, time travel, and request... | | Experimental |
| 18 | RemizovDenis/turboquant | TurboQuant: KV-cache compression for faster and cheaper LLM inference. | | Experimental |
| 19 | philtimmes/KeSSie | KeSSie: huge-context semantic recall for Large Language Models | | Experimental |
| 20 | rizwan199811/neurocache | Reduce LLM API costs and speed up responses by caching completions with... | | Experimental |
| 21 | sentinelXVI/KeSSie | Enable efficient LLM inference by managing large token histories with a... | | Experimental |
| 22 | Jamalianpour/semantic-llm-cache | Semantic caching for LLM API responses in Spring Boot applications | | Experimental |
| 23 | DreamSoul-AI/OBCache | OBCache: Optimal Brain KV Cache Pruning for Efficient Long-Context LLM Inference | | Experimental |
| 24 | Siddhant-K-code/tokenvm | TokenVM is a high-performance runtime that treats LLM KV cache and... | | Experimental |
| 25 | Janghyun1230/FastKVzip | Accurate and fast KV cache compression with a gating mechanism | | Experimental |
| 26 | MSNP1381/cache-cool | Cache-cool: A fast, flexible LLM caching proxy that reduces latency and... | | Experimental |
| 27 | Resk-Security/resk-caching | Resk-Caching is a Bun-based backend library and server designed for secure... | | Experimental |
| 28 | wenzyxx00/LMCache | Provide fast, memory-efficient caching for language models to improve... | | Experimental |
| 29 | kushagrasri1412/PYROCACHE | AI-Augmented In-Memory Cache Engine: Redis-compatible server built from... | | Experimental |
| 30 | hupe1980/go-llmcache | Cache implementation for storing and retrieving results of language model... | | Experimental |
| 31 | YUECHE77/SustainableKV | This is the official implementation of SustainableKV | | Experimental |