KV Cache Optimization LLM Tools

Systems and frameworks for managing, compressing, and optimizing KV cache memory usage in LLM inference. Includes cache storage engines, virtual memory approaches, and persistence layers. Does NOT include general LLM caching proxies, semantic caching, or request/response deduplication tools.

There are 31 KV cache optimization tools tracked; 3 score above 50 (the Established tier). The highest-rated is ModelEngine-Group/unified-cache-management at 61/100, with 261 stars.

Get all 31 projects as JSON (set `limit` to at least 31 to cover the full list):

```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=kv-cache-optimization&limit=31"
```

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
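For programmatic use, the same endpoint can be queried from Python with the standard library. This is a minimal sketch: the query parameters (`domain`, `subcategory`, `limit`) come from the curl example above, but the JSON field names (`projects`, `name`, `score`) are assumptions about the response shape, so check the live payload before relying on them.

```python
# Sketch: query the dataset API and filter to Established-tier tools.
# Assumes the response is a JSON object with a "projects" list whose
# entries carry "name" and "score" fields -- verify against the real API.
import json
import urllib.parse
import urllib.request

API = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def build_url(domain: str, subcategory: str, limit: int = 31) -> str:
    """Build the query URL for a given domain/subcategory pair."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{API}?{query}"

def established_tools(payload: dict) -> list:
    """Keep only entries scoring 50 or above (the Established tier)."""
    return [p for p in payload.get("projects", []) if p.get("score", 0) >= 50]

# To fetch live data (counts against the 100 requests/day limit):
#   with urllib.request.urlopen(build_url("llm-tools", "kv-cache-optimization")) as resp:
#       data = json.load(resp)
#   for tool in established_tools(data):
#       print(tool["name"], tool["score"])
```

The filter threshold of 50 mirrors the tier boundary stated above; adjust it if the API exposes tiers directly.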

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | ModelEngine-Group/unified-cache-management | Persist and reuse KV Cache to speedup your LLM. | 61 | Established |
| 2 | reloadware/reloadium | Hot Reloading and Profiling for Python | 56 | Established |
| 3 | alibaba/tair-kvcache | Alibaba Cloud's high-performance KVCache system for LLM inference, with... | 50 | Established |
| 4 | October2001/Awesome-KV-Cache-Compression | 📰 Must-read papers on KV Cache Compression (constantly updating 🤗). | 47 | Emerging |
| 5 | xcena-dev/maru | High-Performance KV Cache Storage Engine on CXL Shared Memory for LLM Inference | 41 | Emerging |
| 6 | Zefan-Cai/Awesome-LLM-KV-Cache | Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes. | 39 | Emerging |
| 7 | OnlyTerp/kvtc | First open-source KVTC implementation (NVIDIA, ICLR 2026) -- 8-32x KV cache... | 38 | Emerging |
| 8 | samfurr/foveated_kv | Importance-adaptive mixed-precision KV cache compression for LLM inference... | 37 | Emerging |
| 9 | sensoris/semcache | Semantic caching layer for your LLM applications. Reuse responses and reduce... | 37 | Emerging |
| 10 | dipampaul17/KVSplit | Run larger LLMs with longer contexts on Apple Silicon by using... | 37 | Emerging |
| 11 | jjiantong/Awesome-KV-Cache-Optimization | [Survey] Towards Efficient Large Language Model Serving: A Survey on... | 34 | Emerging |
| 12 | TreeAI-Lab/Awesome-KV-Cache-Management | This repository serves as a comprehensive survey of LLM development,... | 33 | Emerging |
| 13 | TheToughCrane/nano-kvllm | This project aims to provide a high effective KV cache manage framework for... | 29 | Experimental |
| 14 | Naveenub/quantum-pulse | Extreme-density data vault for LLM training sets. MsgPack + Zstd-L22 +... | 25 | Experimental |
| 15 | helgklaizar/turboquant_mlx | Extreme KV Cache Compression (1-3 bit) for LLMs natively on Apple Silicon... | 25 | Experimental |
| 16 | jandhyala-dev/modelai-llama.cpp | Production fork of llama.cpp adding KV cache compaction via Attention Matching | 23 | Experimental |
| 17 | raymond-UI/llm-cache | LLM request/response caching with tiered TTL, time travel, and request... | 23 | Experimental |
| 18 | RemizovDenis/turboquant | TurboQuant: KV-cache compression for faster and cheaper LLM inference. | 23 | Experimental |
| 19 | philtimmes/KeSSie | KeSSie HUGE Context Semantic recall for Large Language Models | 23 | Experimental |
| 20 | rizwan199811/neurocache | Reduce LLM API costs and speed up responses by caching completions with... | 22 | Experimental |
| 21 | sentinelXVI/KeSSie | Enable efficient LLM inference by managing large token histories with a... | 22 | Experimental |
| 22 | Jamalianpour/semantic-llm-cache | Semantic caching for LLM API responses in Spring Boot applications | 18 | Experimental |
| 23 | DreamSoul-AI/OBCache | OBCache: Optimal Brain KV Cache Pruning for Efficient Long-Context LLM Inference | 17 | Experimental |
| 24 | Siddhant-K-code/tokenvm | TokenVM is a high-performance runtime that treats LLM KV cache and... | 16 | Experimental |
| 25 | Janghyun1230/FastKVzip | Accurate and fast KV cache compression with a gating mechanism | 16 | Experimental |
| 26 | MSNP1381/cache-cool | 🌟 Cache-cool: A fast, flexible LLM caching proxy that reduces latency and... | 16 | Experimental |
| 27 | Resk-Security/resk-caching | Resk-Caching is a Bun-based backend library and server designed for secure... | 15 | Experimental |
| 28 | wenzyxx00/LMCache | Provide fast, memory-efficient caching for language models to improve... | 14 | Experimental |
| 29 | kushagrasri1412/PYROCACHE | AI-Augmented In-Memory Cache Engine - Redis-compatible server built from... | 14 | Experimental |
| 30 | hupe1980/go-llmcache | 🧠 Cache implementation for storing and retrieving results of language model... | 13 | Experimental |
| 31 | YUECHE77/SustainableKV | This is the official implementations for SustainableKV | 11 | Experimental |