KV Cache Optimization LLM Tools

Systems and frameworks for managing, compressing, and optimizing KV cache memory usage in LLM inference. Includes cache storage engines, virtual memory approaches, and persistence layers. Does NOT include general LLM caching proxies, semantic caching, or request/response deduplication tools.

There are 31 KV cache optimization tools tracked; 3 score above 50 (the Established tier). The highest-rated is ModelEngine-Group/unified-cache-management at 61/100, with 261 stars.

Get all 31 projects as JSON (set `limit` to at least 31 to cover the full list):

```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=kv-cache-optimization&limit=31"
```

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
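For programmatic use, the same endpoint can be queried from Python with the standard library. This is a minimal sketch: the query parameters (`domain`, `subcategory`, `limit`) come from the curl example above, but the JSON field names (`projects`, `name`, `score`) are assumptions about the response shape, so check the live payload before relying on them.

```python
# Sketch: query the dataset API and filter to Established-tier tools.
# Assumes the response is a JSON object with a "projects" list whose
# entries carry "name" and "score" fields -- verify against the real API.
import json
import urllib.parse
import urllib.request

API = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def build_url(domain: str, subcategory: str, limit: int = 31) -> str:
    """Build the query URL for a given domain/subcategory pair."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{API}?{query}"

def established_tools(payload: dict) -> list:
    """Keep only entries scoring 50 or above (the Established tier)."""
    return [p for p in payload.get("projects", []) if p.get("score", 0) >= 50]

# To fetch live data (counts against the 100 requests/day limit):
#   with urllib.request.urlopen(build_url("llm-tools", "kv-cache-optimization")) as resp:
#       data = json.load(resp)
#   for tool in established_tools(data):
#       print(tool["name"], tool["score"])
```

The filter threshold of 50 mirrors the tier boundary stated above; adjust it if the API exposes tiers directly.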

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | ModelEngine-Group/unified-cache-management | Persist and reuse KV Cache to speedup your LLM. | 61 | Established |
| 2 | reloadware/reloadium | Hot Reloading and Profiling for Python | 56 | Established |
| 3 | alibaba/tair-kvcache | Alibaba Cloud's high-performance KVCache system for LLM inference, with... | 50 | Established |
| 4 | October2001/Awesome-KV-Cache-Compression | 📰 Must-read papers on KV Cache Compression (constantly updating 🤗). | 47 | Emerging |
| 5 | xcena-dev/maru | High-Performance KV Cache Storage Engine on CXL Shared Memory for LLM Inference | 41 | Emerging |
| 6 | Zefan-Cai/Awesome-LLM-KV-Cache | Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes. | 39 | Emerging |
| 7 | OnlyTerp/kvtc | First open-source KVTC implementation (NVIDIA, ICLR 2026) -- 8-32x KV cache... | 38 | Emerging |
| 8 | samfurr/foveated_kv | Importance-adaptive mixed-precision KV cache compression for LLM inference... | 37 | Emerging |
| 9 | sensoris/semcache | Semantic caching layer for your LLM applications. Reuse responses and reduce... | 37 | Emerging |
| 10 | dipampaul17/KVSplit | Run larger LLMs with longer contexts on Apple Silicon by using... | 37 | Emerging |
| 11 | jjiantong/Awesome-KV-Cache-Optimization | [Survey] Towards Efficient Large Language Model Serving: A Survey on... | 34 | Emerging |
| 12 | TreeAI-Lab/Awesome-KV-Cache-Management | This repository serves as a comprehensive survey of LLM development,... | 33 | Emerging |
| 13 | TheToughCrane/nano-kvllm | This project aims to provide a high effective KV cache manage framework for... | 29 | Experimental |
| 14 | Naveenub/quantum-pulse | Extreme-density data vault for LLM training sets. MsgPack + Zstd-L22 +... | 25 | Experimental |
| 15 | helgklaizar/turboquant_mlx | Extreme KV Cache Compression (1-3 bit) for LLMs natively on Apple Silicon... | 25 | Experimental |
| 16 | jandhyala-dev/modelai-llama.cpp | Production fork of llama.cpp adding KV cache compaction via Attention Matching | 23 | Experimental |
| 17 | raymond-UI/llm-cache | LLM request/response caching with tiered TTL, time travel, and request... | 23 | Experimental |
| 18 | RemizovDenis/turboquant | TurboQuant: KV-cache compression for faster and cheaper LLM inference. | 23 | Experimental |
| 19 | philtimmes/KeSSie | KeSSie HUGE Context Semantic recall for Large Language Models | 23 | Experimental |
| 20 | rizwan199811/neurocache | Reduce LLM API costs and speed up responses by caching completions with... | 22 | Experimental |
| 21 | sentinelXVI/KeSSie | Enable efficient LLM inference by managing large token histories with a... | 22 | Experimental |
| 22 | Jamalianpour/semantic-llm-cache | Semantic caching for LLM API responses in Spring Boot applications | 18 | Experimental |
| 23 | DreamSoul-AI/OBCache | OBCache: Optimal Brain KV Cache Pruning for Efficient Long-Context LLM Inference | 17 | Experimental |
| 24 | Siddhant-K-code/tokenvm | TokenVM is a high-performance runtime that treats LLM KV cache and... | 16 | Experimental |
| 25 | Janghyun1230/FastKVzip | Accurate and fast KV cache compression with a gating mechanism | 16 | Experimental |
| 26 | MSNP1381/cache-cool | 🌟 Cache-cool: A fast, flexible LLM caching proxy that reduces latency and... | 16 | Experimental |
| 27 | Resk-Security/resk-caching | Resk-Caching is a Bun-based backend library and server designed for secure... | 15 | Experimental |
| 28 | wenzyxx00/LMCache | Provide fast, memory-efficient caching for language models to improve... | 14 | Experimental |
| 29 | kushagrasri1412/PYROCACHE | AI-Augmented In-Memory Cache Engine - Redis-compatible server built from... | 14 | Experimental |
| 30 | hupe1980/go-llmcache | 🧠 Cache implementation for storing and retrieving results of language model... | 13 | Experimental |
| 31 | YUECHE77/SustainableKV | This is the official implementations for SustainableKV | 11 | Experimental |