wln20/CSKV

[NeurIPS ENLSP Workshop'24] CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios

Overall score: 14 / 100 (Experimental)

This project helps machine learning engineers and researchers optimize Large Language Models (LLMs) for very long text inputs. It compresses an existing LLM's internal memory (the key-value, or KV, cache) along the channel dimension without major retraining. The result is an LLM that can process much longer contexts with significantly less memory overhead, making it more efficient for demanding applications.
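To make the idea concrete, here is an illustrative sketch (not the CSKV implementation, whose projections are trained): channel shrinking stores the KV cache in a reduced number of channels via a down-projection and restores an approximation with an up-projection when attention consumes it. The random projection matrices and all dimensions below are placeholder assumptions.

```python
import numpy as np

# Illustrative sketch only, NOT the CSKV method: compress a key cache along
# the channel dimension with a low-rank projection pair (random stand-ins
# here; CSKV trains these).
rng = np.random.default_rng(0)

seq_len, d = 1024, 128   # number of cached tokens, original channel width
r = 32                   # shrunken channel width (4x compression)

K = rng.standard_normal((seq_len, d)).astype(np.float32)  # key cache

# Down-projection stores the cache in r channels; up-projection rebuilds
# an approximation on the fly when the cache is read.
W_down = rng.standard_normal((d, r)).astype(np.float32) / np.sqrt(d)
W_up = rng.standard_normal((r, d)).astype(np.float32) / np.sqrt(r)

K_small = K @ W_down         # stored form: (seq_len, r), 4x less memory
K_restored = K_small @ W_up  # approximate reconstruction: (seq_len, d)

print(K.nbytes // K_small.nbytes)  # memory ratio: 4
```

The same down/up pair would be applied to the value cache; the memory saving is the ratio d / r, traded against reconstruction error.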

No commits in the last 6 months.

Use this if you are an ML engineer or researcher facing memory constraints when deploying or experimenting with LLMs on long-context tasks, and you want to reduce memory usage with minimal retraining effort.

Not ideal if you are looking for a completely training-free solution or if you need to optimize an LLM for tasks that do not involve long-context processing.

Tags: Large Language Models, LLM Optimization, Memory Efficiency, Model Compression, Long-Context AI
Badges: No License, Stale (6 months), No Package, No Dependents
Maintenance: 0 / 25
Adoption: 6 / 25
Maturity: 8 / 25
Community: 0 / 25


Stars: 16
Forks: n/a
Language: Python
License: none
Last pushed: Oct 18, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/wln20/CSKV"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
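The curl command above can also be issued from Python with the standard library. A minimal sketch, assuming only that the endpoint follows the `quality/<collection>/<owner>/<repo>` path shown above; the JSON field names of the response are not documented here, so the sketch only builds the URL and decodes the raw JSON.

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(collection: str, owner: str, repo: str) -> str:
    """Build the API URL for a repo, e.g. llm-tools/wln20/CSKV."""
    return f"{BASE}/{collection}/{owner}/{repo}"

def fetch_quality(collection: str, owner: str, repo: str) -> dict:
    """Fetch and decode the JSON record (performs a network request)."""
    with urllib.request.urlopen(quality_url(collection, owner, repo)) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Same endpoint as the curl example above.
    print(quality_url("llm-tools", "wln20", "CSKV"))
```

Within the free tier, `fetch_quality("llm-tools", "wln20", "CSKV")` returns the parsed record as a Python dict.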