wln20/CSKV

[NeurIPS ENLSP Workshop'24] CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios

Overall score: 14 / 100 (Experimental)

This project helps machine learning engineers and researchers optimize Large Language Models (LLMs) for very long text inputs. It compresses an existing LLM's internal memory (the key-value, or KV, cache) along the channel dimension without major retraining. The result is an LLM that can process much longer contexts with significantly less memory overhead, making it more efficient for demanding applications.
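To make the idea concrete, here is an illustrative sketch (not the CSKV implementation, whose projections are trained): channel shrinking stores the KV cache in a reduced number of channels via a down-projection and restores an approximation with an up-projection when attention consumes it. The random projection matrices and all dimensions below are placeholder assumptions.

```python
import numpy as np

# Illustrative sketch only, NOT the CSKV method: compress a key cache along
# the channel dimension with a low-rank projection pair (random stand-ins
# here; CSKV trains these).
rng = np.random.default_rng(0)

seq_len, d = 1024, 128   # number of cached tokens, original channel width
r = 32                   # shrunken channel width (4x compression)

K = rng.standard_normal((seq_len, d)).astype(np.float32)  # key cache

# Down-projection stores the cache in r channels; up-projection rebuilds
# an approximation on the fly when the cache is read.
W_down = rng.standard_normal((d, r)).astype(np.float32) / np.sqrt(d)
W_up = rng.standard_normal((r, d)).astype(np.float32) / np.sqrt(r)

K_small = K @ W_down         # stored form: (seq_len, r), 4x less memory
K_restored = K_small @ W_up  # approximate reconstruction: (seq_len, d)

print(K.nbytes // K_small.nbytes)  # memory ratio: 4
```

The same down/up pair would be applied to the value cache; the memory saving is the ratio d / r, traded against reconstruction error.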

No commits in the last 6 months.

Use this if you are an ML engineer or researcher facing memory constraints when deploying or experimenting with LLMs on long-context tasks, and you want to reduce memory usage with minimal retraining effort.

Not ideal if you are looking for a completely training-free solution or if you need to optimize an LLM for tasks that do not involve long-context processing.

Tags: Large Language Models, LLM Optimization, Memory Efficiency, Model Compression, Long-Context AI
Badges: No License, Stale (6 months), No Package, No Dependents
Maintenance: 0 / 25
Adoption: 6 / 25
Maturity: 8 / 25
Community: 0 / 25


Stars: 16
Forks: n/a
Language: Python
License: none
Last pushed: Oct 18, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/wln20/CSKV"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
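The curl command above can also be issued from Python with the standard library. A minimal sketch, assuming only that the endpoint follows the `quality/<collection>/<owner>/<repo>` path shown above; the JSON field names of the response are not documented here, so the sketch only builds the URL and decodes the raw JSON.

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(collection: str, owner: str, repo: str) -> str:
    """Build the API URL for a repo, e.g. llm-tools/wln20/CSKV."""
    return f"{BASE}/{collection}/{owner}/{repo}"

def fetch_quality(collection: str, owner: str, repo: str) -> dict:
    """Fetch and decode the JSON record (performs a network request)."""
    with urllib.request.urlopen(quality_url(collection, owner, repo)) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Same endpoint as the curl example above.
    print(quality_url("llm-tools", "wln20", "CSKV"))
```

Within the free tier, `fetch_quality("llm-tools", "wln20", "CSKV")` returns the parsed record as a Python dict.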