LLM Compression Optimization Tools
Tools and techniques for reducing LLM size, memory footprint, and inference latency through compression, pruning, quantization, and architectural optimization. Does NOT include general model training, fine-tuning frameworks, or inference serving infrastructure.
There are 30 LLM compression optimization tools tracked. 1 scores above 70 (verified tier). The highest-rated is Tencent/AngelSlim at 79/100 with 536 stars and 5,117 monthly downloads. 1 of the top 10 is actively maintained.
Get all 30 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=llm-compression-optimization&limit=20"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
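The same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper names `build_url` and `fetch` are hypothetical, and the response schema is not documented on this page, so the parsed JSON is returned as-is. The curl example passes `limit=20`; whether a larger `limit` returns all 30 projects is an assumption to verify against the API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="llm-tools",
              subcategory="llm-compression-optimization",
              limit=20):
    """Build the dataset URL with the same query parameters as the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch(url, timeout=30):
    """GET the URL and parse the body as JSON.

    The response shape is not described on this page, so the parsed
    value is returned without any further interpretation.
    """
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Usage (performs a network request, so it is left commented out):
# data = fetch(build_url())
# print(json.dumps(data, indent=2)[:500])
```

Keeping URL construction separate from the network call makes it easy to check the query string offline and to stay within the 100 requests/day keyless quota while experimenting.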
| # | Tool | Description | Score | Tier |
|---|---|---|---|---|
| 1 | Tencent/AngelSlim | Model compression toolkit engineered for enhanced usability,... | | Verified |
| 2 | nebuly-ai/optimate | A collection of libraries to optimise AI model performances | | Emerging |
| 3 | kyo-takano/chinchilla | A toolkit for scaling law research ⚖ | | Emerging |
| 4 | liyucheng09/Selective_Context | Compress your input to ChatGPT or other LLMs, to let them process 2x more... | | Emerging |
| 5 | antgroup/glake | GLake: optimizing GPU memory management and IO transmission. | | Emerging |
| 6 | TsingmaoAI/MI-optimize | mi-optimize is a versatile tool designed for the quantization and evaluation... | | Emerging |
| 7 | microsoft/only_train_once | OTOv1-v3, NeurIPS, ICLR, TMLR, DNN Training, Compression, Structured... | | Experimental |
| 8 | robtacconelli/Nacrith-GPU | Nacrith: Lossless text compression via ensemble neural arithmetic coding.... | | Experimental |
| 9 | amazon-science/llm-rank-pruning | LLM-Rank: A graph theoretical approach to structured pruning of large... | | Experimental |
| 10 | naskio/mergeui | All-in-one UI for merged LLMs in Hugging Face | | Experimental |
| 11 | AndyyyYuuu/lm-is-compressor | An accurate language model is a high-compression, lossless data compressor | | Experimental |
| 12 | LINs-lab/DeFT | [ICLR 2025] DeFT: Decoding with Flash Tree-attention for Efficient... | | Experimental |
| 13 | deadlykitten4/ERC-SVD | ERC-SVD: Error-Controlled SVD for Large Language Model Compression | | Experimental |
| 14 | oliviersaidi/PACF_LLM | Pattern-aware optimization framework achieving 93.8% complexity reduction in... | | Experimental |
| 15 | M9rth/heretic | 🛠 Remove censorship from language models instantly using advanced... | | Experimental |
| 16 | friendshipkim/overfill | Code for OverFill: Two-Stage Models for Efficient Language Model Decoding | | Experimental |
| 17 | Pro-GenAI/ShortLang | Compressed Text for efficient LLMs | | Experimental |
| 18 | talkking/PrunerGPT | [ICASSP2024] One-Shot Sensitivity-Aware Mixed Sparsity Pruning for Large... | | Experimental |
| 19 | Yvancg/optimizers | A collection of minimal, dependency-free, performance-focused utilities for... | | Experimental |
| 20 | louisbrulenaudet/mergeKit | Tools for merging pretrained Large Language Models and create Mixture of... | | Experimental |
| 21 | Mikola78/trinity-large-tech-report | 🚀 Explore advanced sparse Mixture-of-Experts models with up to 400B... | | Experimental |
| 22 | simocolo/nnDrain | A PyTorch implementation for structural pruning applied to neural networks... | | Experimental |
| 23 | plandes/lmtask | Inferencing and Training Large Language Model Tasks | | Experimental |
| 24 | burcgokden/LLM-from-Power-Law-Decoder-Representations | Implementation of PLDR-LLM: Large Language Model from Power Law Decoder... | | Experimental |
| 25 | 0xnu/multicollinearity_llm | A multicollinearity-based compression C program, identifies and removes... | | Experimental |
| 26 | chandan11248/deepseek-innovations-from-scratch | Reverse-engineering how DeepSeek achieved frontier LLM performance at a... | | Experimental |
| 27 | arrmansa/Temporal-Neuron-Variance-Pruning-Demo | An implementation of Variance Pruning: Pruning Language Models via Temporal... | | Experimental |
| 28 | burcgokden/PLDR-LLM-with-KVG-cache | Implementation of PLDR-LLM with KV-cache and G-cache in Pytorch for the... | | Experimental |
| 29 | Exthalpy/GenLang | Self-Decoding Compression Architecture | | Experimental |
| 30 | louisbrulenaudet/mergekit-assistant | Mergekit Assistant is a cutting-edge toolkit designed for the seamless... | | Experimental |