LLM Compression Optimization Tools

Tools and techniques for reducing LLM size, memory footprint, and inference latency through compression, pruning, quantization, and architectural optimization. Does NOT include general model training, fine-tuning frameworks, or inference serving infrastructure.

30 LLM compression optimization tools are tracked. One scores above 70 (Verified tier). The highest-rated is Tencent/AngelSlim at 79/100, with 536 stars and 5,117 monthly downloads. One of the top 10 is actively maintained.

Get all 30 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=llm-compression-optimization&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
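The same request can be made from a script. The sketch below is a minimal Python equivalent of the curl command above, using only the standard library. The function names `build_url` and `fetch_projects` are hypothetical helpers introduced for illustration. The response schema is not described on this page, so the code only decodes the JSON and makes no assumption about its fields. It keeps `limit=20` from the curl command; whether a larger value is accepted is not stated here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken verbatim from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="llm-tools",
              subcategory="llm-compression-optimization",
              limit=20):
    """Build the query URL with the same parameters as the curl command."""
    query = urlencode({
        "domain": domain,
        "subcategory": subcategory,
        "limit": limit,
    })
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Fetch the URL and decode the JSON body.

    The shape of the returned object is an unknown here, so it is
    handed back as-is for the caller to inspect.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_projects(build_url())` performs the network request; printing the result with `json.dumps(data, indent=2)` is a reasonable first step for inspecting which fields the API actually returns.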

#	Tool	Score	Tier	Stars	Language
1	Tencent/AngelSlim Model compression toolkit engineered for enhanced usability,...	79	Verified	536	Python
2	nebuly-ai/optimate A collection of libraries to optimise AI model performances	45	Emerging	8,349	Python
3	kyo-takano/chinchilla A toolkit for scaling law research ⚖	38	Emerging	57	Python
4	liyucheng09/Selective_Context Compress your input to ChatGPT or other LLMs, to let them process 2x more...	38	Emerging	410	Python
5	antgroup/glake GLake: optimizing GPU memory management and IO transmission.	35	Emerging	499	Python
6	TsingmaoAI/MI-optimize mi-optimize is a versatile tool designed for the quantization and evaluation...	31	Emerging	25	Python
7	microsoft/only_train_once OTOv1-v3, NeurIPS, ICLR, TMLR, DNN Training, Compression, Structured...	29	Experimental	50	Python
8	robtacconelli/Nacrith-GPU Nacrith — Lossless text compression via ensemble neural arithmetic coding....	28	Experimental	17	Python
9	amazon-science/llm-rank-pruning LLM-Rank: A graph theoretical approach to structured pruning of large...	27	Experimental	8	Python
10	naskio/mergeui All-in-one UI for merged LLMs in Hugging Face	26	Experimental	25	Python
11	AndyyyYuuu/lm-is-compressor An accurate language model is a high-compression, lossless data compressor	25	Experimental	4	Python
12	LINs-lab/DeFT [ICLR 2025] DeFT: Decoding with Flash Tree-attention for Efficient...	24	Experimental	50	Jupyter Notebook
13	deadlykitten4/ERC-SVD ERC-SVD: Error-Controlled SVD for Large Language Model Compression	23	Experimental	1	Python
14	oliviersaidi/PACF_LLM Pattern-aware optimization framework achieving 93.8% complexity reduction in...	23	Experimental	1	Python
15	M9rth/heretic 🛠 Remove censorship from language models instantly using advanced...	23	Experimental	1	Python
16	friendshipkim/overfill Code for OverFill: Two-Stage Models for Efficient Language Model Decoding	19	Experimental	5	Python
17	Pro-GenAI/ShortLang Compressed Text for efficient LLMs	18	Experimental	4	Python
18	talkking/PrunerGPT [ICASSP2024] One-Shot Sensitivity-Aware Mixed Sparsity Pruning for Large...	18	Experimental	6	Python
19	Yvancg/optimizers A collection of minimal, dependency-free, performance-focused utilities for...	16	Experimental	1	JavaScript
20	louisbrulenaudet/mergeKit Tools for merging pretrained Large Language Models and create Mixture of...	15	Experimental	8	Jupyter Notebook
21	Mikola78/trinity-large-tech-report 🚀 Explore advanced sparse Mixture-of-Experts models with up to 400B...	14	Experimental	—	—
22	simocolo/nnDrain A PyTorch implementation for structural pruning applied to neural networks...	13	Experimental	5	Jupyter Notebook
23	plandes/lmtask Inferencing and Training Large Language Model Tasks	12	Experimental	1	Python
24	burcgokden/LLM-from-Power-Law-Decoder-Representations Implementation of PLDR-LLM: Large Language Model from Power Law Decoder...	11	Experimental	2	Python
25	0xnu/multicollinearity_llm A multicollinearity-based compression C program, identifies and removes...	11	Experimental	2	C
26	chandan11248/deepseek-innovations-from-scratch Reverse-engineering how DeepSeek achieved frontier LLM performance at a...	11	Experimental	—	Jupyter Notebook
27	arrmansa/Temporal-Neuron-Variance-Pruning-Demo An implementation of Variance Pruning: Pruning Language Models via Temporal...	10	Experimental	1	Jupyter Notebook
28	burcgokden/PLDR-LLM-with-KVG-cache Implementation of PLDR-LLM with KV-cache and G-cache in Pytorch for the...	10	Experimental	1	Python
29	Exthalpy/GenLang Self-Decoding Compression Architecture	10	Experimental	1	Jupyter Notebook
30	louisbrulenaudet/mergekit-assistant Mergekit Assistant is a cutting-edge toolkit designed for the seamless...	10	Experimental	1	—