LLM Quantization Techniques (LLM Tools)
Tools and libraries for compressing LLM weights through quantization methods (int8, int4, binary, ternary), including inference frameworks and optimization techniques. Does NOT include general model compression, pruning, distillation, or non-quantization-based optimization approaches.
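To make the scope concrete, here is a minimal sketch of the simplest technique in this category, symmetric absmax int8 quantization (the scheme several listed repos implement, e.g. the absmax/zero-point/GPTQ/GGUF tutorial repo). Function names and the toy weight values are illustrative, not from any specific tool here.

```python
def quantize_absmax(weights, bits=8):
    """Symmetric absmax quantization: map floats into [-qmax, qmax] integers.

    The scale is the largest absolute weight divided by the integer range,
    so the most extreme weight lands exactly on the grid.
    """
    qmax = 2 ** (bits - 1) - 1          # 127 for int8, 7 for int4
    scale = max(abs(w) for w in weights) / qmax
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from integers and the stored scale."""
    return [v * scale for v in q]

# Toy example: the weight with the largest magnitude (-1.2) maps to -127.
weights = [0.5, -1.2, 0.03, 0.9]
q, scale = quantize_absmax(weights)
approx = dequantize(q, scale)
```

Dropping `bits` to 4 shows why int4 is lossier: the same weights share only 15 grid points instead of 255.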
21 LLM quantization tools are tracked; 1 scores above 50 (Established tier). The highest-rated is huawei-csl/SINQ at 60/100, with 602 stars and 251 monthly downloads.
Get all 21 projects as JSON:

```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=llm-quantization-techniques&limit=21"
```
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
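A quick way to reproduce the tier split above is to filter the API response by score. This is a hypothetical sketch: the field names (`name`, `score`) are assumptions inferred from the table below, not a documented schema, and the sample payload is illustrative except for the SINQ score, which the page states is 60.

```python
import json

def established_tools(payload: str, threshold: int = 50):
    """Return names of tools whose score exceeds `threshold`.

    Assumes the endpoint returns a JSON array of objects with
    "name" and "score" keys -- an unverified assumption.
    """
    projects = json.loads(payload)
    return [p["name"] for p in projects if p.get("score", 0) > threshold]

# Illustrative payload; only SINQ's score of 60 comes from the page.
sample = json.dumps([
    {"name": "huawei-csl/SINQ", "score": 60},
    {"name": "example/other-tool", "score": 42},
])
```

With the real response, the same filter at `threshold=50` should return exactly one entry, matching the "1 score above 50" figure.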
| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | huawei-csl/SINQ | Welcome to the official repository of SINQ! A novel, fast and high-quality... | 60 | Established |
| 2 | SILX-LABS/QUASAR-SUBNET | QUASAR is a long-context foundation model and decentralized evaluation... | | Emerging |
| 3 | m96-chan/0xBitNet | Run BitNet b1.58 ternary LLMs with WebGPU — in browsers and native apps | | Emerging |
| 4 | stackblogger/bitnet.js | BitNet.Js - A node.js implementation of the microsoft bitnet.cpp inference framework. | | Emerging |
| 5 | AnswerDotAI/cold-compress | Cold Compress is a hackable, lightweight, and open-source toolkit for... | | Emerging |
| 6 | FMInference/H2O | [NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of... | | Emerging |
| 7 | grctest/Electron-BitNet | Running Microsoft's BitNet via Electron, React & Astro | | Experimental |
| 8 | OpenGnosis/bintensors | Binary tensor format for more efficient storage format for multi-dimensional... | | Experimental |
| 9 | GURPREETKAURJETHRA/LLaMA3-Quantization | LLaMA3-Quantization | | Experimental |
| 10 | upunaprosk/quantized-lm-confidence | Code for NAACL paper When Quantization Affects Confidence of Large Language Models? | | Experimental |
| 11 | tomsanbear/bitnet-rs | Implementing the BitNet model in Rust | | Experimental |
| 12 | LessUp/llm-speed | CUDA Kernel Library for LLM Inference: FlashAttention, HGEMM, Tensor Core... | | Experimental |
| 13 | cnygaard/glq | E8 lattice codebook quantization for LLM weights — 2/3/4 bpw with fused... | | Experimental |
| 14 | dnotitia/smoothie-qwen | A lightweight adjustment tool for smoothing token probabilities in the Qwen... | | Experimental |
| 15 | kevin-pek/bitnet.c | Zero-dependency implementation of BitNet neural network training and BPE... | | Experimental |
| 16 | GURPREETKAURJETHRA/Quantize-LLM-using-AWQ | Quantize LLM using AWQ | | Experimental |
| 17 | Artessay/ArtQuantization | ArtQuantization is developed for quantizing Large Language Models, focusing... | | Experimental |
| 18 | amajji/LLM-Quantization-Techniques-Absmax-Zeropoint-GPTQ-GGUF | LLM quantization techniques: absmax, zero-point, GPTQ and GGUF | | Experimental |
| 19 | elphinkuo/llamaqt.c | Clean C language version of quantizing llama2 model and running quantized... | | Experimental |
| 20 | SRafi007/Quantization-for-LLMs-An-Intuitive-Introduction | A beginner-friendly note explaining why and how quantization is used in... | | Experimental |
| 21 | akhilchibber/Llama2-Quantization | Quantization of the Llama 2 model | | Experimental |
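Several of the listed tools (0xBitNet, bitnet.js, Electron-BitNet, bitnet-rs, bitnet.c) implement BitNet b1.58 ternary models. As a rough illustration of what "ternary" means here, the sketch below follows the absmean scheme described in the BitNet b1.58 paper: scale weights by their mean absolute value, then round and clip to {-1, 0, 1}. This is a simplified per-tensor sketch, not code from any of the repos above.

```python
def quantize_ternary_absmean(weights, eps=1e-8):
    """BitNet-b1.58-style ternary quantization.

    gamma is the mean absolute weight; each weight is divided by gamma,
    rounded to the nearest integer, and clipped to {-1, 0, 1}.
    """
    gamma = sum(abs(w) for w in weights) / len(weights) + eps
    q = [max(-1, min(1, round(w / gamma))) for w in weights]
    return q, gamma

# Toy weights: small-magnitude values snap to 0, the rest to +/-1.
weights = [0.8, -0.05, -1.1, 0.4]
q, gamma = quantize_ternary_absmean(weights)
```

Each weight then needs only ~1.58 bits (log2 of 3 states), which is where the "b1.58" name comes from.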