LLM Quantization Techniques (LLM Tools)

Tools and libraries for compressing LLM weights through quantization methods (int8, int4, binary, ternary), including inference frameworks and optimization techniques. Does NOT include general model compression, pruning, distillation, or non-quantization-based optimization approaches.
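The simplest of these schemes is round-to-nearest int8 with absmax scaling: the tensor is rescaled so its largest-magnitude value maps to the int8 limit of 127, then each value is rounded. A minimal pure-Python sketch for illustration (not the API of any tool listed below):

```python
def quantize_absmax_int8(weights):
    """Absmax int8 quantization: rescale so max |w| maps to 127, then round."""
    scale = max(abs(w) for w in weights) / 127.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [q * scale for q in quantized]

weights = [0.12, -0.5, 0.33, 1.0, -0.98]
q, scale = quantize_absmax_int8(weights)
recovered = dequantize(q, scale)
# Each quantized value fits in int8; rounding error per weight is at most scale/2.
```

Zero-point (asymmetric) variants add an integer offset so the range need not be centered on zero; methods like GPTQ and AWQ refine the rounding itself using calibration data.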

There are 21 LLM quantization tools tracked; 1 scores above 50 (established tier). The highest-rated is huawei-csl/SINQ at 60/100, with 602 stars and 251 monthly downloads.

Get all 21 projects as JSON:

```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=llm-quantization-techniques&limit=21"
```

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | huawei-csl/SINQ | Welcome to the official repository of SINQ! A novel, fast and high-quality... | 60 | Established |
| 2 | SILX-LABS/QUASAR-SUBNET | QUASAR is a long-context foundation model and decentralized evaluation... | 41 | Emerging |
| 3 | m96-chan/0xBitNet | Run BitNet b1.58 ternary LLMs with WebGPU — in browsers and native apps | 36 | Emerging |
| 4 | stackblogger/bitnet.js | BitNet.Js - A node.js implementation of the microsoft bitnet.cpp inference framework. | 34 | Emerging |
| 5 | AnswerDotAI/cold-compress | Cold Compress is a hackable, lightweight, and open-source toolkit for... | 32 | Emerging |
| 6 | FMInference/H2O | [NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of... | 31 | Emerging |
| 7 | grctest/Electron-BitNet | Running Microsoft's BitNet via Electron, React & Astro | 28 | Experimental |
| 8 | OpenGnosis/bintensors | Binary tensor format for more efficient storage format for multi-dimensional... | 26 | Experimental |
| 9 | GURPREETKAURJETHRA/LLaMA3-Quantization | LLaMA3-Quantization | 26 | Experimental |
| 10 | upunaprosk/quantized-lm-confidence | Code for NAACL paper When Quantization Affects Confidence of Large Language Models? | 24 | Experimental |
| 11 | tomsanbear/bitnet-rs | Implementing the BitNet model in Rust | 22 | Experimental |
| 12 | LessUp/llm-speed | CUDA Kernel Library for LLM Inference: FlashAttention, HGEMM, Tensor Core... | 22 | Experimental |
| 13 | cnygaard/glq | E8 lattice codebook quantization for LLM weights — 2/3/4 bpw with fused... | 22 | Experimental |
| 14 | dnotitia/smoothie-qwen | A lightweight adjustment tool for smoothing token probabilities in the Qwen... | 21 | Experimental |
| 15 | kevin-pek/bitnet.c | Zero-dependency implementation of BitNet neural network training and BPE... | 18 | Experimental |
| 16 | GURPREETKAURJETHRA/Quantize-LLM-using-AWQ | Quantize LLM using AWQ | 16 | Experimental |
| 17 | Artessay/ArtQuantization | ArtQuantization is developed for quantizing Large Language Models, focusing... | 16 | Experimental |
| 18 | amajji/LLM-Quantization-Techniques-Absmax-Zeropoint-GPTQ-GGUF | LLM quantization techniques: absmax, zero-point, GPTQ and GGUF | 15 | Experimental |
| 19 | elphinkuo/llamaqt.c | Clean C language version of quantizing llama2 model and running quantized... | 13 | Experimental |
| 20 | SRafi007/Quantization-for-LLMs-An-Intuitive-Introduction | A beginner-friendly note explaining why and how quantization is used in... | 11 | Experimental |
| 21 | akhilchibber/Llama2-Quantization | Quantization of the Llama 2 model | 10 | Experimental |
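Several of the tracked projects (0xBitNet, bitnet.js, Electron-BitNet, bitnet-rs, bitnet.c) target BitNet b1.58, which constrains every weight to the ternary set {-1, 0, +1} so that matrix multiplies reduce to additions and subtractions. A minimal sketch of absmean ternary rounding in the style described by the BitNet b1.58 paper (pure Python, illustrative only, not any listed project's API):

```python
def quantize_ternary_absmean(weights, eps=1e-8):
    """BitNet b1.58-style ternary quantization:
    divide by the mean absolute weight, round, and clip to {-1, 0, +1}."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, scale

weights = [0.9, -0.05, 0.4, -1.2, 0.02]
t, scale = quantize_ternary_absmean(weights)
# Every quantized weight is -1, 0, or +1 (~1.58 bits of information each).
```

Scaling by the mean absolute value (rather than the max) keeps a useful fraction of weights away from zero, which is why the scheme preserves accuracy better than naive sign-based binarization.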