LLM Quantization Techniques (LLM Tools)
Tools and libraries for compressing LLM weights through quantization methods (int8, int4, binary, ternary), including inference frameworks and optimization techniques. Does NOT include general model compression, pruning, distillation, or non-quantization-based optimization approaches.
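To make the scope concrete, here is a minimal sketch of the simplest technique in this category, symmetric absmax int8 quantization (the scheme several listed repos implement, e.g. the absmax/zero-point/GPTQ/GGUF tutorial repo). Function names and the toy weight values are illustrative, not from any specific tool here.

```python
def quantize_absmax(weights, bits=8):
    """Symmetric absmax quantization: map floats into [-qmax, qmax] integers.

    The scale is the largest absolute weight divided by the integer range,
    so the most extreme weight lands exactly on the grid.
    """
    qmax = 2 ** (bits - 1) - 1          # 127 for int8, 7 for int4
    scale = max(abs(w) for w in weights) / qmax
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from integers and the stored scale."""
    return [v * scale for v in q]

# Toy example: the weight with the largest magnitude (-1.2) maps to -127.
weights = [0.5, -1.2, 0.03, 0.9]
q, scale = quantize_absmax(weights)
approx = dequantize(q, scale)
```

Dropping `bits` to 4 shows why int4 is lossier: the same weights share only 15 grid points instead of 255.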
21 LLM quantization tools are tracked; 1 scores above 50 (Established tier). The highest-rated is huawei-csl/SINQ at 60/100, with 602 stars and 251 monthly downloads.
Get all 21 projects as JSON:

```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=llm-quantization-techniques&limit=21"
```
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
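A quick way to reproduce the tier split above is to filter the API response by score. This is a hypothetical sketch: the field names (`name`, `score`) are assumptions inferred from the table below, not a documented schema, and the sample payload is illustrative except for the SINQ score, which the page states is 60.

```python
import json

def established_tools(payload: str, threshold: int = 50):
    """Return names of tools whose score exceeds `threshold`.

    Assumes the endpoint returns a JSON array of objects with
    "name" and "score" keys -- an unverified assumption.
    """
    projects = json.loads(payload)
    return [p["name"] for p in projects if p.get("score", 0) > threshold]

# Illustrative payload; only SINQ's score of 60 comes from the page.
sample = json.dumps([
    {"name": "huawei-csl/SINQ", "score": 60},
    {"name": "example/other-tool", "score": 42},
])
```

With the real response, the same filter at `threshold=50` should return exactly one entry, matching the "1 score above 50" figure.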
| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | huawei-csl/SINQ | Welcome to the official repository of SINQ! A novel, fast and high-quality... | 60 | Established |
| 2 | SILX-LABS/QUASAR-SUBNET | QUASAR is a long-context foundation model and decentralized evaluation... | | Emerging |
| 3 | m96-chan/0xBitNet | Run BitNet b1.58 ternary LLMs with WebGPU — in browsers and native apps | | Emerging |
| 4 | stackblogger/bitnet.js | BitNet.Js - A node.js implementation of the microsoft bitnet.cpp inference framework. | | Emerging |
| 5 | AnswerDotAI/cold-compress | Cold Compress is a hackable, lightweight, and open-source toolkit for... | | Emerging |
| 6 | FMInference/H2O | [NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of... | | Emerging |
| 7 | grctest/Electron-BitNet | Running Microsoft's BitNet via Electron, React & Astro | | Experimental |
| 8 | OpenGnosis/bintensors | Binary tensor format for more efficient storage format for multi-dimensional... | | Experimental |
| 9 | GURPREETKAURJETHRA/LLaMA3-Quantization | LLaMA3-Quantization | | Experimental |
| 10 | upunaprosk/quantized-lm-confidence | Code for NAACL paper When Quantization Affects Confidence of Large Language Models? | | Experimental |
| 11 | tomsanbear/bitnet-rs | Implementing the BitNet model in Rust | | Experimental |
| 12 | LessUp/llm-speed | CUDA Kernel Library for LLM Inference: FlashAttention, HGEMM, Tensor Core... | | Experimental |
| 13 | cnygaard/glq | E8 lattice codebook quantization for LLM weights — 2/3/4 bpw with fused... | | Experimental |
| 14 | dnotitia/smoothie-qwen | A lightweight adjustment tool for smoothing token probabilities in the Qwen... | | Experimental |
| 15 | kevin-pek/bitnet.c | Zero-dependency implementation of BitNet neural network training and BPE... | | Experimental |
| 16 | GURPREETKAURJETHRA/Quantize-LLM-using-AWQ | Quantize LLM using AWQ | | Experimental |
| 17 | Artessay/ArtQuantization | ArtQuantization is developed for quantizing Large Language Models, focusing... | | Experimental |
| 18 | amajji/LLM-Quantization-Techniques-Absmax-Zeropoint-GPTQ-GGUF | LLM quantization techniques: absmax, zero-point, GPTQ and GGUF | | Experimental |
| 19 | elphinkuo/llamaqt.c | Clean C language version of quantizing llama2 model and running quantized... | | Experimental |
| 20 | SRafi007/Quantization-for-LLMs-An-Intuitive-Introduction | A beginner-friendly note explaining why and how quantization is used in... | | Experimental |
| 21 | akhilchibber/Llama2-Quantization | Quantization of the Llama 2 model | | Experimental |
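Several of the listed tools (0xBitNet, bitnet.js, Electron-BitNet, bitnet-rs, bitnet.c) implement BitNet b1.58 ternary models. As a rough illustration of what "ternary" means here, the sketch below follows the absmean scheme described in the BitNet b1.58 paper: scale weights by their mean absolute value, then round and clip to {-1, 0, 1}. This is a simplified per-tensor sketch, not code from any of the repos above.

```python
def quantize_ternary_absmean(weights, eps=1e-8):
    """BitNet-b1.58-style ternary quantization.

    gamma is the mean absolute weight; each weight is divided by gamma,
    rounded to the nearest integer, and clipped to {-1, 0, 1}.
    """
    gamma = sum(abs(w) for w in weights) / len(weights) + eps
    q = [max(-1, min(1, round(w / gamma))) for w in weights]
    return q, gamma

# Toy weights: small-magnitude values snap to 0, the rest to +/-1.
weights = [0.8, -0.05, -1.1, 0.4]
q, gamma = quantize_ternary_absmean(weights)
```

Each weight then needs only ~1.58 bits (log2 of 3 states), which is where the "b1.58" name comes from.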