pytorch/ao
PyTorch native quantization and sparsity for training and inference
Provides composable quantization techniques (int4/int8 weight-only, float8 dynamic, QAT) and structured sparsity methods (2:4 semi-structured, block sparsity) with optimized kernels via MSLK, enabling training speedups up to 1.5x and inference gains up to 2.37x. Integrates seamlessly with `torch.compile()`, FSDP2, and popular fine-tuning frameworks (Unsloth, Axolotl, HF Transformers), plus inference backends like vLLM and ExecuTorch for edge deployment.
2,729 stars. Actively maintained with 132 commits in the last 30 days.
Stars: 2,729
Forks: 456
Language: Python
License: —
Category: —
Last pushed: Mar 13, 2026
Commits (30d): 132
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/pytorch/ao"
Open to everyone: 100 requests/day with no key required; a free key raises the limit to 1,000/day.
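The same endpoint can be called from Python. A stdlib-only sketch, with the network call left commented out so it can be run offline (the response format is not documented here, so no fields are assumed):

```python
# Sketch: querying the stats endpoint shown above with urllib (stdlib only).
import json
import urllib.request

url = "https://pt-edge.onrender.com/api/v1/quality/transformers/pytorch/ao"
req = urllib.request.Request(url, headers={"Accept": "application/json"})

# Uncomment to perform the actual request (counts against the daily limit):
# with urllib.request.urlopen(req) as resp:
#     data = json.load(resp)
#     print(data)

print(req.full_url)
```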
Related models
intel/auto-round
🎯 An accuracy-first, highly efficient quantization toolkit for LLMs, designed to minimize quality...
ModelCloud/GPTQModel
LLM model quantization (compression) toolkit with hw acceleration support for Nvidia CUDA, AMD...
BlinkDL/RWKV-LM
RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly...
Picovoice/picollm
On-device LLM Inference Powered by X-Bit Quantization
NVIDIA/kvpress
LLM KV cache compression made easy