intel/auto-round
🎯 An accuracy-first, highly efficient quantization toolkit for LLMs, designed to minimize quality degradation across Weight-Only Quantization, MXFP4, NVFP4, GGUF, and adaptive schemes.
883 stars and 44,854 monthly downloads. Actively maintained with 84 commits in the last 30 days. Available on PyPI.
Stars: 883
Forks: 81
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 13, 2026
Monthly downloads: 44,854
Commits (30d): 84
Dependencies: 8
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/intel/auto-round"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
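The same request can be made programmatically. A minimal Python sketch using only the standard library, assuming the endpoint returns a JSON body (the response schema is not documented above, so the `dict` return type is an assumption):

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def quality_url(owner: str, repo: str) -> str:
    """Build the quality-API URL for a given GitHub repository."""
    return f"{BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch quality data for a repo; assumes the API responds with JSON."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example (performs a live network call):
# data = fetch_quality("intel", "auto-round")
# print(data)
```

With an API key, you would typically pass it as a header or query parameter; the exact mechanism is not specified above, so check the service's documentation before relying on it.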
Related models
ModelCloud/GPTQModel: LLM model quantization (compression) toolkit with hw acceleration support for Nvidia CUDA, AMD...
pytorch/ao: PyTorch native quantization and sparsity for training and inference
BlinkDL/RWKV-LM: RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly...
Picovoice/picollm: On-device LLM Inference Powered by X-Bit Quantization
NVIDIA/kvpress: LLM KV cache compression made easy