GPTQModel and AutoGPTQ

GPTQModel is a maintained fork and extension of AutoGPTQ that adds hardware-acceleration support across multiple backends (CUDA, ROCm, XPU, CPU), while AutoGPTQ is the original, now-archived reference implementation of the GPTQ quantization algorithm.

| | GPTQModel | AutoGPTQ |
|---|---|---|
| Score | 79 (Verified) | 46 (Emerging) |
| Maintenance | 25/25 | 0/25 |
| Adoption | 13/25 | 10/25 |
| Maturity | 18/25 | 16/25 |
| Community | 23/25 | 20/25 |
| Stars | 1,044 | 5,033 |
| Forks | 166 | 532 |
| Downloads | | |
| Commits (30d) | 212 | 0 |
| Language | Python | Python |
| License | | MIT |
| Risk flags | None | Archived, Stale 6m, No Package, No Dependents |

About GPTQModel

ModelCloud/GPTQModel

LLM model quantization (compression) toolkit with hw acceleration support for Nvidia CUDA, AMD ROCm, Intel XPU and Intel/AMD/Apple CPU via HF, vLLM, and SGLang.

About AutoGPTQ

AutoGPTQ/AutoGPTQ

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

**Technical Summary:** Supports weight-only quantization to 4-bit or 3-bit precision with configurable group sizes and activation ordering, and integrates with Hugging Face Transformers, Optimum, and PEFT for seamless model loading and fine-tuning. Provides optimized CUDA kernels (including Marlin int4×fp16 matrix multiplication) and Triton backends for accelerated inference, reaching 25–91 tokens/sec on A100 GPUs while maintaining model accuracy. Targets NVIDIA CUDA, AMD ROCm, and Intel Gaudi 2 platforms, with pre-built wheels for CUDA 11.8 and 12.1.
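To make the "weight-only quantization with configurable group sizes" idea above concrete, here is a minimal pure-Python sketch of 4-bit per-group weight quantization, the storage format GPTQ-style tools produce. Note this is plain round-to-nearest per group; real GPTQ additionally minimizes layer output error when choosing the quantized values. The function and parameter names are illustrative, not the API of GPTQModel or AutoGPTQ.

```python
def quantize_group(weights, bits=4):
    """Round-to-nearest quantize one group of floats to signed ints plus a scale.

    Each group stores its own float scale, which is why smaller group
    sizes give better accuracy at the cost of slightly more storage.
    """
    qmax = 2 ** (bits - 1) - 1                    # 7 for signed 4-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize_group(q, scale):
    """Recover approximate float weights from ints and the group scale."""
    return [v * scale for v in q]

def quantize_rowwise(row, group_size=4, bits=4):
    """Split a weight row into contiguous groups, each with its own scale."""
    return [quantize_group(row[i:i + group_size], bits)
            for i in range(0, len(row), group_size)]
```

The reconstruction error of each weight is bounded by half the group's scale, which is the property that lets 4-bit storage keep model accuracy close to the fp16 original.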

Scores updated daily from GitHub, PyPI, and npm data.