GPTQModel and AutoGPTQ
GPTQModel is a maintained fork and extension of AutoGPTQ that adds hardware acceleration support across multiple backends (CUDA, ROCm, XPU, CPU), while AutoGPTQ is the original reference implementation of the GPTQ quantization algorithm.
About GPTQModel
ModelCloud/GPTQModel
LLM quantization (compression) toolkit with hardware acceleration support for Nvidia CUDA, AMD ROCm, Intel XPU, and Intel/AMD/Apple CPU, usable via HF Transformers, vLLM, and SGLang.
About AutoGPTQ
AutoGPTQ/AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
**Technical Summary:** Supports weight-only quantization to 4-bit or 3-bit precision with configurable group sizes and activation ordering, and integrates with Hugging Face Transformers, Optimum, and PEFT for seamless model loading and fine-tuning. Provides optimized CUDA kernels (including Marlin int4×fp16 matrix multiplication) and Triton backends for accelerated inference, reaching inference speeds of 25-91 tokens/sec on A100 GPUs while maintaining model accuracy. Targets NVIDIA, AMD ROCm, and Intel Gaudi 2 platforms, with pre-built wheels for CUDA 11.8/12.1.
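To make the "configurable group sizes" knob concrete, here is a minimal sketch of weight-only group quantization, the core idea behind GPTQ-style 4-bit formats. This is illustrative only: real GPTQ additionally uses Hessian-based error compensation and activation ordering, which are omitted here, and the tiny group size of 4 is chosen for readability (production configs commonly use 128).

```python
def quantize_group(weights, bits=4):
    """Symmetric round-to-nearest quantization of one group of weights.

    Returns (integer codes, scale); dequantize each weight as code * scale.
    """
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for signed 4-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0
    codes = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return codes, scale

def quantize_row(row, bits=4, group_size=4):
    """Split a weight row into groups; each group gets its own scale.

    The per-group scale is what the 'group size' setting controls:
    smaller groups -> more scales stored -> better accuracy, more overhead.
    """
    groups = []
    for i in range(0, len(row), group_size):
        groups.append(quantize_group(row[i:i + group_size], bits))
    return groups

def dequantize_row(groups):
    """Reconstruct approximate float weights from (codes, scale) pairs."""
    return [c * s for codes, s in groups for c in codes]

# Round-trip a small weight row through 4-bit group quantization.
row = [0.12, -0.53, 0.98, -0.27, 0.05, 0.44, -0.91, 0.33]
groups = quantize_row(row, bits=4, group_size=4)
restored = dequantize_row(groups)
```

Each restored weight differs from the original by at most half a quantization step (scale / 2), which is why per-group scales matter: one outlier weight only degrades precision within its own group rather than across the whole row.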