bitsandbytes-foundation/bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
Implements vector-wise and block-wise quantization strategies with specialized handling for outliers, enabling 8-bit inference without performance loss and 4-bit training via low-rank adaptation (LoRA). Provides drop-in `Linear8bitLt` and `Linear4bit` modules alongside 8-bit optimizers, integrating directly with Hugging Face Transformers, Diffusers, and PEFT. Supports NVIDIA/AMD/Intel GPUs, CPUs with AVX2+, and Apple Silicon across Linux, Windows, and macOS.
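To give a feel for the block-wise quantization idea the library is built around, here is a minimal pure-Python sketch of per-block absmax int8 quantization. This is an illustrative toy, not the library's actual CUDA kernels or API; function names and the block size are invented for the example.

```python
def quantize_blockwise(values, block_size=4):
    """Illustrative sketch: quantize floats to int8 per block via absmax scaling.

    Each block stores its own scale (absmax / 127), so a single large
    outlier only hurts precision within its own block, not the whole tensor.
    """
    quantized, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        absmax = max(abs(v) for v in block) or 1.0  # guard against all-zero blocks
        scale = absmax / 127.0
        scales.append(scale)
        quantized.extend(round(v / scale) for v in block)
    return quantized, scales


def dequantize_blockwise(quantized, scales, block_size=4):
    """Invert the sketch above: multiply each int8 value by its block's scale."""
    out = []
    for start in range(0, len(quantized), block_size):
        scale = scales[start // block_size]
        out.extend(q * scale for q in quantized[start:start + block_size])
    return out


vals = [0.1, -0.5, 2.0, 0.02, 10.0, -3.0, 0.7, 0.0]
q, s = quantize_blockwise(vals)
restored = dequantize_blockwise(q, s)
max_err = max(abs(a - b) for a, b in zip(vals, restored))
```

Because the second block contains the outlier 10.0, its scale is coarser than the first block's, but the error stays bounded by half a quantization step per block, which is the intuition behind the outlier handling described above.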
8,033 stars and 6,225,728 monthly downloads. Used by 73 other packages. Actively maintained with 17 commits in the last 30 days. Available on PyPI.
Stars: 8,033
Forks: 831
Language: Python
License: MIT
Last pushed: Mar 10, 2026
Monthly downloads: 6,225,728
Commits (30d): 17
Dependencies: 3
Reverse dependents: 73
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/bitsandbytes-foundation/bitsandbytes"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related repositories
intel/neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model...
dropbox/hqq
Official implementation of Half-Quadratic Quantization (HQQ)
OpenGVLab/OmniQuant
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
VITA-Group/Q-GaLore
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.
Hsu1023/DuQuant
[NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger...