SageAttention and SpargeAttn

These are **competitors**: both attack the same bottleneck, the cost of attention at inference time, but by different means. SageAttention quantizes the attention computation to achieve 2-5x speedups, while SpargeAttn exploits sparsity in the attention map, making them alternative approaches to the same problem.
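For reference, both projects target the same hot loop: exact scaled dot-product attention, whose score matrix grows quadratically with sequence length. A minimal PyTorch sketch of that baseline (illustrative, not either project's kernel):

```python
import torch

def attention(q, k, v):
    # Exact attention over (batch, heads, seq_len, head_dim) tensors.
    scores = (q @ k.transpose(-2, -1)) * q.shape[-1] ** -0.5  # (B, H, n, n)
    return torch.softmax(scores, dim=-1) @ v                  # quadratic in n

q = k = v = torch.randn(1, 8, 1024, 64)
print(attention(q, k, v).shape)  # torch.Size([1, 8, 1024, 64])
```

SageAttention shrinks the cost of each entry in that score matrix; SpargeAttn avoids computing many of the entries at all.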

| Metric | SageAttention | SpargeAttn |
| --- | --- | --- |
| Overall score | 57 (Established) | 54 (Established) |
| Maintenance | 10/25 | 10/25 |
| Adoption | 10/25 | 10/25 |
| Maturity | 16/25 | 16/25 |
| Community | 21/25 | 18/25 |
| Stars | 3,213 | 956 |
| Forks | 366 | 87 |
| Commits (30d) | 0 | 0 |
| Language | CUDA | CUDA |
| License | Apache-2.0 | Apache-2.0 |
| Package | None published (no dependents) | None published (no dependents) |

About SageAttention

thu-ml/SageAttention

[ICLR 2025, ICML 2025, NeurIPS 2025 Spotlight] Quantized attention that achieves a 2-5x speedup over FlashAttention without degrading end-to-end metrics across language, image, and video models.
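The key trick behind quantized attention is computing QK^T in low precision without distorting the softmax. A conceptual sketch under stated assumptions (this is not the project's CUDA kernel; the real implementation quantizes per block on tensor cores):

```python
import torch

def quantize_int8(x):
    # Symmetric INT8 quantization; integer values kept in float32 so the
    # portable demo can matmul them. Real kernels quantize per block.
    scale = x.abs().amax().clamp(min=1e-8) / 127.0
    return torch.clamp((x / scale).round(), -127, 127), scale

def quantized_scores(q, k):
    # "Smooth" K by subtracting its mean over the token axis. The removed
    # term shifts every score in a query row by the same constant, so the
    # softmax over keys is unchanged, while K's dynamic range (and hence
    # its INT8 quantization error) shrinks.
    k = k - k.mean(dim=-2, keepdim=True)
    q_i8, sq = quantize_int8(q)
    k_i8, sk = quantize_int8(k)
    # The operands are small integers, so this float32 matmul exactly
    # emulates an INT8 tensor-core matmul accumulated in int32.
    scores = q_i8 @ k_i8.transpose(-2, -1)
    return scores * (sq * sk) * q.shape[-1] ** -0.5

q, k = torch.randn(1, 4, 256, 64), torch.randn(1, 4, 256, 64)
exact = torch.softmax((q @ k.transpose(-2, -1)) * 64 ** -0.5, dim=-1)
approx = torch.softmax(quantized_scores(q, k), dim=-1)
print((exact - approx).abs().max())  # small: quantization barely moves the weights
```

The project README presents the packaged kernel as a drop-in replacement for full-precision attention, exposed through a `sageattn` entry point in the `sageattention` package.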

About SpargeAttn

thu-ml/SpargeAttn

[ICML 2025] SpargeAttention: a training-free sparse attention method that accelerates inference for any model.
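"Training-free" here means the sparsity pattern is predicted on the fly rather than learned. A conceptual sketch of block-sparse attention in that spirit (the mean-pooling predictor and all names below are illustrative assumptions, not SpargeAttn's actual API or selection rule):

```python
import torch

def block_sparse_attention(q, k, v, block=64, keep=0.25):
    # Mean-pool tokens within each block to get cheap block summaries;
    # the pooled QK^T predicts which key blocks matter for each query
    # block, and the rest are masked out. A real kernel never computes
    # the skipped blocks, which is where the speedup comes from; this
    # demo masks them after the fact for clarity.
    b, h, n, d = q.shape
    nb = n // block
    qb = q.view(b, h, nb, block, d).mean(dim=3)
    kb = k.view(b, h, nb, block, d).mean(dim=3)
    pred = (qb @ kb.transpose(-2, -1)) * d ** -0.5      # (b, h, nb, nb)
    top = pred.topk(max(1, int(keep * nb)), dim=-1).indices
    keep_mask = torch.zeros_like(pred).scatter_(-1, top, 1.0).bool()
    # Expand the block mask to token resolution, then run exact masked attention.
    token_mask = keep_mask.repeat_interleave(block, 2).repeat_interleave(block, 3)
    scores = (q @ k.transpose(-2, -1)) * d ** -0.5
    scores = scores.masked_fill(~token_mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(1, 4, 512, 64)
print(block_sparse_attention(q, k, v).shape)  # torch.Size([1, 4, 512, 64])
```

Because the predictor only needs Q and K at inference time, this style of sparsity can wrap an existing model without retraining, which is what makes it applicable to "any model".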

Scores updated daily from GitHub, PyPI, and npm data.