SageAttention and SpargeAttn
These are **competitors**: both aim to speed up attention computation at inference time, but SageAttention quantizes the attention kernel to reach 2-5x speedups over FlashAttention, while SpargeAttn (SpargeAttention) exploits sparsity in the attention map. They are alternative approaches to the same problem of reducing attention's computational bottleneck.
| Score | SageAttention | SpargeAttn |
| --- | --- | --- |
| Maintenance | 10/25 | 10/25 |
| Adoption | 10/25 | 10/25 |
| Maturity | 16/25 | 16/25 |
| Community | 21/25 | 18/25 |
| Stat | SageAttention | SpargeAttn |
| --- | --- | --- |
| Stars | 3,213 | 956 |
| Forks | 366 | 87 |
| Downloads | — | — |
| Commits (30d) | 0 | 0 |
| Language | Cuda | Cuda |
| License | Apache-2.0 | Apache-2.0 |
| Package | No Package | No Package |
| Dependents | No Dependents | No Dependents |
About SageAttention
thu-ml/SageAttention
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.
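The repository positions this as a drop-in replacement for existing attention kernels. A minimal usage sketch follows, assuming the package exposes a `sageattn(q, k, v, tensor_layout=..., is_causal=...)` entry point; the function name and keyword arguments here are assumptions, so verify them against the repository README before use.

```python
import torch

# Assumed import and entry point for thu-ml/SageAttention;
# confirm the exact API against the project's README.
from sageattention import sageattn

# Toy tensors in (batch, heads, seq_len, head_dim) layout on GPU.
q = torch.randn(1, 8, 1024, 64, dtype=torch.float16, device="cuda")
k = torch.randn(1, 8, 1024, 64, dtype=torch.float16, device="cuda")
v = torch.randn(1, 8, 1024, 64, dtype=torch.float16, device="cuda")

# Baseline: PyTorch's fused scaled-dot-product attention.
ref = torch.nn.functional.scaled_dot_product_attention(q, k, v, is_causal=False)

# Quantized attention: same call shape as the baseline; quantization happens
# inside the kernel, so no retraining or weight conversion is needed.
out = sageattn(q, k, v, tensor_layout="HND", is_causal=False)

# Quantization introduces a small numerical error relative to the baseline.
print((ref.float() - out.float()).abs().max())
```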
About SpargeAttn
thu-ml/SpargeAttn
[ICML2025] SpargeAttention: A training-free sparse attention that accelerates any model inference.
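The idea behind training-free sparse attention can be illustrated with a short plain-PyTorch sketch. This is a conceptual illustration only, not SpargeAttn's actual kernel or API: the block size, the mean-pooled block-similarity heuristic, and the threshold are illustrative assumptions. The point is that block pairs judged unimportant are skipped at inference time, with no retraining.

```python
import torch
import torch.nn.functional as F

def blockwise_sparse_attention(q, k, v, block=64, threshold=0.0):
    """Conceptual sketch of training-free block-sparse attention.

    Key blocks whose mean key looks dissimilar to a query block's mean
    query are skipped. Not the SpargeAttn kernel; for illustration only.
    """
    B, H, N, D = q.shape
    assert N % block == 0
    nb = N // block
    scale = D ** -0.5

    # Mean-pool each block of queries and keys to get cheap block summaries.
    q_blk = q.view(B, H, nb, block, D).mean(dim=3)            # (B, H, nb, D)
    k_blk = k.view(B, H, nb, block, D).mean(dim=3)            # (B, H, nb, D)

    # Block-level similarity decides which (query-block, key-block) pairs to keep.
    blk_score = torch.einsum("bhid,bhjd->bhij", q_blk, k_blk) * scale
    keep = blk_score > threshold                              # (B, H, nb, nb)
    # Always keep the diagonal block so no query row loses all its keys.
    keep |= torch.eye(nb, dtype=torch.bool, device=q.device)

    out = torch.zeros_like(q)
    for i in range(nb):
        qi = q[:, :, i * block:(i + 1) * block]               # (B, H, block, D)
        # Expand the block-level mask to per-token key positions.
        mask = keep[:, :, i].repeat_interleave(block, dim=-1)  # (B, H, N)
        scores = torch.einsum("bhqd,bhkd->bhqk", qi, k) * scale
        scores = scores.masked_fill(~mask[:, :, None, :], float("-inf"))
        out[:, :, i * block:(i + 1) * block] = torch.einsum(
            "bhqk,bhkd->bhqd", F.softmax(scores, dim=-1), v)
    return out

# Example: pruned blocks are simply never attended to.
q = torch.randn(1, 4, 256, 64)
k = torch.randn(1, 4, 256, 64)
v = torch.randn(1, 4, 256, 64)
out = blockwise_sparse_attention(q, k, v, block=64, threshold=0.0)
print(out.shape)  # torch.Size([1, 4, 256, 64])
```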
Scores updated daily from GitHub, PyPI, and npm data.