flash-linear-attention and SageAttention
These are competitors in the efficient-attention space: both speed up attention computation, but by different means. flash-linear-attention provides kernels for linear-attention architectures (a change to the model itself), while SageAttention quantizes standard softmax attention as a drop-in kernel, so practitioners typically adopt one approach or the other rather than combining them.
About flash-linear-attention
fla-org/flash-linear-attention
🚀 Efficient implementations of state-of-the-art linear attention models
Provides PyTorch and Triton kernels for linear attention variants (RetNet, GLA, Mamba, RWKV, DeltaNet, and 20+ emerging architectures), with support for NVIDIA, AMD, and Intel GPUs. Includes fused operators, hybrid model support, and variable-length sequence handling to reduce memory overhead during training. Integrates with the Hugging Face model hub and the companion `flame` training framework for distributed model development.
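To see why linear attention admits these fused, memory-efficient kernels, here is a minimal pure-Python sketch of the underlying identity: causal kernelized attention computed the quadratic O(N²) way equals a recurrent O(N) form that carries a small running state. The function names and the feature map `phi` are illustrative assumptions, not flash-linear-attention's API; the library implements this family of recurrences as fused Triton kernels.

```python
# Illustrative sketch of the linear-attention identity, under assumed names.
# Not fla's API: the library fuses equivalent recurrences into Triton kernels.
import math

def phi(x):
    # Assumed elementwise positive feature map (exp here); any positive map works.
    return [math.exp(v) for v in x]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def quadratic_linear_attention(q, k, v):
    # O(N^2): causal attention with kernel weights phi(q_i) . phi(k_j).
    out = []
    for i in range(len(q)):
        qi = phi(q[i])
        weights = [dot(qi, phi(k[j])) for j in range(i + 1)]
        z = sum(weights)
        out.append([
            sum(w * v[j][d] for j, w in enumerate(weights)) / z
            for d in range(len(v[0]))
        ])
    return out

def recurrent_linear_attention(q, k, v):
    # O(N): same result via running state S = sum_j phi(k_j) v_j^T, z = sum_j phi(k_j).
    d_k, d_v = len(k[0]), len(v[0])
    S = [[0.0] * d_v for _ in range(d_k)]
    z = [0.0] * d_k
    out = []
    for i in range(len(q)):
        ki = phi(k[i])
        for a in range(d_k):          # update state with token i (causal: j <= i)
            z[a] += ki[a]
            for b in range(d_v):
                S[a][b] += ki[a] * v[i][b]
        qi = phi(q[i])
        denom = dot(qi, z)
        out.append([sum(qi[a] * S[a][b] for a in range(d_k)) / denom
                    for b in range(d_v)])
    return out
```

Because the recurrent form only ever touches a fixed-size state instead of an N×N score matrix, memory during training stays bounded, which is what the fused kernels exploit.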
About SageAttention
thu-ml/SageAttention
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized attention that achieves a 2-5x speedup over FlashAttention without degrading end-to-end metrics across language, image, and video models.
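The core idea behind quantized attention can be sketched in a few lines: quantize Q and K to INT8 with per-tensor scales, do the QK^T matmul in integers, then dequantize before the softmax and PV steps. This is a conceptual illustration only, not SageAttention's actual kernel (which additionally smooths K and runs the low-precision matmuls on GPU tensor cores); all names here are assumptions for the sketch.

```python
# Conceptual sketch of quantized attention: INT8 QK^T, float softmax/PV.
# Illustration of the principle only, not SageAttention's implementation.
import math

def quantize_int8(mat):
    # Symmetric per-tensor INT8 quantization: x_q = round(x / scale).
    amax = max(abs(x) for row in mat for x in row) or 1.0
    scale = amax / 127.0
    return [[round(x / scale) for x in row] for row in mat], scale

def attention(q, k, v, quantized=False):
    if quantized:
        q_q, sq = quantize_int8(q)
        k_q, sk = quantize_int8(k)
        # Integer QK^T, then dequantize with the product of the two scales.
        scores = [[sum(a * b for a, b in zip(qr, kr)) * sq * sk for kr in k_q]
                  for qr in q_q]
    else:
        scores = [[sum(a * b for a, b in zip(qr, kr)) for kr in k] for qr in q]
    out = []
    for row in scores:
        m = max(row)                        # numerically stable softmax
        e = [math.exp(s - m) for s in row]
        z = sum(e)
        p = [x / z for x in e]
        out.append([sum(pj * v[j][d] for j, pj in enumerate(p))
                    for d in range(len(v[0]))])
    return out
```

The point of the sketch is that the quantized scores stay close enough to the full-precision ones that the softmax output, and hence the end-to-end result, is nearly unchanged, while the dominant matmul runs in cheap integer arithmetic.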