flash-linear-attention and Flash-Sparse-Attention

Linear attention and sparse attention are complementary techniques for reducing the computational cost of transformer attention. Linear attention replaces softmax attention with a recurrent formulation whose cost grows linearly with sequence length, O(n), while sparse attention keeps softmax attention but computes it only between selected token pairs. The two implementations therefore target different efficiency trade-offs and suit different use cases rather than serving as direct alternatives.

flash-linear-attention
Maintenance: 23/25
Adoption: 21/25
Maturity: 25/25
Community: 20/25
Stars: 4,549
Forks: 431
Downloads: 438,484
Commits (30d): 30
Language: Python
License: MIT
Risk flags: none

Flash-Sparse-Attention
Maintenance: 2/25
Adoption: 10/25
Maturity: 15/25
Community: 9/25
Stars: 983
Forks: 14
Downloads: (not listed)
Commits (30d): 0
Language: Python
License: Apache-2.0
Risk flags: Stale 6m, No Package, No Dependents

About flash-linear-attention

fla-org/flash-linear-attention

🚀 Efficient implementations of state-of-the-art linear attention models

Provides PyTorch and Triton kernels for linear attention variants (RetNet, GLA, Mamba, RWKV, DeltaNet, and 20+ emerging architectures), optimized for CPU and GPU across NVIDIA, AMD, and Intel platforms. Includes fused operators, hybrid model support, and variable-length sequence handling to reduce memory overhead during training. Integrates with Hugging Face model hub and the companion `flame` training framework for distributed model development.
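To make the linear-time claim concrete, here is a minimal sketch of causal linear attention in plain Python. It is not the library's code or API: the function names are hypothetical, and the simplest unnormalized form (no softmax, no gating, no decay) is assumed. The point is only that a running d_k x d_v state gives the same output as the explicit pairwise computation, while doing a fixed amount of work per token.

```python
def linear_attention_recurrent(q, k, v):
    """Causal linear attention (no softmax, no normalization) via a running state.

    q, k: lists of n vectors of length d_k; v: list of n vectors of length d_v.
    Work grows linearly with n because each step only touches the d_k x d_v state.
    """
    d_k, d_v = len(k[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # running sum of outer(k_t, v_t)
    out = []
    for q_t, k_t, v_t in zip(q, k, v):
        # Fold the current key/value pair into the state.
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] += k_t[i] * v_t[j]
        # Read out with the current query: out_t = q_t @ state.
        out.append([sum(q_t[i] * state[i][j] for i in range(d_k)) for j in range(d_v)])
    return out


def linear_attention_quadratic(q, k, v):
    """Reference: the same output from explicit pairwise scores, quadratic in n."""
    d_v = len(v[0])
    out = []
    for t, q_t in enumerate(q):
        row = [0.0] * d_v
        for s in range(t + 1):  # causal: only positions s <= t
            score = sum(a * b for a, b in zip(q_t, k[s]))
            for j in range(d_v):
                row[j] += score * v[s][j]
        out.append(row)
    return out
```

Variants such as RetNet, GLA, and DeltaNet change how the state is updated; this sketch deliberately leaves those differences out.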

About Flash-Sparse-Attention

Relaxed-System-Lab/Flash-Sparse-Attention

🚀🚀 Efficient implementations of Native Sparse Attention
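As a rough illustration of what sparse attention means here, the sketch below has each query pick a few key blocks and run ordinary softmax attention over only those positions. It does not reproduce Native Sparse Attention or this repository's kernels. The function name, the mean-key block scoring, and the `block_size` and `top_k` parameters are all assumptions chosen for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def block_sparse_attention(q, k, v, block_size, top_k):
    """Causal attention where each query attends only to its top_k key blocks.

    Blocks are ranked by the dot product between the query and the block's
    mean key (an illustrative heuristic); softmax is then computed exactly
    over the positions in the selected blocks.
    """
    d_k, d_v = len(q[0]), len(v[0])
    scale = 1.0 / math.sqrt(d_k)
    out = []
    for t, q_t in enumerate(q):
        # Candidate blocks: those containing at least one position <= t.
        blocks = []
        for start in range(0, t + 1, block_size):
            idx = list(range(start, min(start + block_size, t + 1)))
            mean_key = [sum(k[s][i] for s in idx) / len(idx) for i in range(d_k)]
            blocks.append((dot(q_t, mean_key), idx))
        # Keep the top_k blocks by coarse score, then restore position order.
        blocks.sort(key=lambda b: b[0], reverse=True)
        selected = sorted(s for _, idx in blocks[:top_k] for s in idx)
        # Numerically stable softmax over the selected positions only.
        scores = [dot(q_t, k[s]) * scale for s in selected]
        m = max(scores)
        weights = [math.exp(x - m) for x in scores]
        z = sum(weights)
        out.append([sum(w * v[s][j] for w, s in zip(weights, selected)) / z
                    for j in range(d_v)])
    return out
```

When `top_k` covers every causal block, the function reduces to dense causal softmax attention. Smaller values trade coverage for less work per query.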

Scores updated daily from GitHub, PyPI, and npm data.