fla-org/flash-linear-attention
🚀 Efficient implementations of state-of-the-art linear attention models
Provides PyTorch and Triton kernels for linear attention variants (RetNet, GLA, Mamba, RWKV, DeltaNet, and 20+ emerging architectures), optimized for NVIDIA, AMD, and Intel GPUs. Includes fused operators, hybrid model support, and variable-length sequence handling to reduce memory overhead during training. Integrates with the Hugging Face Hub and the companion `flame` training framework for distributed model development.
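As context, here is a minimal sketch of dropping one of the library's layers into a PyTorch model. The class name and constructor arguments follow the pattern shown in the project README, but exact signatures can vary across versions, so treat the argument names as assumptions and check the repo for your installed release:

```python
import torch
from fla.layers import GatedLinearAttention  # layer name per the repo's exports

# Toy dimensions; real models use much larger sizes. Triton kernels require a GPU.
batch_size, seq_len, hidden_size = 2, 128, 512
device, dtype = "cuda", torch.bfloat16

# Constructor arguments assumed from the README's usage pattern.
gla = GatedLinearAttention(hidden_size=hidden_size, num_heads=4).to(device=device, dtype=dtype)

x = torch.randn(batch_size, seq_len, hidden_size, device=device, dtype=dtype)
y, *_ = gla(x)  # layers return auxiliary outputs (e.g. caches) alongside hidden states
print(y.shape)  # expected: torch.Size([2, 128, 512])
```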
4,549 stars and 438,484 monthly downloads. Used by 1 other package. Actively maintained with 30 commits in the last 30 days. Available on PyPI.
Stars: 4,549
Forks: 431
Language: Python
License: MIT
Category:
Last pushed: Mar 12, 2026
Monthly downloads: 438,484
Commits (30d): 30
Dependencies: 2
Reverse dependents: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/fla-org/flash-linear-attention"
The API is open to everyone at 100 requests/day with no key; a free key raises the limit to 1,000 requests/day.
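For programmatic use, a minimal sketch of fetching the same data in Python instead of curl. The endpoint URL comes from the listing above; the response schema and the header used for keyed access are not documented here, so inspect the JSON before relying on specific fields:

```python
import requests

# Endpoint as shown in the curl example above.
URL = ("https://pt-edge.onrender.com/api/v1/quality/"
       "transformers/fla-org/flash-linear-attention")

resp = requests.get(URL, timeout=10)
resp.raise_for_status()

# Schema is undocumented here; print the keys to see what the API returns.
data = resp.json()
for key, value in data.items():
    print(f"{key}: {value}")
```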
Related models
thu-ml/SageAttention
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x...
thu-ml/SpargeAttn
[ICML2025] SpargeAttention: A training-free sparse attention that accelerates any model inference.
foundation-model-stack/fms-fsdp
🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for...
fla-org/flame
🔥 A minimal training framework for scaling FLA models
skylight-org/sparse-attention-hub
Advancing the frontier of efficient AI