fla-org/flash-linear-attention
🚀 Efficient implementations of state-of-the-art linear attention models
Provides PyTorch and Triton kernels for linear attention variants (RetNet, GLA, Mamba, RWKV, DeltaNet, and 20+ emerging architectures), optimized for NVIDIA, AMD, and Intel GPUs. Includes fused operators, hybrid model support, and variable-length sequence handling to reduce memory overhead during training. Integrates with the Hugging Face Hub and the companion `flame` training framework for distributed model development.
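As context, here is a minimal sketch of dropping one of the library's layers into a PyTorch model. The class name and constructor arguments follow the pattern shown in the project README, but exact signatures can vary across versions, so treat the argument names as assumptions and check the repo for your installed release:

```python
import torch
from fla.layers import GatedLinearAttention  # layer name per the repo's exports

# Toy dimensions; real models use much larger sizes. Triton kernels require a GPU.
batch_size, seq_len, hidden_size = 2, 128, 512
device, dtype = "cuda", torch.bfloat16

# Constructor arguments assumed from the README's usage pattern.
gla = GatedLinearAttention(hidden_size=hidden_size, num_heads=4).to(device=device, dtype=dtype)

x = torch.randn(batch_size, seq_len, hidden_size, device=device, dtype=dtype)
y, *_ = gla(x)  # layers return auxiliary outputs (e.g. caches) alongside hidden states
print(y.shape)  # expected: torch.Size([2, 128, 512])
```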
4,549 stars and 438,484 monthly downloads. Used by 1 other package. Actively maintained with 30 commits in the last 30 days. Available on PyPI.
Stars: 4,549
Forks: 431
Language: Python
License: MIT
Category:
Last pushed: Mar 12, 2026
Monthly downloads: 438,484
Commits (30d): 30
Dependencies: 2
Reverse dependents: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/fla-org/flash-linear-attention"
The API is open to everyone at 100 requests/day with no key; a free key raises the limit to 1,000 requests/day.
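For programmatic use, a minimal sketch of fetching the same data in Python instead of curl. The endpoint URL comes from the listing above; the response schema and the header used for keyed access are not documented here, so inspect the JSON before relying on specific fields:

```python
import requests

# Endpoint as shown in the curl example above.
URL = ("https://pt-edge.onrender.com/api/v1/quality/"
       "transformers/fla-org/flash-linear-attention")

resp = requests.get(URL, timeout=10)
resp.raise_for_status()

# Schema is undocumented here; print the keys to see what the API returns.
data = resp.json()
for key, value in data.items():
    print(f"{key}: {value}")
```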
Related models
thu-ml/SageAttention
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x...
thu-ml/SpargeAttn
[ICML2025] SpargeAttention: A training-free sparse attention that accelerates any model inference.
foundation-model-stack/fms-fsdp
🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for...
fla-org/flame
🔥 A minimal training framework for scaling FLA models
skylight-org/sparse-attention-hub
Advancing the frontier of efficient AI