flash-linear-attention and flame
flash-linear-attention provides the optimized model implementations, and flame is the training framework purpose-built to train them at scale; they are complementary layers of one stack rather than alternatives.
About flash-linear-attention
fla-org/flash-linear-attention
🚀 Efficient implementations of state-of-the-art linear attention models
Provides PyTorch and Triton implementations of linear attention variants (RetNet, GLA, Mamba, RWKV, DeltaNet, and 20+ emerging architectures), with Triton kernels that run on NVIDIA, AMD, and Intel GPUs. Includes fused operators, hybrid model support, and variable-length sequence handling to reduce memory overhead during training. Integrates with the Hugging Face model hub and the companion `flame` training framework for distributed model development.
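As an illustration of the layer-level API, the sketch below drops a gated linear attention module into a plain PyTorch forward pass. It assumes the `fla.layers.GatedLinearAttention` class and the tuple-returning forward shown in the project's README, so treat it as a sketch under those assumptions rather than definitive usage.

```python
# Minimal sketch: using an FLA layer as a drop-in attention module.
# Assumes fla.layers.GatedLinearAttention as documented in the README;
# a GPU is required, since the underlying kernels are Triton-based.
import torch
from fla.layers import GatedLinearAttention

batch_size, seq_len, hidden_size, num_heads = 2, 2048, 1024, 4
device, dtype = "cuda", torch.bfloat16

gla = GatedLinearAttention(hidden_size=hidden_size, num_heads=num_heads).to(device=device, dtype=dtype)
x = torch.randn(batch_size, seq_len, hidden_size, device=device, dtype=dtype)

# The forward returns the output plus auxiliary values (e.g. cache state),
# so unpack only the first element; the output shape matches the input.
y, *_ = gla(x)
assert y.shape == x.shape
```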
About flame
fla-org/flame
🔥 A minimal training framework for scaling FLA models
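The snippet below is not flame's interface; it is only a hypothetical plain-PyTorch training step over an FLA model, meant to show the kind of loop flame wraps with distributed training, data loading, and checkpointing. It assumes that importing `GLAConfig` from `fla.models` registers the model with Hugging Face's `AutoModelForCausalLM`, as described in the flash-linear-attention README.

```python
# Hypothetical illustration only: one plain-PyTorch training step over an FLA model.
# Not flame's API. Assumes fla registers its configs/models with the Hugging Face
# Auto classes, per the flash-linear-attention README.
import torch
from transformers import AutoModelForCausalLM
from fla.models import GLAConfig

config = GLAConfig(hidden_size=512, num_hidden_layers=4)  # small config for illustration
model = AutoModelForCausalLM.from_config(config).to("cuda", dtype=torch.bfloat16)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Dummy batch of token ids; the causal LM head shifts labels internally.
input_ids = torch.randint(0, config.vocab_size, (2, 1024), device="cuda")
loss = model(input_ids=input_ids, labels=input_ids).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```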