fla-org/flash-linear-attention

🚀 Efficient implementations of state-of-the-art linear attention models

Quality score: 89 / 100 · Verified
Provides PyTorch and Triton kernels for linear attention variants (RetNet, GLA, Mamba, RWKV, DeltaNet, and 20+ emerging architectures), with Triton kernels targeting NVIDIA, AMD, and Intel GPUs. Includes fused operators, hybrid model support, and variable-length sequence handling to reduce memory overhead during training. Integrates with the Hugging Face model hub and the companion `flame` training framework for distributed model development.
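For orientation, the recurrence these fused kernels accelerate can be written in a few lines of plain Python. This is a minimal reference sketch of causal linear attention (the un-softmaxed form shared by the variants above), not the `fla` API; all names and shapes here are illustrative.

```python
def linear_attention(q, k, v):
    """Causal linear attention via the running-state recurrence:
    S_t = S_{t-1} + k_t v_t^T,  o_t = q_t S_t.
    q, k, v: lists of T vectors (lists of floats) of dims d_k, d_k, d_v.
    O(T) time in sequence length, O(d_k * d_v) state."""
    d_k, d_v = len(k[0]), len(v[0])
    S = [[0.0] * d_v for _ in range(d_k)]  # running state: sum of k_t v_t^T
    out = []
    for q_t, k_t, v_t in zip(q, k, v):
        for i in range(d_k):
            for j in range(d_v):
                S[i][j] += k_t[i] * v_t[j]
        # o_t[j] = sum_i q_t[i] * S[i][j]
        out.append([sum(q_t[i] * S[i][j] for i in range(d_k)) for j in range(d_v)])
    return out


def quadratic_attention(q, k, v):
    """Equivalent O(T^2) form: o_t = sum_{s<=t} (q_t . k_s) v_s."""
    out = []
    for t in range(len(q)):
        o = [0.0] * len(v[0])
        for s in range(t + 1):
            w = sum(a * b for a, b in zip(q[t], k[s]))
            for j in range(len(v[0])):
                o[j] += w * v[s][j]
        out.append(o)
    return out
```

The two functions produce identical outputs; the library's contribution is doing the recurrent form in fused, chunked Triton kernels rather than a Python loop.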

4,549 stars and 438,484 monthly downloads. Used by 1 other package. Actively maintained with 30 commits in the last 30 days. Available on PyPI.

Maintenance: 23 / 25
Adoption: 21 / 25
Maturity: 25 / 25
Community: 20 / 25


Stars: 4,549
Forks: 431
Language: Python
License: MIT
Last pushed: Mar 12, 2026
Monthly downloads: 438,484
Commits (30d): 30
Dependencies: 2
Reverse dependents: 1

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/fla-org/flash-linear-attention"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
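The same endpoint can be queried from Python with only the standard library. This is a hedged sketch: it assumes the endpoint returns JSON, and the response's field names are not documented here, so the fetched dict is returned as-is rather than picked apart.

```python
import json
from urllib.request import urlopen

# Endpoint shape taken from the curl example above; the "transformers"
# ecosystem segment and repo path are filled in by the caller.
API = "https://pt-edge.onrender.com/api/v1/quality/transformers/{repo}"


def quality_url(repo: str) -> str:
    """Build the quality-report URL for an owner/name repo string."""
    return API.format(repo=repo)


def fetch_quality(repo: str) -> dict:
    """Fetch and decode the quality report (assumes a JSON response)."""
    with urlopen(quality_url(repo)) as resp:
        return json.load(resp)
```

For example, `quality_url("fla-org/flash-linear-attention")` reproduces the URL used in the curl command above.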