Sparse Attention Optimization Transformer Models
We track 20 sparse attention optimization models; 1 scores above 70 (verified tier). The highest-rated is fla-org/flash-linear-attention at 89/100, with 4,549 stars and 438,484 monthly downloads. 1 of the top 10 is actively maintained.
Get all 20 projects as JSON:

```bash
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=sparse-attention-optimization&limit=20"
```
Open to everyone: 100 requests/day with no key needed. Get a free key to raise the limit to 1,000 requests/day.
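If you prefer to call the endpoint from Python instead of curl, here is a minimal sketch. The URL and query parameters mirror the curl example above; the response field names (`items`, `name`, `score`, `tier`) are assumptions about the payload shape, so adjust them to whatever the API actually returns.

```python
# Minimal sketch: fetch the sparse-attention dataset as JSON and print each
# project's name, score, and tier. The endpoint and query parameters come
# from the curl example; the response schema ("items", "name", "score",
# "tier") is an assumption and may differ from the real API.
import requests

URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"
params = {
    "domain": "transformers",
    "subcategory": "sparse-attention-optimization",
    "limit": 20,
}

resp = requests.get(URL, params=params, timeout=30)
resp.raise_for_status()
data = resp.json()

# Assumed shape: a list of project records under an "items" key.
for project in data.get("items", []):
    print(project.get("name"), project.get("score"), project.get("tier"))
```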
| # | Model | Description | Score | Tier |
|---|---|---|---|---|
| 1 | fla-org/flash-linear-attention | 🚀 Efficient implementations of state-of-the-art linear attention models | 89 | Verified |
| 2 | thu-ml/SageAttention | [ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves... | | Established |
| 3 | thu-ml/SpargeAttn | [ICML2025] SpargeAttention: A training-free sparse attention that... | | Emerging |
| 4 | foundation-model-stack/fms-fsdp | 🚀 Efficiently (pre)training foundation models with native PyTorch features,... | | Emerging |
| 5 | fla-org/flame | 🔥 A minimal training framework for scaling FLA models | | Emerging |
| 6 | skylight-org/sparse-attention-hub | Advancing the frontier of efficient AI | | Emerging |
| 7 | egaoharu-kensei/flash-attention-triton | Cross-platform FlashAttention-2 Triton implementation for Turing+ GPUs with... | | Emerging |
| 8 | NX-AI/mlstm_kernels | Tiled Flash Linear Attention library for fast and efficient mLSTM Kernels. | | Emerging |
| 9 | zhenyi4/ssa | Official repository for "SSA: Sparse Sparse Attention by Aligning Full and... | | Emerging |
| 10 | XunhaoLai/native-sparse-attention-triton | Efficient triton implementation of Native Sparse Attention. | | Emerging |
| 11 | Infini-AI-Lab/vortex_torch | Vortex: A Flexible and Efficient Sparse Attention Framework | | Emerging |
| 12 | NVIDIA/Star-Attention | Efficient LLM Inference over Long Sequences | | Emerging |
| 13 | NimbleEdge/sparse_transformers | Sparse Inferencing for transformer based LLMs | | Emerging |
| 14 | Relaxed-System-Lab/Flash-Sparse-Attention | 🚀🚀 Efficient implementations of Native Sparse Attention | | Emerging |
| 15 | Bruce-Lee-LY/flash_attention_inference | Performance of the C++ interface of flash attention and flash attention v2... | | Emerging |
| 16 | Bruce-Lee-LY/decoding_attention | Decoding Attention is specially optimized for MHA, MQA, GQA and MLA using... | | Experimental |
| 17 | jlamprou/Infini-Attention | Efficient Infinite Context Transformers with Infini-attention Pytorch... | | Experimental |
| 18 | nanowell/Q-Sparse-LLM | My Implementation of Q-Sparse: All Large Language Models can be Fully... | | Experimental |
| 19 | XunhaoLai/ring-sliding-window-attention | Ring sliding window attention implementation with flash attention | | Experimental |
| 20 | BICLab/MetaLA | Official implementation of "MetaLA: Unified Optimal Linear Approximation to... | | Experimental |