Sparse Attention Optimization for Transformer Models

There are 20 sparse attention optimization models tracked, of which 1 scores above 70 (verified tier). The highest-rated is fla-org/flash-linear-attention at 89/100, with 4,549 stars and 438,484 monthly downloads. 1 of the top 10 is actively maintained.

Get all 20 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=sparse-attention-optimization&limit=20"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
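The curl call above translates directly into a small Python client. This is a minimal sketch: the endpoint URL and query parameters are taken from the example, but the API-key header name (`X-API-Key`) is an assumption, since the API documentation is not quoted here.

```python
# Minimal sketch of calling the quality dataset endpoint from Python.
# Endpoint and parameters come from the curl example above; the
# "X-API-Key" header name is an assumption, not confirmed by the docs.
import json
import urllib.parse
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def build_url(domain, subcategory, limit):
    """Build the dataset URL with properly encoded query parameters."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE}?{query}"

def fetch_projects(url, api_key=None, timeout=10):
    """Fetch and decode the JSON payload; pass api_key for the 1,000/day tier."""
    req = urllib.request.Request(url)
    if api_key:
        req.add_header("X-API-Key", api_key)  # assumed header name
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Reproduces the exact URL from the curl example above.
    print(build_url("transformers", "sparse-attention-optimization", 20))
```

Using `urllib` keeps the sketch dependency-free; with `requests` installed, the same call is `requests.get(url, headers={"X-API-Key": key}).json()`.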

| # | Model | Description | Score | Tier |
|---|-------|-------------|-------|------|
| 1 | fla-org/flash-linear-attention | 🚀 Efficient implementations of state-of-the-art linear attention models | 89 | Verified |
| 2 | thu-ml/SageAttention | [ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves... | 57 | Established |
| 3 | thu-ml/SpargeAttn | [ICML2025] SpargeAttention: A training-free sparse attention that... | 47 | Emerging |
| 4 | foundation-model-stack/fms-fsdp | 🚀 Efficiently (pre)training foundation models with native PyTorch features,... | 45 | Emerging |
| 5 | fla-org/flame | 🔥 A minimal training framework for scaling FLA models | 45 | Emerging |
| 6 | skylight-org/sparse-attention-hub | Advancing the frontier of efficient AI | 40 | Emerging |
| 7 | egaoharu-kensei/flash-attention-triton | Cross-platform FlashAttention-2 Triton implementation for Turing+ GPUs with... | 38 | Emerging |
| 8 | NX-AI/mlstm_kernels | Tiled Flash Linear Attention library for fast and efficient mLSTM Kernels. | 37 | Emerging |
| 9 | zhenyi4/ssa | Official repository for "SSA: Sparse Sparse Attention by Aligning Full and... | 34 | Emerging |
| 10 | XunhaoLai/native-sparse-attention-triton | Efficient triton implementation of Native Sparse Attention. | 34 | Emerging |
| 11 | Infini-AI-Lab/vortex_torch | Vortex: A Flexible and Efficient Sparse Attention Framework | 34 | Emerging |
| 12 | NVIDIA/Star-Attention | Efficient LLM Inference over Long Sequences | 33 | Emerging |
| 13 | NimbleEdge/sparse_transformers | Sparse Inferencing for transformer based LLMs | 32 | Emerging |
| 14 | Relaxed-System-Lab/Flash-Sparse-Attention | 🚀🚀 Efficient implementations of Native Sparse Attention | 30 | Emerging |
| 15 | Bruce-Lee-LY/flash_attention_inference | Performance of the C++ interface of flash attention and flash attention v2... | 30 | Emerging |
| 16 | Bruce-Lee-LY/decoding_attention | Decoding Attention is specially optimized for MHA, MQA, GQA and MLA using... | 28 | Experimental |
| 17 | jlamprou/Infini-Attention | Efficient Infinite Context Transformers with Infini-attention Pytorch... | 20 | Experimental |
| 18 | nanowell/Q-Sparse-LLM | My Implementation of Q-Sparse: All Large Language Models can be Fully... | 19 | Experimental |
| 19 | XunhaoLai/ring-sliding-window-attention | Ring sliding window attention implementation with flash attention | 16 | Experimental |
| 20 | BICLab/MetaLA | Official implementation of "MetaLA: Unified Optimal Linear Approximation to... | 14 | Experimental |