Sparse Attention Optimization for Transformer Models

There are 20 sparse attention optimization models tracked, of which 1 scores above 70 (verified tier). The highest-rated is fla-org/flash-linear-attention at 89/100, with 4,549 stars and 438,484 monthly downloads. 1 of the top 10 is actively maintained.

Get all 20 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=sparse-attention-optimization&limit=20"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
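The curl call above translates directly into a small Python client. This is a minimal sketch: the endpoint URL and query parameters are taken from the example, but the API-key header name (`X-API-Key`) is an assumption, since the API documentation is not quoted here.

```python
# Minimal sketch of calling the quality dataset endpoint from Python.
# Endpoint and parameters come from the curl example above; the
# "X-API-Key" header name is an assumption, not confirmed by the docs.
import json
import urllib.parse
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def build_url(domain, subcategory, limit):
    """Build the dataset URL with properly encoded query parameters."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE}?{query}"

def fetch_projects(url, api_key=None, timeout=10):
    """Fetch and decode the JSON payload; pass api_key for the 1,000/day tier."""
    req = urllib.request.Request(url)
    if api_key:
        req.add_header("X-API-Key", api_key)  # assumed header name
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Reproduces the exact URL from the curl example above.
    print(build_url("transformers", "sparse-attention-optimization", 20))
```

Using `urllib` keeps the sketch dependency-free; with `requests` installed, the same call is `requests.get(url, headers={"X-API-Key": key}).json()`.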

| # | Model | Description | Score | Tier |
|---|-------|-------------|-------|------|
| 1 | fla-org/flash-linear-attention | 🚀 Efficient implementations of state-of-the-art linear attention models | 89 | Verified |
| 2 | thu-ml/SageAttention | [ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves... | 57 | Established |
| 3 | thu-ml/SpargeAttn | [ICML2025] SpargeAttention: A training-free sparse attention that... | 47 | Emerging |
| 4 | foundation-model-stack/fms-fsdp | 🚀 Efficiently (pre)training foundation models with native PyTorch features,... | 45 | Emerging |
| 5 | fla-org/flame | 🔥 A minimal training framework for scaling FLA models | 45 | Emerging |
| 6 | skylight-org/sparse-attention-hub | Advancing the frontier of efficient AI | 40 | Emerging |
| 7 | egaoharu-kensei/flash-attention-triton | Cross-platform FlashAttention-2 Triton implementation for Turing+ GPUs with... | 38 | Emerging |
| 8 | NX-AI/mlstm_kernels | Tiled Flash Linear Attention library for fast and efficient mLSTM Kernels. | 37 | Emerging |
| 9 | zhenyi4/ssa | Official repository for "SSA: Sparse Sparse Attention by Aligning Full and... | 34 | Emerging |
| 10 | XunhaoLai/native-sparse-attention-triton | Efficient triton implementation of Native Sparse Attention. | 34 | Emerging |
| 11 | Infini-AI-Lab/vortex_torch | Vortex: A Flexible and Efficient Sparse Attention Framework | 34 | Emerging |
| 12 | NVIDIA/Star-Attention | Efficient LLM Inference over Long Sequences | 33 | Emerging |
| 13 | NimbleEdge/sparse_transformers | Sparse Inferencing for transformer based LLMs | 32 | Emerging |
| 14 | Relaxed-System-Lab/Flash-Sparse-Attention | 🚀🚀 Efficient implementations of Native Sparse Attention | 30 | Emerging |
| 15 | Bruce-Lee-LY/flash_attention_inference | Performance of the C++ interface of flash attention and flash attention v2... | 30 | Emerging |
| 16 | Bruce-Lee-LY/decoding_attention | Decoding Attention is specially optimized for MHA, MQA, GQA and MLA using... | 28 | Experimental |
| 17 | jlamprou/Infini-Attention | Efficient Infinite Context Transformers with Infini-attention Pytorch... | 20 | Experimental |
| 18 | nanowell/Q-Sparse-LLM | My Implementation of Q-Sparse: All Large Language Models can be Fully... | 19 | Experimental |
| 19 | XunhaoLai/ring-sliding-window-attention | Ring sliding window attention implementation with flash attention | 16 | Experimental |
| 20 | BICLab/MetaLA | Official implementation of "MetaLA: Unified Optimal Linear Approximation to... | 14 | Experimental |