Sparse Attention Optimization LLM Tools

Efficient sparse attention implementations and frameworks that reduce computational complexity for LLM inference and training. Includes kernel optimizations, attention pattern strategies, and performance-tuned libraries. Does NOT include general attention mechanisms, model architectures, or non-attention-specific optimization techniques.
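
As a concrete illustration of the attention-pattern strategies in scope, here is a minimal sliding-window sparse attention sketch in plain PyTorch. It is not taken from any tool listed below, and it materializes a dense mask for clarity; the point of the optimized kernels tracked here is to skip the masked work entirely, bringing the cost from O(n²) down to roughly O(n·w) for window size w.

```python
# Minimal sliding-window sparse attention sketch (illustrative only;
# real kernels never materialize the full mask or score matrix).
import torch
import torch.nn.functional as F

def sliding_window_attention(q, k, v, window: int = 4):
    """q, k, v: (batch, heads, seq_len, head_dim). Each query attends
    only to keys at most `window - 1` positions behind it (causal)."""
    seq_len = q.size(-2)
    idx = torch.arange(seq_len)
    dist = idx.unsqueeze(1) - idx.unsqueeze(0)      # dist[i, j] = i - j
    mask = (dist >= 0) & (dist < window)            # causal sliding window
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(1, 2, 8, 16)
print(sliding_window_attention(q, k, v, window=3).shape)  # (1, 2, 8, 16)
```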

Eight sparse attention optimization tools are currently tracked. The highest-rated is windreamer/flash-attention3-wheels, scoring 30/100 with 65 stars.

Get all 8 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=sparse-attention-optimization&limit=20"

Open to everyone: 100 requests/day with no API key; a free key raises the limit to 1,000/day.
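
If you prefer to consume the endpoint from Python rather than curl, a minimal sketch follows. The URL and query parameters are the ones shown above; the response field names ("name", "score", "tier") are assumptions about the JSON shape, so adjust them to the actual payload.

```python
# Fetch the same dataset the curl command above returns.
import requests

resp = requests.get(
    "https://pt-edge.onrender.com/api/v1/datasets/quality",
    params={
        "domain": "llm-tools",
        "subcategory": "sparse-attention-optimization",
        "limit": 20,
    },
    timeout=10,
)
resp.raise_for_status()
data = resp.json()
# Assumption: either a bare JSON array or an object wrapping one.
records = data if isinstance(data, list) else data.get("data", [])
for tool in records:
    print(tool.get("name"), tool.get("score"), tool.get("tier"))
```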

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | windreamer/flash-attention3-wheels | Pre-built wheels that erase Flash Attention 3 installation headaches. | 30 | Emerging |
| 2 | wesleyscholl/drex | 🦀 The transformer is a brilliant hack scaled past its limits. DREX is what... | 23 | Experimental |
| 3 | aymanelrody/FlashMLA | ⚡ Optimize attention mechanisms with FlashMLA, a library of advanced sparse... | 22 | Experimental |
| 4 | kamalrss88/FlashMLA | 🚀 Accelerate attention mechanisms with FlashMLA, featuring optimized kernels... | 22 | Experimental |
| 5 | AstrolexisAI/MnemoCUDA | Expert streaming inference engine for MoE models larger than VRAM — run... | 22 | Experimental |
| 6 | NAME0x0/OMNI | PERSPECTIVE v2 — A 1.05 trillion parameter sparse Mixture-of-Experts... | 20 | Experimental |
| 7 | HassanJbara/lin-attn-zoo | Pure PyTorch implementations of popular linear attention models (see the sketch after this table). | 14 | Experimental |
| 8 | etasnadi/VulkanCooperativeMatrixAttention | Vulkan & GLSL implementation of FlashAttention-2 | 14 | Experimental |
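
Entry 7 (HassanJbara/lin-attn-zoo) collects linear attention variants. The sketch below shows the standard kernel trick those models share, written independently in plain PyTorch rather than taken from that repository: replacing softmax(QKᵀ)V with φ(Q)(φ(K)ᵀV) lets the key-value product be computed once, making the cost linear in sequence length.

```python
# Generic (non-causal) linear attention sketch; not code from lin-attn-zoo.
import torch
import torch.nn.functional as F

def linear_attention(q, k, v):
    """q, k, v: (batch, heads, seq_len, head_dim).
    Uses the feature map phi(x) = elu(x) + 1, so attention is computed
    as phi(Q) (phi(K)^T V) in O(seq_len) rather than O(seq_len^2)."""
    phi_q = F.elu(q) + 1
    phi_k = F.elu(k) + 1
    kv = torch.einsum("bhsd,bhse->bhde", phi_k, v)       # (b, h, d, e)
    z = 1.0 / (torch.einsum("bhsd,bhd->bhs", phi_q, phi_k.sum(dim=-2)) + 1e-6)
    return torch.einsum("bhsd,bhde,bhs->bhse", phi_q, kv, z)

q = k = v = torch.randn(1, 2, 128, 16)
print(linear_attention(q, k, v).shape)  # torch.Size([1, 2, 128, 16])
```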