Sparse Attention Optimization LLM Tools
Efficient sparse attention implementations and frameworks that reduce computational complexity for LLM inference and training. Includes kernel optimizations, attention pattern strategies, and performance-tuned libraries. Does NOT include general attention mechanisms, model architectures, or non-attention-specific optimization techniques.
There are 8 sparse attention optimization tools tracked. The highest-rated is windreamer/flash-attention3-wheels at 30/100 with 65 stars.
Get all 8 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=sparse-attention-optimization&limit=20"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
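The curl call above can also be issued programmatically. A minimal Python sketch, assuming the endpoint accepts the same `domain`, `subcategory`, and `limit` query parameters shown in the curl example; the JSON response shape used below (a `projects` list with `name` and `score` fields) is an assumption for illustration, not documented API output.

```python
# Sketch: build the dataset query URL and rank results from a response.
# NOTE: the response schema below is assumed, not documented.
import json
from urllib.parse import urlencode

BASE = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def build_url(domain: str, subcategory: str, limit: int = 20) -> str:
    """Assemble the query string in the same form as the curl example."""
    return BASE + "?" + urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )

url = build_url("llm-tools", "sparse-attention-optimization")

# Hypothetical payload illustrating how the top-scored project
# could be picked out of a parsed response.
sample = json.loads(
    '{"projects": [{"name": "windreamer/flash-attention3-wheels", "score": 30}]}'
)
top = max(sample["projects"], key=lambda p: p["score"])
```

Pass an API key header if you have one; anonymous requests are capped at 100/day as noted above.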
| # | Tool | Description | Score | Tier |
|---|---|---|---|---|
| 1 | windreamer/flash-attention3-wheels | Pre-built wheels that erase Flash Attention 3 installation headaches. | 30 | Emerging |
| 2 | wesleyscholl/drex | 🦀 The transformer is a brilliant hack scaled past its limits. DREX is what... | | Experimental |
| 3 | aymanelrody/FlashMLA | ⚡ Optimize attention mechanisms with FlashMLA, a library of advanced sparse... | | Experimental |
| 4 | kamalrss88/FlashMLA | 🚀 Accelerate attention mechanisms with FlashMLA, featuring optimized kernels... | | Experimental |
| 5 | AstrolexisAI/MnemoCUDA | Expert streaming inference engine for MoE models larger than VRAM — run... | | Experimental |
| 6 | NAME0x0/OMNI | PERSPECTIVE v2 — A 1.05 trillion parameter sparse Mixture-of-Experts... | | Experimental |
| 7 | HassanJbara/lin-attn-zoo | Pure PyTorch implementations of popular linear attention models | | Experimental |
| 8 | etasnadi/VulkanCooperativeMatrixAttention | Vulkan & GLSL implementation of FlashAttention-2 | | Experimental |