windreamer/flash-attention3-wheels
Pre-built wheels that erase Flash Attention 3 installation headaches.
Provides pre-built wheels for Flash Attention 3 across Windows, Linux, and ARM CUDA platforms (including GH200), eliminating compilation barriers across hardware configurations. GitHub Actions rebuilds the wheels biweekly against multiple CUDA versions (13.0, 12.9, 12.8, and 12.6) and multiple PyTorch versions, and the wheels are distributed via a curated index matched to your specific environment. Integrates directly with PyTorch ecosystems by offering drop-in pip installation sourced from GitHub Pages rather than requiring manual compilation from upstream.
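The drop-in installation described above would look roughly like the sketch below. The index URL path and package name are illustrative assumptions, not taken from this listing; consult the repository README for the actual per-CUDA/per-PyTorch index paths.

```shell
# Hypothetical sketch: the index URL scheme and package name are
# assumptions for illustration -- check the repository README for
# the real values matching your environment.
# Point pip at the GitHub Pages wheel index for your CUDA/PyTorch
# combination instead of compiling Flash Attention 3 from source.
pip install flash-attn-3 \
    --extra-index-url "https://windreamer.github.io/flash-attention3-wheels/cu128-torch2.6/"
```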
Stars
65
Forks
1
Language
Python
License
Apache-2.0
Category
Last pushed
Mar 04, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/windreamer/flash-attention3-wheels"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related tools
wesleyscholl/drex
🦀 The transformer is a brilliant hack scaled past its limits. DREX is what comes next — tiered...
aymanelrody/FlashMLA
⚡ Optimize attention mechanisms with FlashMLA, a library of advanced sparse and dense kernels...
kamalrss88/FlashMLA
🚀 Accelerate attention mechanisms with FlashMLA, featuring optimized kernels for DeepSeek...
AstrolexisAI/MnemoCUDA
Expert streaming inference engine for MoE models larger than VRAM — run 235B+ models on consumer GPUs
NAME0x0/OMNI
PERSPECTIVE v2 — A 1.05 trillion parameter sparse Mixture-of-Experts language model that runs on...