windreamer/flash-attention3-wheels
Pre-built wheels that erase Flash Attention 3 installation headaches.
Provides pre-built wheels for Flash Attention 3 across Windows, Linux, and ARM CUDA platforms (including GH200), eliminating compilation barriers across hardware configurations. GitHub Actions rebuilds the wheels biweekly against multiple CUDA versions (13.0, 12.9, 12.8, and 12.6) and multiple PyTorch versions, and the wheels are distributed via a curated index matched to your specific environment. Integrates directly with PyTorch ecosystems by offering drop-in pip installation sourced from GitHub Pages rather than requiring manual compilation from upstream.
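The drop-in installation described above would look roughly like the sketch below. The index URL path and package name are illustrative assumptions, not taken from this listing; consult the repository README for the actual per-CUDA/per-PyTorch index paths.

```shell
# Hypothetical sketch: the index URL scheme and package name are
# assumptions for illustration -- check the repository README for
# the real values matching your environment.
# Point pip at the GitHub Pages wheel index for your CUDA/PyTorch
# combination instead of compiling Flash Attention 3 from source.
pip install flash-attn-3 \
    --extra-index-url "https://windreamer.github.io/flash-attention3-wheels/cu128-torch2.6/"
```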
Stars
65
Forks
1
Language
Python
License
Apache-2.0
Category
Last pushed
Mar 04, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/windreamer/flash-attention3-wheels"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related tools
wesleyscholl/drex
🦀 The transformer is a brilliant hack scaled past its limits. DREX is what comes next — tiered...
aymanelrody/FlashMLA
⚡ Optimize attention mechanisms with FlashMLA, a library of advanced sparse and dense kernels...
kamalrss88/FlashMLA
🚀 Accelerate attention mechanisms with FlashMLA, featuring optimized kernels for DeepSeek...
AstrolexisAI/MnemoCUDA
Expert streaming inference engine for MoE models larger than VRAM — run 235B+ models on consumer GPUs
NAME0x0/OMNI
PERSPECTIVE v2 — A 1.05 trillion parameter sparse Mixture-of-Experts language model that runs on...