chengzeyi/stable-fast
https://wavespeed.ai/ Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
Implements custom CUDA kernels for convolution fusion and fused GEMM operations, plus an optimized GroupNorm written in Triton, and combines them with TorchScript tracing and CUDA Graph capture to reduce CPU overhead. Supports dynamic shapes, LoRA, and ControlNet natively and compiles models in seconds rather than minutes. Targets the HuggingFace Diffusers ecosystem, including video diffusion pipelines.
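As a rough illustration of how these optimizations are switched on, the sketch below wraps the compile step in a helper. It assumes the `sfast.compilers.diffusion_pipeline_compiler` entry point and the `CompilationConfig` flags as described in the project's README; the import path may differ between releases, so check it against the installed version. The helper name `compile_pipeline` and its `use_cuda_graph` parameter are hypothetical and exist only for this example.

```python
import importlib.util


def compile_pipeline(pipe, use_cuda_graph=True):
    """Compile a Diffusers pipeline with stable-fast (hypothetical helper).

    `pipe` is expected to be a Diffusers pipeline already moved to a CUDA
    device. stable-fast is imported lazily so that defining this helper does
    not require the package or a GPU.
    """
    # Assumed import path, taken from the stable-fast README.
    from sfast.compilers.diffusion_pipeline_compiler import (
        CompilationConfig,
        compile,
    )

    config = CompilationConfig.Default()

    # Turn on the optional backends only when they are installed.
    config.enable_xformers = importlib.util.find_spec("xformers") is not None
    config.enable_triton = importlib.util.find_spec("triton") is not None

    # CUDA Graph capture reduces CPU launch overhead; it can be disabled
    # through the (hypothetical) use_cuda_graph argument.
    config.enable_cuda_graph = use_cuda_graph

    return compile(pipe, config)
```

A typical call would load a pipeline with Diffusers, move it to the GPU, and then run `pipe = compile_pipeline(pipe)` before the first inference. The first few calls after compilation are usually slower because tracing and graph capture happen at that point.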
1,305 stars and 167 monthly downloads. No commits in the last 6 months. Available on PyPI.
Stars: 1,305
Forks: 91
Language: Python
License: MIT
Category:
Last pushed: Mar 27, 2025
Monthly downloads: 167
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/chengzeyi/stable-fast"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
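The same request can be made from Python with only the standard library. This is a minimal sketch: the `category/owner/repo` path layout is inferred from the single URL shown above, and the response is assumed to be JSON, which the page does not confirm. The function names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Base path copied from the curl example; everything after it is assumed
# to follow the pattern <category>/<owner>/<repo>.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL, percent-encoding each path segment."""
    parts = [urllib.parse.quote(p, safe="") for p in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the record for one repository.

    Assumes the endpoint returns a JSON body; the schema is not documented
    on this page, so the parsed object is returned as-is.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For this repository the call would be `fetch_quality("diffusion", "chengzeyi", "stable-fast")`. Unauthenticated use is subject to the daily request limit stated above.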
Related models
jina-ai/discoart
🪩 Create Disco Diffusion artworks in one line
siliconflow/onediff
OneDiff: An out-of-the-box acceleration library for diffusion models.
wooyeolbaek/attention-map-diffusers
🚀 Cross attention map tools for huggingface/diffusers
riffusion/riffusion-hobby
Stable diffusion for real-time music generation
hkproj/pytorch-stable-diffusion
Stable Diffusion implemented from scratch in PyTorch