chengzeyi/stable-fast

Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.

Website: https://wavespeed.ai/

/ 100

Established

Implements custom CUDA kernels for convolution fusion, fused GEMM operations, and optimized GroupNorm via Triton, combined with TorchScript tracing and CUDA Graph capture to eliminate CPU overhead. Supports dynamic shapes, LoRA, and ControlNet natively while compiling models in seconds rather than minutes, targeting the full HuggingFace Diffusers ecosystem including video diffusion pipelines.
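A minimal sketch of how these pieces are usually switched on, following the usage pattern from the stable-fast README (`compile` and `CompilationConfig` from `sfast.compilers.diffusion_pipeline_compiler`). The helper names `available_backends` and `compile_pipeline` are hypothetical, and the choice to enable every optional backend that is installed is an assumption. Actually calling `compile_pipeline` requires an NVIDIA GPU with `torch`, `diffusers`, and `stable-fast` installed.

```python
import importlib.util


def available_backends(names=("xformers", "triton")):
    """Return the optional acceleration packages that are importable here."""
    return [n for n in names if importlib.util.find_spec(n) is not None]


def compile_pipeline(model_id):
    """Load a Diffusers pipeline and compile it with stable-fast (needs CUDA)."""
    # Heavy imports are deferred so this module loads without a GPU stack.
    import torch
    from diffusers import StableDiffusionPipeline
    from sfast.compilers.diffusion_pipeline_compiler import (
        CompilationConfig, compile)

    pipe = StableDiffusionPipeline.from_pretrained(
        model_id, torch_dtype=torch.float16)
    pipe.to(torch.device("cuda"))

    config = CompilationConfig.Default()
    backends = available_backends()
    # Only enable optional backends that are actually installed.
    config.enable_xformers = "xformers" in backends
    config.enable_triton = "triton" in backends
    # CUDA Graph capture reduces CPU launch overhead for small batches.
    config.enable_cuda_graph = True

    return compile(pipe, config)
```

The returned object is used like the original pipeline; the first few calls are slower because tracing and graph capture happen on them.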

1,305 stars and 167 monthly downloads. No commits in the last 6 months. Available on PyPI.

Stale 6m · No Dependents

Maintenance 0 / 25

Adoption 15 / 25

Maturity 18 / 25

Community 17 / 25
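
The four sub-scores are each out of 25, which suggests the overall figure out of 100 is their plain sum. The page does not state this, so treat it as an assumption; under it, the arithmetic works out as follows.

```python
# Sub-scores as listed on the page, each out of 25.
subscores = {"Maintenance": 0, "Adoption": 15, "Maturity": 18, "Community": 17}

# Assumption: the overall score is the unweighted sum of the sub-scores.
overall = sum(subscores.values())
max_total = 25 * len(subscores)

print(f"{overall} / {max_total}")  # prints "50 / 100"
```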


Stars: 1,305
Forks:
Language: Python
License: MIT
Category: diffusion-deployment-serving
Last pushed: Mar 27, 2025
Monthly downloads: 167
Commits (30d):


Diffusion Deployment Serving · 106 models

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/chengzeyi/stable-fast"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
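
The same request can be made from Python with only the standard library. This is a sketch: the helper names `quality_url` and `fetch_quality` are hypothetical, and it assumes the endpoint returns a JSON body, which the page does not confirm.

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/diffusion"


def quality_url(owner, repo):
    """Build the endpoint URL for one repository."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_quality(owner, repo, timeout=10):
    """GET the endpoint and decode the body, assuming it is JSON."""
    url = quality_url(owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("chengzeyi", "stable-fast")` issues the same GET as the curl command; on the keyless tier, keep usage under the 100 requests/day limit.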

Related models

jina-ai/discoart

🪩 Create Disco Diffusion artworks in one line

siliconflow/onediff

OneDiff: An out-of-the-box acceleration library for diffusion models.

wooyeolbaek/attention-map-diffusers

🚀 Cross attention map tools for huggingface/diffusers

riffusion/riffusion-hobby

Stable diffusion for real-time music generation

hkproj/pytorch-stable-diffusion

Stable Diffusion implemented from scratch in PyTorch
