chengzeyi/stable-fast

https://wavespeed.ai/ Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.

Score: 50 / 100 (Established)

Implements custom CUDA kernels for convolution fusion, fused GEMM operations, and optimized GroupNorm via Triton, combined with TorchScript tracing and CUDA Graph capture to eliminate CPU overhead. Supports dynamic shapes, LoRA, and ControlNet natively while compiling models in seconds rather than minutes, targeting the full HuggingFace Diffusers ecosystem including video diffusion pipelines.
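As a rough illustration of how the features described above are switched on, here is a minimal sketch based on the usage pattern in the project's README. It assumes `stable-fast` (imported as `sfast`), `diffusers`, and a CUDA-capable PyTorch build are installed. The wrapper name `compile_pipeline` is hypothetical and only groups the calls for readability.

```python
def compile_pipeline(pipe):
    """Compile a Diffusers pipeline with stable-fast (needs an NVIDIA GPU)."""
    # Imported lazily so this module still loads where stable-fast is absent.
    from sfast.compilers.diffusion_pipeline_compiler import (
        CompilationConfig,
        compile,
    )

    config = CompilationConfig.Default()

    # Optional accelerators: enable each one only if it is installed.
    try:
        import xformers  # noqa: F401
        config.enable_xformers = True
    except ImportError:
        pass
    try:
        import triton  # noqa: F401
        config.enable_triton = True
    except ImportError:
        pass

    # CUDA Graph capture trades extra GPU memory for lower CPU launch overhead.
    config.enable_cuda_graph = True

    return compile(pipe, config)
```

A typical call would load a pipeline with `diffusers` in half precision, move it to `cuda`, and pass it to `compile_pipeline`. The first few inference calls after that are slower because they trigger tracing and graph capture.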

1,305 stars and 167 monthly downloads. No commits in the last 6 months. Available on PyPI.

Stale 6m, No Dependents
Maintenance 0 / 25
Adoption 15 / 25
Maturity 18 / 25
Community 17 / 25
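The four category scores listed here add up to the overall score shown on this page. The short check below assumes the overall score is a plain, unweighted sum of the categories. The page does not state the formula, so treat that as an assumption that merely fits the numbers.

```python
# Category scores as listed on this page (each out of 25).
category_scores = {
    "Maintenance": 0,
    "Adoption": 15,
    "Maturity": 18,
    "Community": 17,
}

# Assumption: the overall score is the unweighted sum of the categories.
overall = sum(category_scores.values())
max_overall = 25 * len(category_scores)

print(f"{overall} / {max_overall}")  # prints "50 / 100"
```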


Stars: 1,305
Forks: 91
Language: Python
License: MIT
Last pushed: Mar 27, 2025
Monthly downloads: 167
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/chengzeyi/stable-fast"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
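The same request can be made from Python with only the standard library. This is a sketch under two assumptions not confirmed by the page: that the path follows a `category/owner/repo` pattern (inferred from the single curl example) and that the endpoint returns JSON. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, assuming a category/owner/repo path pattern."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the response, assuming the body is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_quality("diffusion", "chengzeyi", "stable-fast")` issues the same GET request as the curl command. The response schema is not documented here, so inspect the returned object before relying on specific keys.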