teticio/audio-diffusion
Apply diffusion models using the new Hugging Face diffusers package to synthesize music instead of images.
Converts audio into mel spectrograms for diffusion model training, then reconstructs audio from generated spectrograms. Supports both standard DDPM and latent diffusion approaches via VAE compression, DDIM for faster inference (~50 steps), and conditional generation on text/audio embeddings. Integrates directly with Hugging Face's `diffusers` package and model hub, with pre-trained checkpoints available for music genres and Gradio interfaces for interactive use.
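The audio-to-mel-spectrogram step the description mentions can be sketched in plain NumPy. This is an illustrative sketch, not the repo's actual code (the project uses librosa and a `Mel` helper class internally); the function names and default parameters here are assumptions chosen for clarity.

```python
# Sketch of the audio -> mel-spectrogram conversion used before diffusion
# training. Hypothetical helper functions; the repo's own implementation
# relies on librosa.
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mel_spectrogram(audio, sr=22050, n_fft=1024, hop=256, n_mels=64):
    # Frame the signal, window, FFT -> power spectrum -> mel projection
    window = np.hanning(n_fft)
    frames = np.stack([
        audio[s:s + n_fft] * window
        for s in range(0, len(audio) - n_fft + 1, hop)
    ])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return mel_filterbank(n_mels, n_fft, sr) @ power.T  # (n_mels, n_frames)

# Smoke test: one second of a 440 Hz tone
sr = 22050
t = np.arange(sr) / sr
spec = mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr=sr)
```

In the real pipeline the resulting spectrogram is log-scaled and quantized to a fixed-size image for the diffusion model, and a phase-reconstruction step (e.g. Griffin-Lim) inverts generated spectrograms back to audio.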
789 stars. No commits in the last 6 months.
Stars: 789
Forks: 79
Language: Jupyter Notebook
License: GPL-3.0
Category:
Last pushed: Sep 25, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/teticio/audio-diffusion"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
PrunaAI/pruna
Pruna is a model optimization framework built for developers, enabling you to deliver faster,...
bytedance/LatentSync
Taming Stable Diffusion for Lip Sync!
haoheliu/AudioLDM-training-finetuning
AudioLDM training, finetuning, evaluation and inference.
Text-to-Audio/Make-An-Audio
PyTorch Implementation of Make-An-Audio (ICML'23) with a Text-to-Audio Generative Model
sayakpaul/diffusers-torchao
End-to-end recipes for optimizing diffusion models with torchao and diffusers (inference and FP8...