teticio/audio-diffusion
Apply diffusion models using the new Hugging Face diffusers package to synthesize music instead of images.
Converts audio into mel spectrograms for diffusion model training, then reconstructs audio from generated spectrograms. Supports both standard DDPM and latent diffusion approaches via VAE compression, DDIM for faster inference (~50 steps), and conditional generation on text/audio embeddings. Integrates directly with Hugging Face's `diffusers` package and model hub, with pre-trained checkpoints available for music genres and Gradio interfaces for interactive use.
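The audio-to-mel-spectrogram step the description mentions can be sketched in plain NumPy. This is an illustrative sketch, not the repo's actual code (the project uses librosa and a `Mel` helper class internally); the function names and default parameters here are assumptions chosen for clarity.

```python
# Sketch of the audio -> mel-spectrogram conversion used before diffusion
# training. Hypothetical helper functions; the repo's own implementation
# relies on librosa.
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mel_spectrogram(audio, sr=22050, n_fft=1024, hop=256, n_mels=64):
    # Frame the signal, window, FFT -> power spectrum -> mel projection
    window = np.hanning(n_fft)
    frames = np.stack([
        audio[s:s + n_fft] * window
        for s in range(0, len(audio) - n_fft + 1, hop)
    ])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return mel_filterbank(n_mels, n_fft, sr) @ power.T  # (n_mels, n_frames)

# Smoke test: one second of a 440 Hz tone
sr = 22050
t = np.arange(sr) / sr
spec = mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr=sr)
```

In the real pipeline the resulting spectrogram is log-scaled and quantized to a fixed-size image for the diffusion model, and a phase-reconstruction step (e.g. Griffin-Lim) inverts generated spectrograms back to audio.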
789 stars. No commits in the last 6 months.
Stars: 789
Forks: 79
Language: Jupyter Notebook
License: GPL-3.0
Category:
Last pushed: Sep 25, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/teticio/audio-diffusion"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
PrunaAI/pruna
Pruna is a model optimization framework built for developers, enabling you to deliver faster,...
bytedance/LatentSync
Taming Stable Diffusion for Lip Sync!
haoheliu/AudioLDM-training-finetuning
AudioLDM training, finetuning, evaluation and inference.
Text-to-Audio/Make-An-Audio
PyTorch Implementation of Make-An-Audio (ICML'23) with a Text-to-Audio Generative Model
sayakpaul/diffusers-torchao
End-to-end recipes for optimizing diffusion models with torchao and diffusers (inference and FP8...