Text-to-Audio/Make-An-Audio
PyTorch Implementation of Make-An-Audio (ICML'23) with a Text-to-Audio Generative Model
Implements a latent diffusion architecture that pairs a learned VAE for audio compression with CLAP text embeddings for conditioning. This enables efficient, high-fidelity audio synthesis from text prompts and supports cross-modal tasks such as audio-to-audio editing. The model uses BigVGAN vocoding for waveform reconstruction and ships evaluation metrics (FAD, IS, CLAP score) for benchmarking generation quality against datasets such as AudioCaps.
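The pipeline above (text embedding, iterative denoising in latent space, VAE decoding, then vocoding) can be sketched in a few lines. This is a toy illustration of the general latent-diffusion flow, not the repo's actual code: the encoder, denoiser, and decoder here are hypothetical stand-ins, and the update rule is deliberately simplified.

```python
import numpy as np

# Hypothetical stand-ins for the real networks; shapes are illustrative only.
def clap_text_encoder(prompt: str) -> np.ndarray:
    # Deterministic toy "embedding" so the sketch runs end to end.
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    return rng.standard_normal(512)

def denoiser(z_t: np.ndarray, t: int, cond: np.ndarray) -> np.ndarray:
    # A real model predicts the noise given timestep and text condition;
    # here we fake a small pull toward zero.
    return 0.1 * z_t

def vae_decode(z: np.ndarray) -> np.ndarray:
    # A real decoder maps latents to a mel-spectrogram; identity here.
    return z

def sample_spectrogram(prompt: str, steps: int = 10,
                       latent_shape=(8, 16, 16)) -> np.ndarray:
    cond = clap_text_encoder(prompt)
    rng = np.random.default_rng(0)
    z = rng.standard_normal(latent_shape)   # start from pure noise
    for t in reversed(range(steps)):
        eps = denoiser(z, t, cond)          # predicted noise at step t
        z = z - eps                         # simplified denoising update
    return vae_decode(z)                    # latent -> spectrogram

spec = sample_spectrogram("a dog barking in the rain")
```

In the real model, the resulting mel-spectrogram would then be passed to the BigVGAN vocoder to reconstruct a waveform.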
669 stars. No commits in the last 6 months.
Stars: 669
Forks: 92
Language: Python
License: MIT
Category: diffusion
Last pushed: May 22, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/Text-to-Audio/Make-An-Audio"
Open to everyone: 100 requests/day with no key required. Get a free key for 1,000 requests/day.
Higher-rated alternatives
PrunaAI/pruna
Pruna is a model optimization framework built for developers, enabling you to deliver faster,...
bytedance/LatentSync
Taming Stable Diffusion for Lip Sync!
haoheliu/AudioLDM-training-finetuning
AudioLDM training, finetuning, evaluation and inference.
sayakpaul/diffusers-torchao
End-to-end recipes for optimizing diffusion models with torchao and diffusers (inference and FP8...
teticio/audio-diffusion
Apply diffusion models using the new Hugging Face diffusers package to synthesize music instead...