Text-to-Audio/Make-An-Audio

PyTorch Implementation of Make-An-Audio (ICML'23) with a Text-to-Audio Generative Model

Overall score: 47 / 100 (Emerging)

Implements a latent diffusion architecture combining a learned VAE for audio compression with CLAP text embeddings for conditioning, enabling efficient high-fidelity audio synthesis from text prompts and supporting cross-modal tasks like audio-to-audio editing. The model leverages BigVGAN vocoding for waveform reconstruction and includes evaluation metrics (FAD, IS, CLAP scores) for benchmarking generation quality against datasets like AudioCaps.

669 stars. No commits in the last 6 months.

Flags: Stale (6 months) | No package | No dependents

Maintenance: 0 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 21 / 25


Stars: 669
Forks: 92
Language: Python
License: MIT
Last pushed: May 22, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/Text-to-Audio/Make-An-Audio"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
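For programmatic access, the documented endpoint can be wrapped in a small helper. The sketch below (assuming only the URL shape shown in the curl example above; the function names and the idea that the response is JSON are assumptions, so inspect the raw payload before relying on specific field names) builds the endpoint URL and fetches it with the standard library:

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example in the docs above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a repository.

    `category`, `owner`, and `repo` map onto the three path segments
    seen in the documented example (diffusion/Text-to-Audio/Make-An-Audio).
    """
    return f"{API_BASE}/{quote(category)}/{quote(owner)}/{quote(repo)}"


def fetch_quality(category: str, owner: str, repo: str,
                  timeout: float = 10.0) -> dict:
    """Fetch and decode the response.

    Assumes the endpoint returns JSON; the schema is not documented here,
    so treat the decoded dict's keys as unknown until verified.
    """
    with urllib.request.urlopen(quality_url(category, owner, repo),
                                timeout=timeout) as resp:
        return json.load(resp)


print(quality_url("diffusion", "Text-to-Audio", "Make-An-Audio"))
```

Keeping URL construction separate from the network call makes the helper easy to test offline and to point at other repositories within the free 100-requests/day tier.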