audio-diffusion-pytorch and modular-diffusion
The first is a specialized audio generation framework; the second is a general-purpose diffusion-model toolkit. They complement each other: modular-diffusion can be used to assemble the kinds of custom diffusion architectures that audio-diffusion-pytorch provides ready-made for its specific domain.
About audio-diffusion-pytorch
archinetai/audio-diffusion-pytorch
Audio generation using diffusion models, in PyTorch.
Supports unconditional and text-conditional generation with T5 embeddings, diffusion-based upsampling and vocoding, and autoencoding with learnable latents. Built on dimension-agnostic U-Net and diffusion primitives from the `a-unet` library, with configurable diffusion objectives (v-objective diffusion) and sampling strategies. Integrates with Hugging Face `transformers` for text conditioning and supports custom encoders for latent compression.
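The v-objective ("V-diffusion") parameterization mentioned above can be sketched in a few lines. This is a minimal illustration of the underlying math, not the library's API; the function names and the scalar setup are ours, and the actual library operates on audio tensors through its `VDiffusion`/`VSampler` classes.

```python
import math

# Illustrative sketch of the v-objective diffusion parameterization
# (names are ours, not audio-diffusion-pytorch's API).

def alpha_sigma(t: float) -> tuple[float, float]:
    # Cosine schedule: alpha_t^2 + sigma_t^2 == 1 for every t in [0, 1].
    return math.cos(t * math.pi / 2), math.sin(t * math.pi / 2)

def noise_sample(x0: float, eps: float, t: float) -> float:
    # Forward process: x_t = alpha_t * x0 + sigma_t * eps.
    alpha, sigma = alpha_sigma(t)
    return alpha * x0 + sigma * eps

def v_target(x0: float, eps: float, t: float) -> float:
    # The quantity the network regresses: v = alpha_t * eps - sigma_t * x0.
    alpha, sigma = alpha_sigma(t)
    return alpha * eps - sigma * x0

def reconstruct_x0(xt: float, v: float, t: float) -> float:
    # Inverting the two equations above recovers the clean signal.
    alpha, sigma = alpha_sigma(t)
    return alpha * xt - sigma * v

x0, eps, t = 0.7, -0.3, 0.4
xt = noise_sample(x0, eps, t)
v = v_target(x0, eps, t)
assert abs(reconstruct_x0(xt, v, t) - x0) < 1e-9
```

Predicting `v` rather than the noise keeps the regression target well-conditioned at both ends of the schedule, which is one reason the v-objective is popular for audio.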
About modular-diffusion
cabralpinto/modular-diffusion
Python library for designing and training your own Diffusion Models with PyTorch
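The "modular" idea is that interchangeable components (data transforms, noise schedules, noise types, networks, losses) plug into one diffusion pipeline. Here is a self-contained sketch of that design using two swappable noise schedules; the class and function names are hypothetical, not modular-diffusion's actual API.

```python
import math

# Hypothetical plug-and-play schedule components, illustrating the
# composition style modular-diffusion advertises (not its real API).

class LinearSchedule:
    def __init__(self, steps: int, beta_start: float = 1e-4, beta_end: float = 2e-2):
        self.betas = [beta_start + (beta_end - beta_start) * i / (steps - 1)
                      for i in range(steps)]

class CosineSchedule:
    def __init__(self, steps: int):
        f = lambda t: math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2
        self.betas = [min(1 - f((i + 1) / steps) / f(i / steps), 0.999)
                      for i in range(steps)]

def alpha_bar(schedule, t: int) -> float:
    # Cumulative signal retention: prod_{i<=t} (1 - beta_i).
    out = 1.0
    for beta in schedule.betas[: t + 1]:
        out *= 1.0 - beta
    return out

def q_sample(x0: float, t: int, schedule, eps: float) -> float:
    # Forward process q(x_t | x_0) for a scalar "signal"; the schedule
    # component controls how fast the signal is corrupted.
    ab = alpha_bar(schedule, t)
    return math.sqrt(ab) * x0 + math.sqrt(1.0 - ab) * eps

# Swapping the schedule changes the corruption rate and nothing else:
for schedule in (LinearSchedule(1000), CosineSchedule(1000)):
    xt = q_sample(1.0, 999, schedule, eps=0.5)
```

In the library itself the analogous components are passed to a model constructor; the point here is only that each piece exposes a small interface, so variants can be swapped without touching the rest of the pipeline.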