kkoutini/PaSST

Efficient Training of Audio Transformers with Patchout

/ 100

Emerging

Implements Patchout, which drops spectrogram patches during training to cut training time and memory by 2-3× while maintaining or improving accuracy on audio classification tasks. Uses a vision transformer architecture adapted for audio, with configurable structured (whole time frames or frequency bins) or unstructured patch dropout, and integrates with PyTorch Lightning for training, Sacred for experiment management, and Weights & Biases for logging. Provides checkpoints pre-trained on AudioSet, compatible with the HEAR 2021 NeurIPS Challenge API, for direct inference or fine-tuning on downstream datasets.
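The difference between structured and unstructured patch dropout can be illustrated with a minimal sketch. This is not the repository's code: the function name `patchout_keep_indices`, its parameters, and the 12 x 99 grid size are hypothetical, chosen only to show which patch positions would survive before being passed to the transformer. It uses the Python standard library only.

```python
import random


def patchout_keep_indices(n_freq, n_time, drop_freq=0, drop_time=0,
                          drop_random=0, seed=None):
    """Return sorted flat indices of the patches kept after patch dropout.

    Hypothetical helper for illustration. Patches form an n_freq x n_time
    grid, flattened row-major (index = f * n_time + t).
    """
    rng = random.Random(seed)

    # Structured: drop whole frequency rows and whole time columns.
    kept_rows = sorted(rng.sample(range(n_freq), n_freq - drop_freq))
    kept_cols = sorted(rng.sample(range(n_time), n_time - drop_time))
    kept = [f * n_time + t for f in kept_rows for t in kept_cols]

    # Unstructured: drop individual patches at random from what remains.
    if drop_random:
        kept = sorted(rng.sample(kept, len(kept) - drop_random))
    return kept


# Illustrative values (assumed, not taken from the repository):
# a 12 x 99 patch grid with 4 frequency rows and 40 time columns dropped.
kept = patchout_keep_indices(n_freq=12, n_time=99, drop_freq=4,
                             drop_time=40, seed=0)
print(len(kept), "of", 12 * 99, "patches kept")  # 472 of 1188 patches kept
```

Only the kept patches would be fed to the transformer during training, so the sequence is shorter. Because self-attention cost grows roughly quadratically with sequence length, this is where the time and memory savings come from. At inference, all patches would normally be kept.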

370 stars. No commits in the last 6 months.

Stale 6m | No Package | No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 20 / 25

Stars 370

Forks

Language Python

License Apache-2.0

Higher-rated alternatives

descriptinc/descript-audio-codec

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz...

crlandsc/torch-log-wmse

logWMSE, an audio quality metric & loss function with support for digital silence target. Useful...

drethage/speech-denoising-wavenet

A neural network for end-to-end speech denoising

KyungsuKim42/tokensynth

The official implementation of TokenSynth (ICASSP 2025)

YuanGongND/ast

Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
