YuanGongND/whisper-at

Code and Pretrained Models for Interspeech 2023 Paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event Taggers"

Overall score: 57 / 100

Established

Combines Whisper's speech recognition with audio event tagging through a lightweight Time- and Layer-wise Transformer (TL-TR) adapter trained on frozen Whisper encoder representations. Alongside the transcript, it predicts multi-label AudioSet tags (527 classes) at a configurable temporal resolution, adding less than 1% computational overhead. The package keeps API compatibility with OpenAI Whisper and accepts tagging time windows that are integer multiples of 0.4 s for segment-level predictions. It is available on PyPI, with HuggingFace Space and Colab demos for immediate use.
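Two details in the description are easy to get wrong: the tagging window must be an integer multiple of 0.4 s, and AudioSet tagging is multi-label, so several classes can be active in the same segment. The sketch below illustrates both in plain Python. `validate_time_res` and `tag_segment` are hypothetical helpers written for this note, not part of the whisper-at API, and the sigmoid scoring assumes the tagger emits one independent logit per class; the package's own label-parsing logic may differ.

```python
import math

# Base unit stated for Whisper-AT tagging windows: 0.4 seconds.
BASE_RES_S = 0.4


def validate_time_res(at_time_res: float) -> int:
    """Hypothetical helper: return how many 0.4 s units form one tagging
    window, raising ValueError if the value is not a positive multiple."""
    units = round(at_time_res / BASE_RES_S)
    # A tolerance is needed because 0.4 has no exact binary representation.
    if units < 1 or not math.isclose(units * BASE_RES_S, at_time_res, abs_tol=1e-9):
        raise ValueError(f"at_time_res={at_time_res} is not a positive multiple of 0.4 s")
    return units


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def tag_segment(logits, labels, top_k=3, threshold=0.5):
    """Hypothetical multi-label tagging for one segment: each class is scored
    independently with a sigmoid (unlike softmax, probabilities need not sum
    to 1), then the top_k classes at or above the threshold are kept."""
    probs = [sigmoid(z) for z in logits]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(labels[i], probs[i]) for i in ranked[:top_k] if probs[i] >= threshold]


if __name__ == "__main__":
    print(validate_time_res(10))  # 25 units of 0.4 s
    # Toy logits for four illustrative classes (not real model output).
    tags = tag_segment([3.0, 1.0, -2.0, 0.2], ["Speech", "Music", "Dog", "Rain"])
    print([name for name, _ in tags])  # ['Speech', 'Music', 'Rain']
```

Because every class is thresholded on its own, "Speech" and "Music" can both be reported for the same window, which is the behaviour a multi-label AudioSet tagger needs.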

412 stars and 1,194 monthly downloads. No commits in the last 6 months. Available on PyPI.

Stale 6m

Maintenance 0 / 25

Adoption 17 / 25

Maturity 25 / 25

Community 15 / 25


Stars: 412

Language: Python

License: BSD-2-Clause

Related tools

huggingface/distil-whisper

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.

adi-gov-tw/Taiwan-Tongues-ASR-CE

Taiwan Tongues ASR CE is an open-source automatic speech recognition (ASR) model project designed for Taiwan's multilingual environment. The model supports...

KevKibe/African-Whisper

🚀 Framework for seamlessly fine-tuning the Whisper model on a multilingual dataset and deploying it to production.

sandy1990418/ChineseTaiwaneseWhisper

This repository focuses on leveraging OpenAI's Whisper model for speech recognition in Chinese...

ga642381/Taiwanese-Whisper

Fine-tunes the Whisper model for Taiwanese speech recognition.
