YuanGongND/whisper-at
Code and Pretrained Models for Interspeech 2023 Paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event Taggers"
Combines Whisper's speech recognition with audio event tagging. A lightweight Time- and Layer-wise Transformer adapter is trained on top of the frozen Whisper encoder's representations, so the model produces a transcript and AudioSet event labels (527 classes) simultaneously, with less than 1% additional computational cost. The package keeps API compatibility with OpenAI Whisper and supports configurable tagging windows (multiples of 0.4 s) for segment-level predictions. It is available on PyPI, with HuggingFace Space and Colab demos for immediate use.
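The window constraint mentioned above (tagging resolution must be a multiple of 0.4 s) can be illustrated with a small, library-independent sketch. The helper names `is_valid_time_res` and `segment_bounds` are hypothetical and are not part of the whisper-at package. How the package itself treats a partial final window is not stated here, so the clipping below is only an assumption.

```python
import math

# Base step taken from the description: windows are multiples of 0.4 s.
BASE_RES = 0.4


def is_valid_time_res(time_res: float, tol: float = 1e-9) -> bool:
    """Return True if time_res is a positive multiple of 0.4 s (hypothetical helper)."""
    if time_res <= 0:
        return False
    ratio = time_res / BASE_RES
    # Compare against the nearest integer to absorb floating-point error.
    return abs(ratio - round(ratio)) < tol


def segment_bounds(duration: float, time_res: float) -> list[tuple[float, float]]:
    """Split [0, duration] into consecutive windows of length time_res.

    The last window is clipped to the clip duration (an assumption made
    for this sketch, not documented package behavior).
    """
    if not is_valid_time_res(time_res):
        raise ValueError("time_res must be a positive multiple of 0.4 s")
    # Small epsilon guards against ceil() rounding up on float noise.
    n = max(0, math.ceil(duration / time_res - 1e-9))
    return [
        (float(i * time_res), float(min((i + 1) * time_res, duration)))
        for i in range(n)
    ]
```

For a 25-second clip and a 10-second resolution, this yields three windows, the last covering 20 s to 25 s. Each window would correspond to one set of segment-level tag predictions.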
412 stars and 1,194 monthly downloads. No commits in the last 6 months. Available on PyPI.
Stars: 412
Forks: 36
Language: Python
License: BSD-2-Clause
Category:
Last pushed: Feb 21, 2024
Monthly downloads: 1,194
Commits (30d): 0
Dependencies: 7
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/YuanGongND/whisper-at"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
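The same request can be made from Python with only the standard library. This is a minimal sketch under stated assumptions: the `<category>/<owner>/<repo>` path layout is inferred from the single example URL above, the function names are hypothetical, and the response format is not documented here, so the body is returned as raw text rather than parsed.

```python
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example; the layout beyond it is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL for a category and an 'owner/name' repo string."""
    return f"{BASE_URL}/{quote(category, safe='')}/{quote(repo, safe='/')}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0) -> str:
    """Fetch the endpoint and return the response body as text.

    No API key is sent, matching the keyless tier described above.
    """
    url = build_quality_url(category, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

Calling `fetch_quality("voice-ai", "YuanGongND/whisper-at")` would issue the same GET request as the curl command. With the keyless limit of 100 requests/day, caching the result locally is advisable.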
Related tools
huggingface/distil-whisper
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
adi-gov-tw/Taiwan-Tongues-ASR-CE
Taiwan Tongues ASR CE is an open-source automatic speech recognition (ASR) model project designed for Taiwan's multilingual environment. The model supports...
KevKibe/African-Whisper
🚀 Framework for seamless fine-tuning of Whisper model on a multi-lingual dataset and deployment to prod.
sandy1990418/ChineseTaiwaneseWhisper
This repository focuses on leveraging OpenAI's Whisper model for speech recognition in Chinese...
ga642381/Taiwanese-Whisper
Fine-tune the Whisper model for Taiwanese speech recognition