guxm2021/ALT_SpeechBrain

[ISMIR 2022] Transfer Learning of wav2vec 2.0 for Automatic Lyric Transcription

/ 100

Emerging

Adapts wav2vec 2.0 self-supervised representations to singing through transfer learning, exploring optimal layer-wise fine-tuning strategies to bridge the ASR-to-ALT domain gap. Extends the standard CTC decoder with a hybrid CTC/attention architecture to improve character-level transcription accuracy on singing audio. Built on the SpeechBrain toolkit with PyTorch, supporting training on multiple benchmark datasets (DSing, DALI, Hansen, Jamendo, Mauch) including source-separated vocal tracks via Demucs.
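As an illustration of the hybrid CTC/attention idea mentioned above, here is a minimal pure-Python sketch, not the repository's code. It shows the two pieces such a setup typically combines: collapsing per-frame CTC labels into a character string, and interpolating the CTC and attention losses with a weight. The function names, the `_` blank symbol, and the default weight of 0.3 are assumptions chosen for the example; the actual implementation is built on SpeechBrain and PyTorch and may differ.

```python
BLANK = "_"  # hypothetical blank symbol; real systems reserve a token index


def ctc_greedy_collapse(frame_labels, blank=BLANK):
    """Collapse per-frame CTC labels: merge adjacent repeats, then drop blanks."""
    out = []
    prev = None
    for label in frame_labels:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return "".join(out)


def hybrid_loss(ctc_loss, att_loss, ctc_weight=0.3):
    """Interpolate the two objectives: w * L_ctc + (1 - w) * L_att.

    ctc_weight=0.3 is an illustrative default, not a value from the source.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * att_loss


if __name__ == "__main__":
    # A blank between the two "l" runs keeps the double letter.
    print(ctc_greedy_collapse(list("hh_e_ll_lo")))
    print(hybrid_loss(2.0, 4.0, ctc_weight=0.5))
```

A blank frame between two identical characters is what lets CTC emit a doubled letter, which is why repeats are merged before blanks are dropped.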

No commits in the last 6 months.

Stale 6m · No Package · No Dependents

Maintenance 0 / 25

Adoption 8 / 25

Maturity 16 / 25

Community 12 / 25

Language: Python

License: Apache-2.0

Higher-rated alternatives

MattyB95/Jabberjay

🦜 Synthetic Voice Detection

hammaad2002/ASRAdversarialAttacks

An ASR (Automatic Speech Recognition) adversarial attack repository.

balaragavesh/w2vindia

w2vindia is a self-supervised Wav2Vec 2.0 Base model pre-trained from scratch on multilingual...

emilykhidirova/speech-emotion-recognition

Speech emotion recognition using fine-tuned Wav2Vec2

henilp105/TeluguASR

Telugu ASR model trained on IIIT Hyderabad ASR Challenge dataset and OpenSLR66 dataset
