mailong25/self-supervised-speech-recognition
Speech-to-text with self-supervised learning, based on the wav2vec 2.0 framework
Builds accurate speech recognition for low-resource languages through a three-stage pipeline: self-supervised pretraining on unlabeled audio, fine-tuning on a small amount of labeled data (as little as 1 hour), and integration of an n-gram language model for beam search decoding. It uses fairseq's wav2vec 2.0 implementation, with initialization from a cross-lingual pretrained model and optional KenLM decoding. Inference is exposed through a simple Python API, although training requires substantial GPU resources (V100 GPUs).
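To illustrate the last stage of that pipeline, here is a minimal sketch of beam search that adds a weighted language-model score to each candidate. It is a toy, word-level simplification and not the repository's code: it uses neither fairseq nor KenLM, and the names `beam_search_with_lm`, `toy_bigram_lm`, `BIGRAMS`, and the `lm_weight` parameter are hypothetical, as are all the probabilities.

```python
import math

# Hypothetical bigram table standing in for a real n-gram LM such as KenLM.
BIGRAMS = {("the", "cat"): math.log(0.9), ("the", "cap"): math.log(0.1)}


def toy_bigram_lm(prefix, token):
    """Return a log-probability for `token` given the last word of `prefix`."""
    if not prefix:
        return 0.0  # no context yet, so the LM stays neutral
    return BIGRAMS.get((prefix[-1], token), math.log(1e-3))


def beam_search_with_lm(step_log_probs, lm_score, beam_width=3, lm_weight=0.5):
    """Toy beam search combining acoustic and language-model scores.

    step_log_probs: list of dicts mapping token -> acoustic log-probability.
    lm_score: callable (prefix_tuple, token) -> LM log-probability.
    """
    beams = [((), 0.0)]  # (token sequence, cumulative score)
    for log_probs in step_log_probs:
        candidates = []
        for seq, score in beams:
            for token, acoustic in log_probs.items():
                total = score + acoustic + lm_weight * lm_score(seq, token)
                candidates.append((seq + (token,), total))
        # Keep only the highest-scoring hypotheses for the next step.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0][0]


# Made-up acoustic scores: the acoustic model slightly prefers "cap".
steps = [
    {"the": math.log(0.9), "a": math.log(0.1)},
    {"cap": math.log(0.55), "cat": math.log(0.45)},
]
acoustic_only = beam_search_with_lm(steps, toy_bigram_lm, lm_weight=0.0)
with_lm = beam_search_with_lm(steps, toy_bigram_lm, lm_weight=1.0)
```

With the language model switched off, the acoustic scores alone decide the output; with it switched on, the bigram table can overrule a close acoustic call. A real wav2vec 2.0 decoder works on CTC outputs and handles blanks and repeated symbols, which this sketch leaves out.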
379 stars. No commits in the last 6 months.
Stars: 379
Forks: 116
Language: Python
License: —
Category:
Last pushed: Nov 22, 2021
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/mailong25/self-supervised-speech-recognition"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
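The same request can be sketched in Python using only the standard library. This assumes the endpoint returns JSON and that the URL path follows a category/owner/repo pattern, both inferred from the single curl example above rather than from any documented contract. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path copied from the curl example; the path layout below is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the request URL for one repository's quality data."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch and parse the response, assuming it is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_quality("voice-ai", "mailong25", "self-supervised-speech-recognition")` issues the same GET request as the curl command. The fields in the returned data are not described on this page, so inspect the response before relying on any particular key.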
Higher-rated alternatives
liangstein/Chinese-speech-to-text
Chinese Speech To Text Using Wavenet
louiskirsch/speechT
Open-source speech-to-text software written in TensorFlow
Open-Speech-EkStep/vakyansh-models
Open source speech to text models for Indic Languages
Open-Speech-EkStep/vakyansh-wav2vec2-experimentation
Repository containing an experimentation platform for training wav2vec2 models and running inference with them.
oliverguhr/wav2vec2-live
Live speech recognition using Facebook's wav2vec 2.0 model.