mailong25/self-supervised-speech-recognition

Speech to text with self-supervised learning, based on the wav2vec 2.0 framework

/ 100

Emerging

Builds speech recognition models for low-resource languages through a three-stage pipeline: self-supervised pretraining on unlabeled audio, fine-tuning on a small amount of labeled data (as little as 1 hour), and beam search decoding with an n-gram language model. It builds on fairseq's wav2vec 2.0 implementation, supports initialization from cross-lingual pretrained models and optional KenLM decoding, and exposes a simple Python API for inference, although training is resource-intensive (V100 GPUs).
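The role of the n-gram language model in decoding can be illustrated with a minimal sketch. This is not the repository's code or the KenLM API: it rescores a short list of finished hypotheses rather than running a full beam search, and the bigram table, the `lm_weight` and `word_bonus` parameters, and all probabilities are hypothetical values chosen for illustration.

```python
import math

# Toy bigram log-probabilities (hypothetical values standing in for a real
# n-gram model such as one built with KenLM).
BIGRAM_LOGP = {
    ("<s>", "recognize"): math.log(0.4),
    ("recognize", "speech"): math.log(0.5),
    ("<s>", "wreck"): math.log(0.1),
    ("wreck", "a"): math.log(0.3),
    ("a", "nice"): math.log(0.2),
    ("nice", "beach"): math.log(0.1),
}
UNSEEN_LOGP = math.log(1e-4)  # flat penalty for bigrams missing from the table


def lm_score(words):
    """Sum bigram log-probabilities over the word sequence, starting from <s>."""
    total = 0.0
    prev = "<s>"
    for word in words:
        total += BIGRAM_LOGP.get((prev, word), UNSEEN_LOGP)
        prev = word
    return total


def rescore(hypotheses, lm_weight=1.0, word_bonus=0.5):
    """Return the text of the hypothesis with the best combined score.

    hypotheses: list of (text, acoustic_log_prob) pairs.
    The combined score is acoustic + lm_weight * LM + word_bonus * word count.
    """
    def combined(item):
        text, acoustic_logp = item
        words = text.split()
        return acoustic_logp + lm_weight * lm_score(words) + word_bonus * len(words)

    return max(hypotheses, key=combined)[0]


# The acoustic model alone slightly prefers the wrong transcript.
candidates = [
    ("wreck a nice beach", -4.0),
    ("recognize speech", -4.5),
]
best = rescore(candidates)
print(best)
```

With the language model switched off (`lm_weight=0.0`) the acoustically preferred hypothesis wins; with it on, the more plausible word sequence is selected. A real decoder applies the same kind of score combination to partial hypotheses during beam search.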

379 stars. No commits in the last 6 months.

No license, stale (6 months), no package, no dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 8 / 25

Community 24 / 25


Stars: 379

Forks: 116

Language: Python

License: none


Higher-rated alternatives

liangstein/Chinese-speech-to-text

Chinese Speech To Text Using Wavenet

louiskirsch/speechT

Open-source speech-to-text software written in TensorFlow

Open-Speech-EkStep/vakyansh-models

Open source speech to text models for Indic Languages

Open-Speech-EkStep/vakyansh-wav2vec2-experimentation

Repository containing an experimentation platform for training and running inference on wav2vec2 models.

oliverguhr/wav2vec2-live

Live speech recognition using Facebook's wav2vec 2.0 model.
