lucasnewman/best-rq-pytorch

Implementation of BEST-RQ, a model for self-supervised learning of speech signals using a random projection quantizer, in PyTorch.

Score: 48 / 100 (Emerging)

Combines mel-spectrogram feature extraction with a Conformer encoder and a random projection quantizer to learn discrete semantic tokens from unlabeled audio. Pretraining uses masking-based self-supervision (mask probability of roughly 60%), and the pretrained model supports downstream applications such as TTS and vocoding. It integrates with RVQ (Residual Vector Quantization) codebooks, and its outputs can feed models such as Spear-TTS or SoundStorm for speech synthesis.
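To make the quantizer and masking steps concrete, here is a minimal, framework-free sketch of how a random projection quantizer can turn feature frames into discrete targets for masked prediction. It is not the repository's code or API: the class and function names (`RandomProjectionQuantizer`, `mask_frames`, `make_training_targets`), the dimensions, and the per-frame independent masking are all assumptions chosen for illustration, and plain Python lists stand in for tensors.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class RandomProjectionQuantizer:
    """Hypothetical sketch: a frozen random projection plus a frozen random codebook.

    Neither the projection nor the codebook is ever trained; they only
    define a fixed mapping from a feature frame to a discrete token id.
    """

    def __init__(self, input_dim, code_dim, codebook_size, seed=0):
        rng = random.Random(seed)
        # Projection matrix with shape (code_dim, input_dim).
        self.projection = [
            [rng.gauss(0.0, 1.0) for _ in range(input_dim)] for _ in range(code_dim)
        ]
        # Codebook of unit-length random vectors, shape (codebook_size, code_dim).
        self.codebook = [
            l2_normalize([rng.gauss(0.0, 1.0) for _ in range(code_dim)])
            for _ in range(codebook_size)
        ]

    def quantize(self, frame):
        """Return the index of the codebook entry nearest to the projected frame."""
        projected = l2_normalize(
            [sum(w * x for w, x in zip(row, frame)) for row in self.projection]
        )
        # For unit vectors, the smallest Euclidean distance is the largest dot product.
        scores = [sum(c * p for c, p in zip(code, projected)) for code in self.codebook]
        return max(range(len(scores)), key=scores.__getitem__)


def mask_frames(num_frames, mask_prob=0.6, seed=0):
    """Pick masked positions independently per frame (a simplifying assumption)."""
    rng = random.Random(seed)
    return [rng.random() < mask_prob for _ in range(num_frames)]


def make_training_targets(frames, quantizer, mask):
    """Map each masked position to the token id the encoder should predict there."""
    return {
        t: quantizer.quantize(frame)
        for t, (frame, is_masked) in enumerate(zip(frames, mask))
        if is_masked
    }


# Usage: 20 stand-in "mel-spectrogram" frames with 80 bins each (random data).
data_rng = random.Random(1)
frames = [[data_rng.gauss(0.0, 1.0) for _ in range(80)] for _ in range(20)]
quantizer = RandomProjectionQuantizer(input_dim=80, code_dim=16, codebook_size=256)
mask = mask_frames(len(frames), mask_prob=0.6)
targets = make_training_targets(frames, quantizer, mask)
```

In a real pipeline the encoder would see the masked input and be trained with a classification loss against `targets` at the masked positions. Because the quantizer is frozen, the targets stay fixed throughout training.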

133 stars. No commits in the last 6 months. Available on PyPI.

Stale 6m
Maintenance 0 / 25
Adoption 10 / 25
Maturity 25 / 25
Community 13 / 25


Stars: 133
Forks: 12
Language: Python
License: MIT
Last pushed: Sep 25, 2023
Commits (30d): 0
Dependencies: 7

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/lucasnewman/best-rq-pytorch"
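The same request can be made from Python using only the standard library. This is a sketch under stated assumptions: the URL pattern is taken from the curl command above, but the helper names (`build_quality_url`, `fetch_quality`) are hypothetical, and the response is assumed to be JSON, which this page does not confirm.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category, owner, repo):
    """Build the endpoint URL, percent-encoding each path segment."""
    segments = [urllib.parse.quote(part, safe="") for part in (category, owner, repo)]
    return API_BASE + "/" + "/".join(segments)


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the quality data for one repository.

    Assumption: the endpoint returns a JSON body. If it does not,
    json.loads raises a ValueError that the caller should handle.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


url = build_quality_url("voice-ai", "lucasnewman", "best-rq-pytorch")
# To perform the request (this counts against the daily quota):
# data = fetch_quality("voice-ai", "lucasnewman", "best-rq-pytorch")
```

The network call is left commented out so that importing or running the snippet does not use up any of the daily request allowance.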

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.