microsoft/SpeechT5
Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
Implements an encoder-decoder transformer architecture with unified masked prediction across speech and text modalities, using unpaired data to improve representation learning. Supports downstream tasks including ASR, speech synthesis, speech translation, and multilingual processing through variants such as Speech2C, SpeechLM, and VioLA. Integrates with HuggingFace and provides pre-trained checkpoints, from base to large, trained on the LibriSpeech and Libri-Light corpora.
1,435 stars. No commits in the last 6 months.
Stars: 1,435
Forks: 135
Language: Python
License: MIT
Category:
Last pushed: Apr 24, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/microsoft/SpeechT5"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
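The same request can be sketched in Python using only the standard library. This is a minimal illustration, not a documented client: the helper names `quality_url` and `fetch_quality` are hypothetical, the `category/owner/repo` path layout is inferred from the single URL shown above, and the response is assumed to be JSON, which the page does not state. How an API key would be passed is not described here, so the sketch makes unauthenticated requests only.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (hypothetical helper).

    Assumption: the path is <category>/<owner>/<repo>, inferred from
    the one example URL on this page.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record for a repository (hypothetical helper).

    Assumption: the endpoint returns a JSON body. No key is sent, so the
    100 requests/day anonymous limit applies.
    """
    request = urllib.request.Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("voice-ai", "microsoft", "SpeechT5")` would issue the same GET request as the curl command; the fields of the returned object are not documented on this page, so inspect the result before relying on any key.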
Higher-rated alternatives
Spr-Aachen/Easy-Voice-Toolkit: A user-friendly audio toolkit for voice recognition, voice transcription, voice conversion, etc.
ftyers/commonvoice-utils: Linguistic processing for Common Voice
alphacep/awesome-russian-speech: Russian speech technology links
microsoft/UniSpeech: UniSpeech - Large Scale Self-Supervised Learning for Speech
PrzemyslawSwiderski/python-gradle-plugin: Gradle plugin to run Python projects.