microsoft/SpeechT5

Unified-Modal Speech-Text Pre-Training for Spoken Language Processing

Quality score: 46 / 100 (Emerging)

Implements an encoder-decoder transformer architecture with unified masked prediction across speech and text modalities, leveraging unpaired data to improve representation learning. Supports diverse downstream tasks including ASR, speech synthesis, and speech translation, with multilingual processing through variants like Speech2C, SpeechLM, and VioLA. Integrates with HuggingFace and provides pre-trained checkpoints at base and large scales, trained on the LibriSpeech and Libri-Light corpora.

1,435 stars. No commits in the last 6 months.

Flags: stale for 6 months · no published package · no known dependents
Maintenance: 0 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 20 / 25
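The overall 46 / 100 appears to be the plain sum of the four category scores above. That is an assumption — the scoring methodology is not spelled out on this page — but the arithmetic checks out:

```python
# Category scores as displayed above (each out of 25).
scores = {"Maintenance": 0, "Adoption": 10, "Maturity": 16, "Community": 20}

# Assumption: the overall score is the unweighted sum of the categories.
overall = sum(scores.values())
print(overall)  # 46, matching the displayed 46 / 100
```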


Stars: 1,435
Forks: 135
Language: Python
License: MIT
Last pushed: Apr 24, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/microsoft/SpeechT5"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
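The same request can be made from Python with the standard library. This is a minimal sketch: `build_quality_url` and `fetch_quality` are illustrative helper names, and the shape of the JSON response is not documented here, so the code only fetches and decodes it.

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality-report URL for a repository (helper name is illustrative)."""
    return f"{API_BASE}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str) -> dict:
    """Fetch and decode the JSON quality report (keyless tier: 100 requests/day)."""
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    report = fetch_quality("voice-ai", "microsoft", "SpeechT5")
    print(json.dumps(report, indent=2))
```

The URL builder reproduces the exact path shown in the curl example above; swap in a different category or owner/repo pair to query other repositories.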